The document format introduced with the Microsoft Office 2007 suite is new, open, and based on XML. This opens up a lot of possibilities for us developers; there is now an easy way to create your own Word or Excel documents from an application. The file format is called Ecma Office Open XML.
Previously it has been difficult or expensive to give customers nice Word or Excel output from an application. For example, an Excel sheet was typically delivered from a website by creating a table in HTML and sending it to the browser with the matching content type (application/vnd.ms-excel), and the same trick was used for Word. But embedding images or attachments, or creating advanced layouts (headers or footers), was troublesome. It can be done; for example, I created a web application that produced a nicely formatted Word document as output, using the MHT format.
Basically zipped XML files
The file types that use this format have extensions like:
- .docx - Word 2007
- .xlsx - Excel 2007
- .xps - XPS files
These files are essentially ZIP files: just rename your document to .zip and open it with your favorite ZIP manager.
You will then find a number of files and folders within the compressed file.
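If you would rather not rename anything, the same look inside can be had from code. Here is a minimal sketch in Python using only the standard zipfile module; the function name list_package_parts is my own, not part of any Office API.

```python
import sys
import zipfile


def list_package_parts(source):
    """Return the names of all entries in a package (which is a ZIP archive).

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_parts.py mydocument.docx
    for name in list_package_parts(sys.argv[1]):
        print(name)
```

No renaming is needed here, since zipfile does not care about the file extension.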
Content of the compressed file
If you decompress an Office Open XML file you get a set of folders and files, called a package, which is divided into parts.
[Content_Types].xml describes the parts of the Office Open XML package by listing the content type of each one.
The _rels folder and its .rels file contain the relationships between the root of the package and its parts. Any folder within a package can contain its own _rels folder, which holds the relationships for the parts in that folder.
The docProps folder contains files that describe the document properties. It should contain a file called core.xml, which holds the core properties of the file, such as subject and author. It can also contain an app.xml, which holds properties specific to the application that created the file.
The word folder, in this case, contains the actual Word document in a file called document.xml, as well as files for fonts, styles and themes. Other file types, such as an .xps file, can contain a folder called Documents in which the document information is found.
The word folder or other content parts can contain a folder called media, which will contain images or other binary files included in the document.
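As a small illustration of how these parts can be read, here is a sketch that pulls a few core properties out of docProps/core.xml. It assumes the package actually contains that part and that the properties use the usual Dublin Core elements (dc:title, dc:subject, dc:creator); the function name read_core_properties is just something I made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(source):
    """Return title, subject and author from docProps/core.xml.

    A property that is missing from the file is returned as None.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    wanted = (("title", "dc:title"), ("subject", "dc:subject"), ("author", "dc:creator"))
    props = {}
    for key, tag in wanted:
        element = root.find(tag, NS)
        props[key] = element.text if element is not None else None
    return props
```

Called as read_core_properties("report.docx") (a hypothetical file name), it returns a plain dictionary.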
As an example of a relationship, say the media folder inside the word folder contains an image with the filename image1.png, which is embedded in the document. The _rels folder inside the word folder then contains a file called document.xml.rels, which contains this XML:
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
The document.xml file contains an image element (a:graphic), and inside it is an XML element that references the Id of the relationship (rId4 in the example above).
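To make the lookup concrete, here is a rough sketch of how an application could turn a relationship Id into the file it points at. The SAMPLE_RELS string wraps the Relationship element from above in the Relationships root element that a .rels file uses; the function name relationship_targets is my own. In documents written by Word the reference typically sits in the r:embed attribute of an a:blip element, but treat that as background, not as something this sketch checks.

```python
import xml.etree.ElementTree as ET

# Namespace of the elements inside a .rels file.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Sample content for word/_rels/document.xml.rels, built around the
# Relationship element shown in the text.
SAMPLE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
</Relationships>"""


def relationship_targets(rels_xml):
    """Map each relationship Id in a .rels document to its Target."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter(f"{{{REL_NS}}}Relationship")
    }


targets = relationship_targets(SAMPLE_RELS)
print(targets["rId4"])  # prints media/image1.png
```

The target is relative to the folder of the part that owns the relationships, so here it resolves to word/media/image1.png inside the package.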
It could hardly be easier, so get your app ready for Office 2007!
But what if neither I nor my clients have Office 2007?
Don’t worry. Instead of producing a Word 2007 or Excel 2007 document, make an XPS document. The XPS reader is a free download. Read more about it here.
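To give an idea of how little is needed, here is a minimal sketch that writes a bare-bones .docx from Python: one content types file, one root relationship, and one document part with a single paragraph. The function name write_minimal_docx is my own invention, and I am assuming that these three parts are enough for Word to open the result; a real document would normally also carry styles, properties and so on.

```python
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>
</w:document>"""


def write_minimal_docx(target, text):
    """Write a one-paragraph Word document to a path or binary file object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        # escape() keeps characters such as & and < from breaking the XML.
        package.writestr("word/document.xml", DOCUMENT.format(text=escape(text)))
```

Calling write_minimal_docx("hello.docx", "Hello from my application") (hello.docx is just an example name) produces a file you can inspect with any ZIP manager, exactly as described earlier.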
If you would like to read more on this, or if you would like to try it out, here are some starting points:
Introducing the Office (2007) Open XML File Formats - MSDN
Server-Side Generation of Word 2007 Docs - Ted Pattison
Walkthrough: Word 2007 XML format - MSDN
XML Paper Specification - Microsoft
Sample XPS documents - Microsoft