
About the Microsoft Office Open XML document format

Tags: Microsoft, XML, Microsoft Office, Software

The document format introduced with the Microsoft Office 2007 suite is completely new, open and based on XML. This opens up a lot of possibilities for us developers; there is now an easy way to create your own Word or Excel documents from an application. The file format is called Ecma Office Open XML.

Previously it was difficult or expensive to give customers nice Word or Excel output from an application. For example, an Excel sheet was typically delivered from a website by creating an HTML table and sending it to the browser with the correct content type (application/vnd.ms-excel); the same approach was used for Word. But embedding images or attachments, or creating advanced layouts with headers or footers, was troublesome. It could be done; for example, I created a web application that produced a nicely formatted Word document as output using the MHT format.

Basically zipped XML files

The file types that use this ZIP-based packaging have extensions such as:

  • .docx - Word 2007
  • .xlsx - Excel 2007
  • .xps - XPS files

Each of these files is essentially a ZIP archive: change the file extension to .zip and open it with your favorite ZIP manager.

You will then find a number of files and folders within the compressed file.
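The same inspection can be done from code, without renaming anything. Below is a minimal Python sketch using the standard zipfile module. The function names and the in-memory sample package are made up for illustration, and the placeholder parts do not form a valid Word document:

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory stand-in for a .docx (placeholder parts only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_package_parts(data: bytes) -> list:
    """Return the entry names of a ZIP-based package, in archive order."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


for name in list_package_parts(build_sample_package()):
    print(name)
```

For a real document, reading the file's bytes and passing them to list_package_parts should list the same kind of entries that a ZIP manager shows.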

Content of the compressed file

If you decompress an Office Open XML file you get a set of folders and files, called a package, which is divided into parts.

[Content_Types].xml describes the parts of the Office Open XML package by declaring the content type of each one.
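To make the role of [Content_Types].xml more concrete, here is a small Python sketch that parses such a file into two lookups: defaults by file extension and overrides by part name. The sample XML is a hand-written illustration of the typical shape of this file, not copied from a real document, and read_content_types is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Hand-written sample; a real package usually lists more parts.
SAMPLE_CONTENT_TYPES = f"""<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="png" ContentType="image/png"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def read_content_types(xml_text):
    """Return (defaults by extension, overrides by part name)."""
    root = ET.fromstring(xml_text)
    defaults = {
        el.get("Extension"): el.get("ContentType")
        for el in root.findall(f"{{{CT_NS}}}Default")
    }
    overrides = {
        el.get("PartName"): el.get("ContentType")
        for el in root.findall(f"{{{CT_NS}}}Override")
    }
    return defaults, overrides


defaults, overrides = read_content_types(SAMPLE_CONTENT_TYPES)
print(defaults["png"])
print(overrides["/word/document.xml"])
```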

The _rels folder and its .rels file contain the relationships between the package root and the other parts in the package. A folder within a package can contain its own _rels folder, which holds the relationships for the parts in that folder.

The docProps folder contains files that describe the document properties. It should contain a file called core.xml, which holds the core properties of the file, such as subject and author. It can also contain an app.xml, which holds properties specific to the application that created the file.
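As an illustration, the author and subject can be read from core.xml with a few lines of Python. This is a sketch under assumptions: the sample XML is hand-written and trimmed to two properties, the values are invented, and read_core_properties is a hypothetical helper name. The namespaces are the core-properties and Dublin Core namespaces used by the format:

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample with invented values.
SAMPLE_CORE_XML = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Jane Doe</dc:creator>
  <dc:subject>Quarterly report</dc:subject>
</cp:coreProperties>"""


def read_core_properties(xml_text):
    """Return the author and subject; a missing property comes back as None."""
    root = ET.fromstring(xml_text)
    return {
        "author": root.findtext("dc:creator", default=None, namespaces=NAMESPACES),
        "subject": root.findtext("dc:subject", default=None, namespaces=NAMESPACES),
    }


print(read_core_properties(SAMPLE_CORE_XML))
```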

The word folder, in this case, contains the actual Word document in a file called document.xml, as well as files for fonts, styles and themes. Other file types, such as an .xps file, can contain a folder called Documents in which the document information is found.

The word folder, or other content folders, can contain a folder called media, which holds images or other binary files included in the document.
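Putting the parts described above together, a very small Word document can be generated with nothing but a ZIP library. The Python sketch below writes the three parts a bare-bones package needs: [Content_Types].xml, _rels/.rels and word/document.xml. It is a starting point under assumptions rather than a complete implementation: build_minimal_docx is a made-up name, and styles, fonts, themes and docProps are left out entirely:

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Root relationship: tells the reader which part is the main document.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

# One paragraph with one run of text.
DOCUMENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document '
    'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def build_minimal_docx(text: str) -> bytes:
    """Return the bytes of a single-paragraph .docx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        # escape() protects characters such as < and & in the text.
        package.writestr(
            "word/document.xml", DOCUMENT_TEMPLATE.format(text=escape(text))
        )
    return buffer.getvalue()


docx_bytes = build_minimal_docx("Hello from Python")
print(len(docx_bytes) > 0)
```

Writing docx_bytes to a file with a .docx extension should give a document that Word 2007 can open. A web application could send the same bytes as the response body instead.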

Relationships

As an example of a relationship, suppose the media folder contains an image with the filename image1.png which is embedded in the document. The _rels folder inside the word folder contains a file called document.xml.rels which contains this XML:

<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>

The document.xml file contains a graphic element (a:graphic), and nested inside it is this XML element, which references the Id of the relationship:

<a:blip r:embed="rId4"/>
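The lookup from r:embed to the image file can be sketched in Python as follows. The two XML samples are trimmed-down illustrations (the wrapper element named root is invented, and a real document.xml nests a:blip much deeper), and the helper names are hypothetical. The sketch assumes the relationship target is relative to the folder of the part that owns the relationships, here word:

```python
import posixpath
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
DRAWING_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Trimmed stand-in for word/_rels/document.xml.rels.
SAMPLE_RELS = f"""<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId4" Type="{DOC_REL_NS}/image" Target="media/image1.png"/>
</Relationships>"""

# Trimmed stand-in for word/document.xml; "root" is an invented wrapper.
SAMPLE_DOCUMENT = f"""<root xmlns:a="{DRAWING_NS}" xmlns:r="{DOC_REL_NS}">
  <a:blip r:embed="rId4"/>
</root>"""


def load_relationships(rels_xml):
    """Map each relationship Id to its Target."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall(f"{{{PKG_REL_NS}}}Relationship")
    }


def embedded_image_parts(document_xml, rels_xml, source_folder="word"):
    """Return package paths of the images referenced by a:blip elements."""
    targets = load_relationships(rels_xml)
    root = ET.fromstring(document_xml)
    parts = []
    for blip in root.iter(f"{{{DRAWING_NS}}}blip"):
        rel_id = blip.get(f"{{{DOC_REL_NS}}}embed")
        if rel_id in targets:
            joined = posixpath.join(source_folder, targets[rel_id])
            parts.append(posixpath.normpath(joined))
    return parts


print(embedded_image_parts(SAMPLE_DOCUMENT, SAMPLE_RELS))
```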

It could hardly be easier, so get your app ready for Office 2007!

But what if neither I nor my clients have Office 2007?

Don't worry. Instead of producing a Word 2007 or Excel 2007 document, create an XPS document. The XPS reader is a free download. Read more about it here.

More reading

If you would like to read more about this, or would like to try it out, here are some starting points:

Introducing the Office (2007) Open Xml File Formats - MSDN

Server-Side Generation of Word 2007 Docs - Ted Pattison

Walkthrough: Word 2007 XML format - MSDN

XML Paper Specification - Microsoft

Sample XPS documents - Microsoft


  • Bytesland said

    Long experience has shown that many international standards tend to try to cover every possible situation and end up being far too big to be supported well. As for Microsoft formats, well, there is no choice but to support their formats if LibreOffice is to be usable by the commercial world. So we have ODF, OOXML and Microsoft as three major "standards", plus a raft of proprietary formats. As someone said: the great thing about standards is that there are so many to choose from.


About Wictor...

Wictor Wilén is the Nordic Digital Workplace Lead working at Avanade. Wictor has achieved the Microsoft Certified Architect (MCA) - SharePoint 2010, Microsoft Certified Solutions Master (MCSM) - SharePoint and Microsoft Certified Master (MCM) - SharePoint 2010 certifications. He has also been awarded Microsoft Most Valuable Professional (MVP) for seven consecutive years.
