Dissecting XPS, part 6 - reading XPS files programmatically

The sixth part of the Dissecting XPS series is here, and this time we will finally look at some code for reading XML Paper Specification (XPS) [1] files.

In the following sample I will not use the Microsoft .NET Framework 3.0, which has built-in functionality for reading and writing XPS files [2]. Instead I will use .NET 2.0 (you can try it in .NET 1.1 if you like) and an excellent ZIP library called #ziplib [3]. This shows more of what is really happening, and it shows how to integrate XPS into applications built on .NET Framework versions other than 3.0, on Mono, or on whatever platform you like. For instance, you can use Java and the Java Zip packages.

Getting the parts

First, let's look at how to retrieve the parts from an XPS document. Retrieving a part is as easy as opening the XPS file with the ZIP library and reading out a file. We start by loading the Start Part relationships from the XPS package.

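    using System.IO;
    using System.Xml;
    using ICSharpCode.SharpZipLib.Zip;

    using (ZipFile zipFile = new ZipFile(File.OpenRead(fileName))) {
      // "_rels/.rels" holds the package-level (Start Part) relationships.
      ZipEntry startPartEntry = zipFile.GetEntry("_rels/.rels");
      XmlDocument startPartXml = new XmlDocument();
      startPartXml.Load(zipFile.GetInputStream(startPartEntry));
      // Code that reads further parts goes here, while the ZIP file is still open.
    }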


Now the startPartXml object contains the Start Part relationships for this Open Packaging Conventions [4] document. Using the System.Xml namespace, we can search it to find the FixedDocumentSequence part, the Core Properties part and others.
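To get a feeling for what the relationships file contains, it can help to list every relationship it declares. The following is a minimal sketch, not part of the original sample: the class name PackageRelationships and the method Read are made up for illustration, and it assumes each relationship type occurs at most once (if a type is repeated, the last one wins).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, named for illustration only.
static class PackageRelationships
{
    const string RelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns a map from relationship Type to Target for a parsed .rels document.
    public static Dictionary<string, string> Read(XmlDocument relsXml)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(relsXml.NameTable);
        nsmgr.AddNamespace("rel", RelNs);

        Dictionary<string, string> targets = new Dictionary<string, string>();
        foreach (XmlElement rel in relsXml.SelectNodes(
            "/rel:Relationships/rel:Relationship", nsmgr))
        {
            // Assumption: one relationship per type; a repeated type overwrites.
            targets[rel.GetAttribute("Type")] = rel.GetAttribute("Target");
        }
        return targets;
    }
}
```

Calling PackageRelationships.Read(startPartXml) inside the using block and writing each key and value to the console shows which parts the package points to from its root.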

Getting the Core properties

If we would like to extract the title of the document, we just have to search the Start Part relationships for the target of the Core Properties part and then parse that part for the title:

     1:  NameTable nt = new NameTable();
     2:  XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
     3:  nsmgr.AddNamespace("rel", "http://schemas.openxmlformats.org/package/2006/relationships");
     4:  nsmgr.AddNamespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties");
     5:  nsmgr.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");
     6:
     7:  XmlElement coreElm = startPartXml.SelectSingleNode(
     8:    "/rel:Relationships/rel:Relationship[@Type='http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties']",
     9:     nsmgr) as XmlElement;
    10:  string partNamePath = coreElm.GetAttribute("Target");
    11:
    12:  ZipEntry partNameEntry = zipFile.GetEntry(partNamePath.TrimStart(new char[] { '/' }));
    13:
    14:  XmlDocument coreXml = new XmlDocument();
    15:  coreXml.Load(zipFile.GetInputStream(partNameEntry));
    16:
    17:  XmlNode title = coreXml.SelectSingleNode("/cp:coreProperties/dc:title", nsmgr);
    18:  if (title != null) {
    19:      Console.WriteLine("Title of document is " + title.InnerText);
    20:  }
    21:  else {
    22:      Console.WriteLine("Title not found");
    23:  }

Pretty easy! First, create a System.Xml.XmlNamespaceManager object that holds the namespaces we need to parse the parts (lines 1-5). Then select the relationship in the Start Part that points to the Core Properties part (lines 7-9). Load the core properties XML (lines 10-15) and finally read the title property (line 17). Note that this code uses the zipFile and startPartXml variables from the first sample, so it has to run inside that using block.
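The listing above assumes that the Core Properties relationship exists, although a package may well come without one. Here is a hedged sketch of a more defensive variant that returns null instead of failing. The class CorePropertiesReader and its method names are my own for this illustration, and both methods work on already loaded XmlDocument objects so that they do not depend on the ZIP library.

```csharp
using System.Xml;

// Hypothetical helper, named for illustration only.
static class CorePropertiesReader
{
    const string RelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    const string CorePropsType =
        "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties";
    const string CpNs =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    const string DcNs = "http://purl.org/dc/elements/1.1/";

    // Returns the ZIP entry name of the Core Properties part,
    // or null when the Start Part declares no such relationship.
    public static string FindEntryName(XmlDocument startPartXml)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(startPartXml.NameTable);
        nsmgr.AddNamespace("rel", RelNs);
        XmlElement rel = startPartXml.SelectSingleNode(
            "/rel:Relationships/rel:Relationship[@Type='" + CorePropsType + "']",
            nsmgr) as XmlElement;
        if (rel == null) return null;
        // ZIP entry names carry no leading slash, unlike part names.
        return rel.GetAttribute("Target").TrimStart('/');
    }

    // Returns the text of dc:title, or null when the element is absent.
    public static string ReadTitle(XmlDocument coreXml)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(coreXml.NameTable);
        nsmgr.AddNamespace("cp", CpNs);
        nsmgr.AddNamespace("dc", DcNs);
        XmlNode title = coreXml.SelectSingleNode("/cp:coreProperties/dc:title", nsmgr);
        return title == null ? null : title.InnerText;
    }
}
```

In the reader itself you would pass the result of FindEntryName to zipFile.GetEntry, check that the returned entry is not null as well, and only then load the part and call ReadTitle.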

Getting the content

Getting the rest of the document works in the same way: read the relationship for the FixedDocumentSequence part and then retrieve the fixed documents it references.

Now you have seen how easily you can start producing your own XPS parsers and readers without using the .NET 3.0 Framework classes [2], which we will look at next time.
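As a rough sketch of that first step, the helper below lists the documents referenced by a loaded FixedDocumentSequence part. It rests on a few assumptions that are not shown in the samples above: that the Start Part points to the sequence with the relationship type http://schemas.microsoft.com/xps/2005/06/fixedrepresentation, that the sequence markup uses the namespace http://schemas.microsoft.com/xps/2005/06, and that each DocumentReference element carries a Source attribute. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, named for illustration only.
static class FixedDocumentSequenceReader
{
    // Assumed relationship type from the Start Part to the sequence part.
    public const string FixedRepresentationType =
        "http://schemas.microsoft.com/xps/2005/06/fixedrepresentation";

    // Assumed namespace of the FixedDocumentSequence markup.
    const string XpsNs = "http://schemas.microsoft.com/xps/2005/06";

    // Returns the Source attribute of every DocumentReference, in document order.
    public static List<string> ReadDocumentSources(XmlDocument sequenceXml)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(sequenceXml.NameTable);
        nsmgr.AddNamespace("xps", XpsNs);

        List<string> sources = new List<string>();
        foreach (XmlElement docRef in sequenceXml.SelectNodes(
            "/xps:FixedDocumentSequence/xps:DocumentReference", nsmgr))
        {
            sources.Add(docRef.GetAttribute("Source"));
        }
        return sources;
    }
}
```

To use it, look up the relationship with FixedRepresentationType in startPartXml, load its target from the ZIP file just like the Core Properties part, and pass the resulting XmlDocument to ReadDocumentSources. A Source value without a leading slash is relative to the sequence part, so it may need resolving before it can be used as a ZIP entry name.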

Wictor Wilen
