Dissecting XPS, part 6 - reading XPS files programmatically

Tags: .NET, C#, XML, XPS

The sixth part of the Dissecting XPS series is here, and this time we will finally look at some code for reading XML Paper Specification (XPS) [1] files.

In the following sample I will not use the .NET Framework 3.0, which has built-in functionality for reading and writing XPS files [2]. Instead I will use .NET 2.0 (you can try it in .NET 1.1 if you like) and an excellent ZIP library called #ziplib [3]. This shows more of what is really happening, and it shows how to integrate XPS into applications built on .NET Framework versions other than 3.0, on Mono, or on whatever platform you like. For instance, you can use Java and the Java ZIP packages.

Getting the parts

First of all, we look at how to retrieve the parts from an XPS document. Retrieving a part is as easy as opening the XPS file with the ZIP library and reading out a file. We start by parsing the Start Part relationships from the XPS package.

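    // Requires: using System.IO; using System.Xml; using ICSharpCode.SharpZipLib.Zip;
    using (ZipFile zipFile = new ZipFile(File.OpenRead(fileName))) {
        // _rels/.rels holds the package-level relationships
        ZipEntry startPartEntry = zipFile.GetEntry("_rels/.rels");
        XmlDocument startPartXml = new XmlDocument();
        startPartXml.Load(zipFile.GetInputStream(startPartEntry));
        // The snippets in the following sections go here, while zipFile is still open
    }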

Now the startPartXml object contains the start part relationships for this Open Packaging Convention [4] document. Using this file and the System.Xml namespace, we can find the FixedDocumentSequence part, the Core Properties part, and others.
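As a minimal sketch of such a lookup, the helper below returns the Target of the first relationship of a given type. The names RelationshipReader and FindTarget are hypothetical and chosen only for illustration; the code uses nothing but System.Xml.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration; not part of #ziplib or the .NET Framework.
public static class RelationshipReader
{
    // Namespace of the OPC relationships markup.
    public const string RelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns the Target of the first relationship with the given Type,
    // or null if the relationships part contains no such relationship.
    public static string FindTarget(XmlDocument relsXml, string relationshipType)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(relsXml.NameTable);
        nsmgr.AddNamespace("rel", RelNs);

        foreach (XmlElement rel in relsXml.SelectNodes(
            "/rel:Relationships/rel:Relationship", nsmgr))
        {
            if (rel.GetAttribute("Type") == relationshipType)
            {
                return rel.GetAttribute("Target");
            }
        }
        return null;
    }
}
```

For an XPS 1.0 package, calling FindTarget(startPartXml, "http://schemas.microsoft.com/xps/2005/06/fixedrepresentation") should give the part name of the FixedDocumentSequence. Returning null for a missing relationship leaves it to the caller to decide how to handle optional parts.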

Getting the Core properties

To extract the title of the document, we just have to parse the Start Part relationships to find the target of the Core Properties part, and then parse that part for the title:

   1:  NameTable nt = new NameTable();
   2:  XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
   3:  nsmgr.AddNamespace("rel", "http://schemas.openxmlformats.org/package/2006/relationships");
   4:  nsmgr.AddNamespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties");
   5:  nsmgr.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");
   6:   
   7:  XmlElement coreElm = startPartXml.SelectSingleNode(
   8:    "/rel:Relationships/rel:Relationship[@Type='http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties']",
   9:     nsmgr) as XmlElement;
  10:  string partNamePath = coreElm.GetAttribute("Target");
  11:   
  12:  ZipEntry partNameEntry = zipFile.GetEntry(partNamePath.TrimStart(new char[] { '/' }));
  13:   
  14:  XmlDocument coreXml = new XmlDocument();
  15:  coreXml.Load(zipFile.GetInputStream(partNameEntry));
  16:   
  17:  XmlNode title = coreXml.SelectSingleNode("/cp:coreProperties/dc:title", nsmgr);
  18:  if (title != null) {
  19:      Console.WriteLine("Title of document is " + title.InnerText);
  20:  }
  21:  else {
  22:      Console.WriteLine("Title not found");
  23:  }

Pretty easy! First, create a System.Xml.XmlNamespaceManager object that holds the namespaces we need to parse the parts (lines 1-5). Then select the relationship in the Start Part relationships that points to the Core Properties part (lines 7-9). Read the core properties XML (lines 10-15) and finally read the title property (line 17). The Core Properties part is optional in a package, so a robust reader should check coreElm for null before reading its Target attribute.
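Line 12 turns the Target into a ZIP entry name by trimming the leading slash, which works when the target is written as an absolute part name. Relationship targets may also be relative to the part that owns the relationship, so a more general reader has to resolve them first. The sketch below shows one way this could be done with System.Uri. PartNameResolver and ToEntryName are hypothetical names, the host "package" is a throwaway placeholder, and the code assumes simple ASCII part names that need no further encoding.

```csharp
using System;

// Hypothetical helper for illustration; not part of #ziplib or the .NET Framework.
public static class PartNameResolver
{
    // Resolves a relationship Target against the part that owns the relationship
    // (pass "/" for the package-level _rels/.rels) and returns a ZIP entry name.
    public static string ToEntryName(string sourcePartName, string target)
    {
        // The host name is a placeholder; only the path component is of interest.
        Uri baseUri = new Uri("http://package" + sourcePartName);
        Uri resolved = new Uri(baseUri, target);

        // ZIP entry names carry no leading slash.
        return resolved.AbsolutePath.TrimStart('/');
    }
}
```

With this helper, ToEntryName("/", partNamePath) could replace the TrimStart call on line 12, and the same method could resolve the relative targets found deeper in the package.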

Getting the content

Getting the rest of the document works in the same way: read the relationship for the FixedDocumentSequence part and then retrieve the fixed documents it refers to.

Now you have seen how easily you can start producing your own XPS parsers and readers without using the .NET 3.0 Framework classes [2], which we will look at next time.
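As a rough sketch of that step, the helper below collects the Source attributes from a loaded part: DocumentReference elements in a FixedDocumentSequence, or PageContent elements in a FixedDocument. XpsContentReader and GetSources are hypothetical names, and the namespace shown is the XPS 1.0 one, so an OpenXPS file would need a different namespace URI.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not part of #ziplib or the .NET Framework.
public static class XpsContentReader
{
    // Markup namespace used by XPS 1.0 fixed payload parts (an assumption
    // about the input; OpenXPS documents use a different namespace).
    public const string XpsNs = "http://schemas.microsoft.com/xps/2005/06";

    // Returns the Source attribute of every element with the given local name,
    // for example "DocumentReference" or "PageContent", in document order.
    public static List<string> GetSources(XmlDocument partXml, string elementName)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(partXml.NameTable);
        nsmgr.AddNamespace("xps", XpsNs);

        List<string> sources = new List<string>();
        foreach (XmlElement elm in partXml.SelectNodes("//xps:" + elementName, nsmgr))
        {
            sources.Add(elm.GetAttribute("Source"));
        }
        return sources;
    }
}
```

Load the FixedDocumentSequence part into an XmlDocument the same way the core properties were loaded, call GetSources with "DocumentReference", then load each returned part and call GetSources with "PageContent" to reach the individual pages.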

Further reading and references

[1] XPS Specification
[2] XPS in .NET 3.0
[3] #ziplib
[4] Open Packaging Convention

About Wictor...

Wictor Wilén is a Director and SharePoint Architect working at Connecta AB. Wictor has achieved the Microsoft Certified Architect (MCA) - SharePoint 2010, Microsoft Certified Solutions Master (MCSM) - SharePoint  and Microsoft Certified Master (MCM) - SharePoint 2010 certifications. He has also been awarded Microsoft Most Valuable Professional (MVP) for four consecutive years.
