Contents tagged with XML

  • How to sort XML without XSL in the .NET Framework

    Tags: .NET, C#, XML

    Several times I have needed a way to sort XML, retrieved from a file or a web service, inline without invoking an XSL transformation, which is the most common approach I have seen.

    The .NET Framework contains the System.Xml.XPath namespace, which is available from .NET Framework 1.1 and up. This namespace contains a number of classes that can improve the performance of your .NET classes when working with XML. Specifically, the System.Xml.XPath.XPathExpression class contains a method (System.Xml.XPath.XPathExpression.AddSort) that sorts the XPath query results.

    The code below shows you an easy way to sort your XML data based on this namespace.

    This is the XML we will use:

      <Entry id="1">
      <Entry id="2">
      <Entry id="3">

    To sort this based on the Name element all we have to do is to create an XPathNavigator and add an XPath Expression with sorting like this:

    XmlDocument doc = new XmlDocument();
    // Load the XML into doc here, for example with doc.Load(...) or doc.LoadXml(...)
    XPathNavigator navigator = doc.CreateNavigator();
    XPathExpression expression = navigator.Compile("Entries/Entry");
    expression.AddSort("Name", XmlSortOrder.Ascending, XmlCaseOrder.UpperFirst, string.Empty, XmlDataType.Text);
    XPathNodeIterator iterator = navigator.Select(expression);
    foreach (XPathNavigator item in iterator) {
        // item is positioned on the next Entry element, in sorted order
    }

    As you can see, we create the XPathNavigator from the XmlDocument using the CreateNavigator method (which can be found on XmlNode and its derivatives as well), then create the expression with the sort and execute it using the Select method. Really simple!

    To sort on the id attribute we change the AddSort method call to this:

    expression.AddSort("@id", XmlSortOrder.Ascending, XmlCaseOrder.UpperFirst, string.Empty, XmlDataType.Number);

    Notice the XmlDataType.Number value of the last parameter.
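
    To see the two sort keys side by side, here is a minimal, self-contained sketch. The sample XML (in particular the Name values) and the XmlSortSketch and SortedIds names are made up for illustration; only the AddSort calls mirror the snippets above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class XmlSortSketch
{
    // Hypothetical sample data; the Name values are invented for this sketch.
    public const string SampleXml =
        "<Entries>" +
        "<Entry id=\"10\"><Name>Charlie</Name></Entry>" +
        "<Entry id=\"2\"><Name>Alice</Name></Entry>" +
        "<Entry id=\"1\"><Name>Bob</Name></Entry>" +
        "</Entries>";

    // Returns the id attributes of all Entry elements, ordered by sortKey.
    public static List<string> SortedIds(string xml, string sortKey, XmlDataType dataType)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XPathNavigator navigator = doc.CreateNavigator();
        XPathExpression expression = navigator.Compile("Entries/Entry");
        expression.AddSort(sortKey, XmlSortOrder.Ascending, XmlCaseOrder.UpperFirst,
            string.Empty, dataType);

        List<string> ids = new List<string>();
        foreach (XPathNavigator item in navigator.Select(expression))
        {
            ids.Add(item.GetAttribute("id", string.Empty));
        }
        return ids;
    }
}
```

    Calling SortedIds(SampleXml, "Name", XmlDataType.Text) should give the ids in the order 2, 1, 10 (Alice, Bob, Charlie), while SortedIds(SampleXml, "@id", XmlDataType.Number) should give 1, 2, 10. Sorting the id attribute as text instead would compare the values as strings, which is why the data type parameter matters.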

    The sample above will work on the .NET Framework 2.0 and above, since it uses the GetEnumerator() method of the XPathNodeIterator class, which was introduced in 2.0.

    To make it work in .NET 1.1 rewrite the foreach statement with something like this:

    XPathNodeIterator iterator = navigator.Select(expression);
    while (iterator.MoveNext()) {
        XPathNavigator item = iterator.Current;
    }

    Hope this one will help you avoid crazy workarounds...

    Update: fixed a typo in the .NET 1.1 code sample...

  • Smooth upgrade of .NET XSL transformations from 1.1 to 2.0 or higher

    Tags: .NET, C#, XML, WinFX

    When .NET 2.0 was introduced, quite a long time ago, the whole System.Xml namespace was rewritten because of the poor performance of the old implementation. Although CLR 2.0 has been around for a few years, there are still applications running on CLR 1.x, often because of their XSL transformation code: that part was completely rewritten, and the old classes were marked as obsolete.

    But note that they are only marked as obsolete! You can still compile and run most of the code with just a compiler warning. The old .NET 1.1 classes are still in CLR 2.0, so you can convert your XSL transformations piece by piece and start using .NET 2.0 or even .NET 3.5, since it is based on CLR 2.0 (read this post by Scott Hanselman to get a better understanding).

    Just make sure that you test everything really thoroughly before putting it in production, since it is not supported.

    I am currently moving one of our large applications to .NET 3.5. It has been working really smoothly on .NET 1.1, but now I want to use Visual Studio 2008 and C# 3.0 in the upcoming versions. We will eventually upgrade the XSL transformation parts to use the XslCompiledTransform class, but until then we have to stick with the .NET 1.1 classes.

    I have stumbled upon a few weird things that stopped working after the upgrade; not all XSL transformations work out fine. The XSLT not() function stopped working in several XSLT files. This is because Microsoft has rewritten even these old obsolete classes, not just moved them to a new namespace.

    For example, I have this XSL snippet, which works fine in a .NET 1.1 compiled environment and which I use to mark alternating lines in different colors.

    <xsl:if test="not(position() mod 2)">...</xsl:if>

    In .NET 2.0+ I get a System.InvalidCastException when running the transformation with the old XSLT transformation classes.

    A quick look using the Reflector tool shows us that they have changed the Not function in the BooleanFunctions class (MS.Internal.Xml.XPath.BooleanFunctions in .NET 2.0 and System.Xml.XPath.BooleanFunctions in .NET 1.1). The .NET 1.1 implementation converts the result of the inner expression using the safe Convert.ToBoolean, while the .NET 2.0 implementation uses an explicit cast (bool) to convert the result.

    The result of the mod operation is a double according to the MS.Internal.Xml.XPath.NumericExpr.GetValue method in .NET 2.0 and System.Xml.XPath.NumericExpr.getValue in .NET 1.1. In both versions the double is boxed as an object before it is converted to a boolean, and a boxed double cannot be cast directly to bool.

    // .NET 2.0 not() implementation
    !((bool) query);
    // .NET 1.1 not() implementation
    !Convert.ToBoolean(query);

    The same applies to all numeric operations (plus, minus, multiplication, division, modulus and negate) inside the not() XSLT function.
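
    The difference between the two conversions can be reproduced outside of XSLT. The sketch below is not the framework code; NotCastSketch and its two methods are hypothetical names that only mimic the two styles described above, applied to a boxed double like the one the mod operation returns.

```csharp
using System;

public static class NotCastSketch
{
    // Mimics the .NET 1.1 style: Convert.ToBoolean accepts a boxed double
    // and treats any non-zero value as true.
    public static bool NotWithConvert(object query)
    {
        return !Convert.ToBoolean(query);
    }

    // Mimics the .NET 2.0 style: unboxing a double as bool is not allowed
    // and throws System.InvalidCastException at run time.
    public static bool NotWithCast(object query)
    {
        return !((bool)query);
    }
}
```

    NotWithConvert(1.0) returns false and NotWithConvert(0.0) returns true, while NotWithCast(1.0) throws an InvalidCastException. NotWithCast only works when the argument is already a boxed bool, which is what a comparison such as (position() mod 2) = 0 produces.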

    I guess we can't expect a fix for this, even if it would be welcome. But now you know it!

    The problem above was solved with this expression instead:

    <xsl:if test="(position() mod 2) = 0">

    P.S. To get rid of the compiler warnings, insert these pragma directives in your C# code:

    #pragma warning disable 0618 // Disable warning CS0618 from here
    // your code goes here
    #pragma warning restore 0618 

  • Architecture Journal Reader - a great WPF demo

    Tags: .NET, XML, Internet and the Web, WinFX

    Microsoft has released a new Windows Presentation Foundation demo sample that is a reader application for The Architecture Journal. The Architecture Journal is a quarterly online magazine focused on IT-architecture and contains nice articles and gives you some good reading. It is available online and as PDF (why not XPS?).

    The Architecture Journal Reader is a WPF sample, very much like the New York Times Reader that was one of the first killer-apps for WPF, that you can use for offline reading of The Architecture Journal.

    Front page of an issue

    I really like this kind of application. It has a rich but simple user interface, with the look and feel of a paper magazine. You have a front page of each issue with small abstracts of the articles, all adjusting to the size of your window. You can easily click on an article and continue reading it.

    An article

    The article also looks like a "real" magazine. Instead of running the text from left to right across the whole page, it is divided into columns with a nice image layout, which gives you a good reading experience on the screen. You can adjust the text size easily and write your own annotations, using either text or ink. If you don't have time to read the article, you can send it to your Reading List for later reading.


    The application is based on RSS feeds with an extension namespace and the articles in News Industry Text Format (NITF).

    Download it here and try it out for yourself. I hope the source code will be released.

    I would love to see more of these applications, and I think this is what the future will look like for newspapers and magazines.

  • Does the size matter?

    Tags: XML, Microsoft Office, Office Open XML

    The OOXML versus ODF discussions are getting more intense, and each side lays out the same arguments for why its format is better than the other. Some arguments are valid, but some are definitely not!

    Why does the ODF side always use the "it's too much to read" argument!? Just take a look at Andrew Updegrove's open letter to the state of Massachusetts: two of his five points are about this argument!

    The Office Open XML format (ECMA-376) covers about 6,500 pages and the Open Document Format (ISO/IEC 26300) about 700 pages. Yes, 6,000 pages is a lot to read during a fast-track process, but I still don't think it's a legitimate argument for turning OOXML down.

    Let's take a closer look at the page count. OOXML is divided into five parts which cover it all:

    • Fundamentals - 165 pages
    • Open Packaging Conventions - 125 pages
    • Primer - 466 pages
    • Markup Language Reference - 5,766 pages
    • Markup Compatibility and Extensibility - 34 pages

    As you can see, most of the pages are in the markup reference, and it's a rather big document. But if you take a closer look at it and compare it to the markup reference of the ODF document, you will see that the OOXML reference is much richer in its description of the markup than the ODF one, and it contains more samples and illustrations, which of course take up more space but give the reader better information.

    When you are there looking and comparing the documents, put the documents side-by-side and you will notice that the OOXML uses more whitespace and another line spacing - which also affects the number of printed pages!

    ODF, with its single 706-page document, uses and references the SVG standard for vector-based graphics, and that is a document of another 719 pages. Then you have the MathML specification with another 665 pages, and so on with all the other external references.

    Another point of interest in this size argument is that OOXML covers spreadsheet formulas more extensively than ODF. This is what Miguel de Icaza wrote in January:

    OOXML devotes 324 pages of the standard to document the formulas and functions.

    The original submission to the ECMA TC45 working group did not have any of this information. Jody Goldberg and Michael Meeks that represented Novell at the TC45 requested the information and it eventually made it into the standards. I consider this a win, and I consider those 324 extra pages a win for everyone (almost half the size of the ODF standard).

    Depending on how you count, ODF has 4 to 10 pages devoted to it. There is no way you could build a spreadsheet software based on this specification.

    If you start counting and taking all aspects into consideration I think that the amount to read is about the same - so there goes the size matters argument down the drain...

    Let's just end this stupid discussion. I think there is room for two standards in this area right now. What is your opinion?

  • Open Xml SDK CTP available now

    Tags: .NET, XML, Microsoft Office

    Microsoft has released a CTP for Microsoft SDK for Open XML Formats, which can be downloaded here.

    The SDK contains a strongly typed library, built on top of System.IO.Packaging namespace, for creating documents based on the Open XML Format. Great, now we don't have to use the System.Xml.XmlWriter to create Office 2007 documents.

    Here is a sample of how to write out the number of characters in a Word 2007 document:

    XmlDocument extendedProperties = new XmlDocument();
    using (WordprocessingDocument wordDocument = WordprocessingDocument.Open("document.docx", false)) {
        ExtendedFilePropertiesPart part = wordDocument.ExtendedFilePropertiesPart;
        // Load the extended properties XML from the part
        extendedProperties.Load(part.GetStream());
    }
    XmlNodeList characters = extendedProperties.GetElementsByTagName("Characters");
    Console.WriteLine(characters[0].InnerText);

  • Dissecting XPS, part 6 - reading XPS files programmatically

    Tags: .NET, C#, XML, XPS

    The sixth part of the Dissecting XPS series is here and this time we will, finally, look at some code for reading XML Paper Specification [1], XPS, files.

    In the following sample I will not use the Microsoft .NET Framework 3.0, which has built-in functionality for reading and writing XPS files [2]. Instead I will use .NET 2.0 (you can try it in .NET 1.1 if you like) and an excellent ZIP library called #ziplib [3]. This will show you more of what is really happening, and it will show you how to integrate XPS into applications built on .NET Framework versions other than 3.0, on Mono, or on whatever you like. For instance, you can use Java and the Java Zip packages.

    Getting the parts

    First of all we look at how to retrieve the parts from an XPS document. Retrieving a part from an XPS document is as easy as opening the XPS file using the ZIP library and reading out a file. We start by parsing the Start Part relationships from the XPS package.

    using (ZipFile zipFile = new ZipFile(File.OpenRead(fileName))) {
      ZipEntry startPartEntry = zipFile.GetEntry("_rels/.rels");
      XmlDocument startPartXml = new XmlDocument();
      startPartXml.Load(zipFile.GetInputStream(startPartEntry));
    }

    Now the startPartXml object contains the start part relationships for this Open Packaging Convention [4] document. Using this file we can find the FixedDocument part, the Core Properties part or others using the System.Xml namespace.

    Getting the Core properties

    If we would like to extract the title of the document we just have to parse the Start Part to find the target for the Core Properties Part file and then parse it for the title:

       1:  NameTable nt = new NameTable();
       2:  XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
       3:  nsmgr.AddNamespace("rel", "");
       4:  nsmgr.AddNamespace("cp", "");
       5:  nsmgr.AddNamespace("dc", "");
       7:  XmlElement coreElm = startPartXml.SelectSingleNode(
       8:    "/rel:Relationships/rel:Relationship[@Type='']",
       9:     nsmgr) as XmlElement;
      10:  string partNamePath = coreElm.GetAttribute("Target");
      12:  ZipEntry partNameEntry = zipFile.GetEntry(partNamePath.TrimStart(new char[] { '/' }));
      14:  XmlDocument coreXml = new XmlDocument();
      15:  coreXml.Load(zipFile.GetInputStream(partNameEntry));
      17:  XmlNode title = coreXml.SelectSingleNode("/cp:coreProperties/dc:title", nsmgr);
      18:  if (title != null) {
      19:      Console.WriteLine("Title of document is " + title.InnerText);
      20:  }
      21:  else {
      22:      Console.WriteLine("Title not found");
      23:  }

    Pretty easy! First of all, create a System.Xml.XmlNamespaceManager object which holds the different namespaces that we need to parse the parts (lines 1-5). Then select the correct relationship in the Start Part for the Core Properties part (lines 7-9). Read the core properties XML (lines 10-15) and finally read the title property (line 17).
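
    For a version of these steps that runs without an XPS file at hand, here is a rough sketch. It swaps #ziplib for the System.IO.Compression.ZipArchive class from later versions of .NET, and it builds a tiny in-memory package with made-up content to read from. The namespace and relationship type URIs are the standard OPC values as far as I know; treat them as assumptions. OpcTitleSketch, ReadTitle and BuildSamplePackage are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

public static class OpcTitleSketch
{
    // Assumed standard OPC / Dublin Core URIs.
    const string RelNs = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string CoreRelType = RelNs + "/metadata/core-properties";
    const string CpNs = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    const string DcNs = "http://purl.org/dc/elements/1.1/";

    // Follows the Start Part relationship to the Core Properties part
    // and returns the dc:title value, or null if it is missing.
    public static string ReadTitle(Stream package)
    {
        using (ZipArchive zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
            nsmgr.AddNamespace("rel", RelNs);
            nsmgr.AddNamespace("cp", CpNs);
            nsmgr.AddNamespace("dc", DcNs);

            XmlDocument rels = Load(zip, "_rels/.rels");
            XmlElement core = rels.SelectSingleNode(
                "/rel:Relationships/rel:Relationship[@Type='" + CoreRelType + "']",
                nsmgr) as XmlElement;
            if (core == null) return null;

            // Targets start with '/', ZIP entry names do not.
            XmlDocument coreXml = Load(zip, core.GetAttribute("Target").TrimStart('/'));
            XmlNode title = coreXml.SelectSingleNode("/cp:coreProperties/dc:title", nsmgr);
            return title == null ? null : title.InnerText;
        }
    }

    static XmlDocument Load(ZipArchive zip, string entryName)
    {
        XmlDocument doc = new XmlDocument();
        using (Stream s = zip.GetEntry(entryName).Open())
        {
            doc.Load(s);
        }
        return doc;
    }

    // Builds a minimal package in memory; the content is invented for this demo.
    public static MemoryStream BuildSamplePackage(string title)
    {
        MemoryStream ms = new MemoryStream();
        using (ZipArchive zip = new ZipArchive(ms, ZipArchiveMode.Create, true))
        {
            Write(zip, "_rels/.rels",
                "<Relationships xmlns=\"" + RelNs + "\">" +
                "<Relationship Id=\"rId1\" Type=\"" + CoreRelType +
                "\" Target=\"/docProps/core.xml\"/></Relationships>");
            Write(zip, "docProps/core.xml",
                "<cp:coreProperties xmlns:cp=\"" + CpNs + "\" xmlns:dc=\"" + DcNs + "\">" +
                "<dc:title>" + title + "</dc:title></cp:coreProperties>");
        }
        ms.Position = 0;
        return ms;
    }

    static void Write(ZipArchive zip, string name, string content)
    {
        using (StreamWriter w = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false)))
        {
            w.Write(content);
        }
    }
}
```

    ReadTitle(BuildSamplePackage("Wictor test 1")) should return the title that was written into the sample package. Against a real XPS file you would pass File.OpenRead(fileName) instead; the sketch does no error handling for missing entries.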

    Getting the content

    Getting the rest of the document works in the same way: read the relationship for the FixedDocument sequence and then retrieve the fixed documents.

    Now you have seen how easily you can start producing your own XPS parsers and readers without using the .NET 3.0 Framework classes [2], which we will look at next time.

    Further reading and references

    [1] XPS Specification
    [2] XPS in .NET 3.0
    [3] #ziplib
    [4] Open Packaging Convention

  • Dissecting XPS, part 5 - Document properties

    Tags: XML, XPS

    This is the fifth part of the Dissecting XPS series and will focus on the XML Paper Specification, XPS, document properties.

    Core Properties

    The properties used in an XPS document are stored in the Core Properties Part, specified in the Open Packaging Conventions, OPC [1]. The Part is located by reading the [Content_Types].xml file and finding the content type application/vnd.openxmlformats-package.core-properties+xml. A package should have at most one Core Properties part: there is no requirement to have one, but having several indicates an invalid package. Still, there should be no reason to leave out the part. There are also no requirements on which elements should be present in the part.

    The Core Properties Part contains information about the title, author, creation time and so on. I think these properties are necessary today, since they allow you to locate and find documents faster.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <cp:coreProperties
      xmlns:cp=""
      xmlns:dc=""
      xmlns:dcterms=""
      xmlns:dcmitype=""
      xmlns:xsi="">
        <dc:title>Wictor test 1</dc:title>
        <dc:subject>XPS test 1</dc:subject>
        <dc:creator>Wictor Wilen</dc:creator>
        <dcterms:created xsi:type="dcterms:W3CDTF">2007-04-01T00:00:00Z</dcterms:created>
        <dcterms:modified xsi:type="dcterms:W3CDTF">2007-04-02T00:00:00Z</dcterms:modified>
        <cp:contentStatus>Reviewed</cp:contentStatus>
        <cp:category>Test</cp:category>
    </cp:coreProperties>

    The sample above is an example of a Core Properties Part. All available elements, and the subset of Dublin Core [2] elements, can be found in the OPC specification [1].

    Thumbnails

    Thumbnails are great for all kinds of files, and XPS/OPC allows you to provide your own thumbnail instead of relying on the consumer to create one for you. Thumbnails can be used either on the whole XPS Package or on individual fixed pages. If one is used on an individual page, then one should be used on all pages, according to the specification [3]. Thumbnails are images in either JPEG or PNG format.

    Thumbnails are specified as relationships, in the .rels files, using the type, like below. The Target attribute specifies the actual thumbnail file.

    <?xml version="1.0" encoding="utf-8"?>
    <Relationships xmlns="">
      <Relationship Target="/thumbnail/thumbnail.png" Id="R0"
         Type=""/>
    </Relationships>

    By now we have covered a good deal of the XPS specification, and it's time to look at some code that consumes and produces XPS files, but that's for the next post in the series.

    Further reading and references

    [1] ECMA-376 Part 2, Office Open Xml, Open Packaging Conventions
    [2] Dublin Core Metadata Initiative - Metadata Terms
    [3] XPS Specification

  • Dissecting XPS, part 4 - the content markup

    Tags: XML, WinFX, Windows Vista, XPS

    This part in the Dissecting XPS series will pick up where we ended part 3, by looking into how the actual content is marked up.

    The content is contained in the FixedPage element and it is marked up by three different elements:

    • the Path element which specifies a geometry filled with a brush
    • the Glyphs element which represents text
    • the Canvas element which groups elements together

    The Path element

    A triangle using the Path element

    The Path element is used to specify a geometric shape and optionally fill it using a brush. This XPS markup code creates the triangle shown on the right.

    <Path Stroke="#000000" StrokeThickness="10">
      <Path.Fill>
        <LinearGradientBrush MappingMode="Absolute" StartPoint="0,0" EndPoint="0,8" SpreadMethod="Reflect">
          <LinearGradientBrush.GradientStops>
            <GradientStop Color="#333311" Offset="0.0" />
            <GradientStop Color="#3333FF" Offset="1.0" />
          </LinearGradientBrush.GradientStops>
        </LinearGradientBrush>
      </Path.Fill>
      <Path.Data>
        <PathGeometry>
          <PathFigure StartPoint="50,50" IsClosed="true">
            <PolyLineSegment Points="250,50 150,250" />
          </PathFigure>
        </PathGeometry>
      </Path.Data>
    </Path>

    The Path element syntax contains several advanced features and allows you to create advanced vector graphics including lines, curves, arcs, beziers etc. Read more in the XPS Specification [1] or at Feng Yuan's [5] blog (here is an example of radial gradient brushes). If you are familiar with the XAML elements then this will be a piece of cake [6].

    The Glyphs element

    Hello World using Glyphs

    The Glyphs element is used to draw text that has the same font and style. You specify the font, size and the location of the Unicode text to write:

    <Glyphs Fill="#000000" FontRenderingEmSize="48"
      OriginX="100"
      OriginY="100"
      UnicodeString="Hello World!"
      FontUri="../../../Resources/Fonts/arial.ttf" />

    The sample above also requires that we have a Font part (the .TTF file), a relationship to the Font part in the FixedPage and that the TTF Content Type is present in the [Content_Types].xml file.

    Like the Path element, the Glyphs element is very powerful and allows you to do really advanced typesetting in any kind of written language. You may also apply any kind of brush to fill the glyphs.

    The Canvas element

    The Canvas element groups other elements (Paths, Glyphs or other Canvas elements) together, either to organize the elements into units or to apply properties, for example opacity, to each child or descendant of the unit.

    By now you should be able to create your own XPS file using just Notepad and a ZIP program. Up next in the series is some more information on the XPS document properties.

    Further reading and references

    [1] XPS Specification
    [2] Dissecting XPS - part 1 - The basics
    [3] Dissecting XPS - part 2 - Inside the XPS document
    [4] Dissecting XPS - part 3 - the Fixed Document
    [5] The blog of Feng Yuan - master of XPS
    [6] WPF/XAML Graphics

    Side note: Windows Vista uses XPS as its internal print spooling format.

  • Dissecting XPS, part 2 - inside the XPS document

    Tags: XML, XPS

    This is the second part of the Dissecting XPS series; the last post described the XML Paper Specification in general. This post will describe the XPS file format internals. It will give you an overview of how XPS files are built from the ground up, without having to read the XPS Specification, which covers 453 pages.

    The XPS file

    The XPS file, with the .xps extension, is a ZIP file, called the physical Package, and consists of a number of XML and binary files, called Parts. There are also files describing how the files are organized and connected together, called Relationships.


    The parts and relationships are grouped into a Payload, according to the Open Packaging Convention [1], and an XPS document must contain at least one fixed payload, which represents a "static or fixed-layout representation of a paginated content" [2]. Each fixed payload starts with a FixedDocumentSequence, which describes the sequence of fixed documents.


    The FixedDocumentSequence contains references to the fixed documents in the XPS document. Each XPS Document must contain a specific relationship, called the XPS Document StartPart, which identifies the FixedDocumentSequence that is the root of the document, so consumers of the document can find the first document sequence.

    The start part relationships are stored in a .rels file in the /_rels folder, which contains a reference to the part containing the FixedDocumentSequence.

    The code below shows the start part relationship file, and line 5 shows the relationship to the fixed document sequence.

       1:  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
       2:  <Relationships xmlns="">
       3:    <Relationship Id="rId3" Type="" Target="docProps/core.xml"/>
       4:    <Relationship Id="rId2" Type="" Target="docProps/thumbnail.jpeg"/>
       5:    <Relationship Id="rId1" Type="" Target="FixedDocSeq.fdseq"/>
       6:  </Relationships>

    The value of the Type attribute on line 5 indicates that this is the StartPart [2: Table I-6]. Note that all targets have paths that are relative to the parent of the _rels folder.

    In the example above (taken from the XPS Specification) there are also two other relationships; line 3 tells us where to find the properties for the XPS document and line 4 shows the reference to the thumbnail image of the document, in this case a JPEG file. I will get back to these Parts later in the series.

    This is the content of the FixedDocSeq.fdseq file referenced on line 5 above; in this case the XPS document has two fixed documents.

       1:  <FixedDocumentSequence xmlns="">
       2:     <DocumentReference Source="Documents/1/FixedDocument.fdoc" />
       3:     <DocumentReference Source="Documents/2/FixedDocument.fdoc" />
       4:  </FixedDocumentSequence>

    Lines 2 and 3 contain document references to the fixed documents; their location is specified using the Source attribute and is relative to the file.
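
    As a rough illustration of how a consumer could resolve those references, the sketch below reads the DocumentReference elements and turns each Source value into a part name relative to the location of the .fdseq file. FixedDocSeqSketch and DocumentParts are hypothetical names, and the placeholder host in the base URI exists only so that System.Uri can do the relative path resolution.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FixedDocSeqSketch
{
    // Returns the absolute part names of all fixed documents referenced
    // from the given FixedDocumentSequence XML.
    public static List<string> DocumentParts(string fdseqXml, string fdseqPartName)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(fdseqXml);

        // Placeholder base URI, used only to resolve relative part names.
        Uri baseUri = new Uri(new Uri("http://package.invalid/"), fdseqPartName);

        List<string> parts = new List<string>();
        foreach (XmlElement reference in doc.GetElementsByTagName("DocumentReference"))
        {
            Uri resolved = new Uri(baseUri, reference.GetAttribute("Source"));
            parts.Add(resolved.AbsolutePath);
        }
        return parts;
    }
}
```

    For the file above, stored as /FixedDocSeq.fdseq in the package root, this yields /Documents/1/FixedDocument.fdoc and /Documents/2/FixedDocument.fdoc. Matching on the element name keeps the sketch independent of the namespace URI.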

    This image shows how the different parts and relationships are located in the ZIP compressed XPS document.

    Brief overview of an XPS Document

    The start part is found in the /_rels/.rels file; from there the document sequence can be found, and then the fixed documents, which contain the contents. The relationships file also points out where the thumbnail and document properties are stored.

    There is also an important file called [Content_Types].xml, located in the root, which contains the content types for the different files. This file is specified in the Open Packaging Convention standard [1].

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Types xmlns="">
      <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
      <Default Extension="fdseq" ContentType="application/"/>
      <Default Extension="fpage" ContentType="application/"/>
      <Default Extension="jpg" ContentType="image/jpeg"/>
      <Default Extension="fdoc" ContentType="application/"/>
    </Types>

    If you rename the XPS document to .ZIP you can easily look into the structure of the XPS file and find the parts and relationships.

    The image above shows the root of an XPS document.

    The FixedDocument Part

    The FixedDocument Part is the root for all pages in the document (a set of fixed pages) and we will look into this Part and others in the next part of this Dissecting XPS series.

    Further reading and references

    [1] Open Packaging Convention
    [2] XPS Specification
    [3] Dissecting XPS, part 1 - The basics

About Wictor...

Wictor Wilén is the Nordic Digital Workplace Lead working at Avanade. Wictor has achieved the Microsoft Certified Architect (MCA) - SharePoint 2010, Microsoft Certified Solutions Master (MCSM) - SharePoint  and Microsoft Certified Master (MCM) - SharePoint 2010 certifications. He has also been awarded Microsoft Most Valuable Professional (MVP) for seven consecutive years.
