
Testing XPathNavigator

timoch — Mon, 03 Jun 2013 09:11:47 +0000

In my previous post about XPathNavigator, I explained in what circumstances the default implementation of XPathNavigator is troublesome. I went over the design of the class and highlighted how that design helps us re-implement XPathNavigator to address the issue.

Testing XPathNavigator

First things first: before attacking the new implementation proper, we want to make sure it will be compatible with the default implementation. To do so, we will write tests that run against both the Microsoft implementation and, once it exists, our own. Our goal here is twofold. On the one hand, we want to ensure the existing implementation actually works as documented. On the other hand, we want to check our own implementation against the same tests, which then act as a specification.

What should we test?

XPathNavigator is a complex class, so we want to limit our tests to what actually matters for the new implementation. Otherwise, we may end up writing literally hundreds of tests.

It is obviously not necessary to test methods that will not be re-implemented. In the previous post, we identified a subset of methods that we will need to re-implement. All other methods are somehow using this basic subset to implement their functionality. The subset is the list of abstract members:

public abstract string BaseURI { get; }
public abstract bool IsEmptyElement { get; }
public abstract string LocalName { get; }
public abstract string Name { get; }
public abstract string NamespaceURI { get; }
public abstract XmlNameTable NameTable { get; }
public abstract XPathNodeType NodeType { get; }
public abstract string Prefix { get; }

public abstract bool MoveTo(XPathNavigator other);
public abstract bool MoveToFirstAttribute();
public abstract bool MoveToFirstChild();
public abstract bool MoveToFirstNamespace(XPathNamespaceScope namespaceScope);
public abstract bool MoveToId(string id);
public abstract bool MoveToNext();
public abstract bool MoveToNextAttribute();
public abstract bool MoveToNextNamespace(XPathNamespaceScope namespaceScope);
public abstract bool MoveToParent();
public abstract bool MoveToPrevious();

As you can see, we have two distinct groups:

The abstract properties expose information about the current node. Our tests will ensure that we get consistent information for all node types.
The abstract methods are all concerned with moving the navigator to another node. The tests need to check that, given a known starting position, each move operation leaves the navigator pointing to the right node.
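To make the first group concrete, here is a minimal sketch that reads a few of those properties on different node types, using the Microsoft XPathDocument. The class name NodePropertiesDemo, the Describe() helper and the sample XML are illustrative assumptions, not part of the test suite described in this post.

```csharp
using System.IO;
using System.Xml.XPath;

public static class NodePropertiesDemo {
    // Returns "NodeType|LocalName|Value" for the node selected by xpath.
    // Hypothetical helper, for illustration only.
    public static string Describe(string xml, string xpath) {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator().SelectSingleNode(xpath);
        return string.Format("{0}|{1}|{2}", nav.NodeType, nav.LocalName, nav.Value);
    }
}
```

With the document `<root id='7'><child>hi</child></root>`, Describe() should report an Attribute node named id for "/root/@id", an Element node named child for "/root/child", and a Text node with an empty LocalName for "/root/child/text()".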

How should we test it?

We will test the properties by setting up an XPathNavigator that points to a specific node of an XML document. Once it is set up, we simply check that the properties expose consistent values. We will test the Move() operations in a very similar way. We will set up the XPathNavigator instance on a specific node, execute the Move() operation we want to test and then check that the XPathNavigator yields values through its properties that are consistent with the navigator’s new position.

The only difference between the two kinds of test is the Move() operation. This similarity lets us factor out most of the test code into a few utility functions.

private void CanMoveImpl(MoveTestArgs args, Func<XPathNavigator, bool> moveOperation) {
    CheckInconclusive(args);

    // Arrange - get a navigator on the requested node
    var nav = CreateNavigatorOnSelected(args.Xml, args.InitialPosition);

    // Act - move the navigator
    var success = moveOperation(nav);

    // Assert - check that success is consistent with ShouldSucceed
    Expect(success, args.ShouldSucceed ? (Constraint)True : False, "inconsistent success state");
    // Assert - check node properties
    ExpectNodeProperties(nav, args);
}

CanMoveImpl() acts as a parametrized test. It takes two arguments:

args: a MoveTestArgs instance. This argument describes the test’s original state and the resulting state we should test against.
moveOperation: a delegate to the Move() operation to test. Passing the operation as a parameter also lets us write non-Move() tests by simply passing a no-op callback.
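The helper CreateNavigatorOnSelected() is used above but not listed in this post. A plausible shape for it, assuming it simply creates a navigator on the document root and selects the starting node with the given XPath, could look like the sketch below. The class name NavigatorSetup and the exception thrown on an empty selection are assumptions; the CreateNavigable() body follows the Microsoft fixture shown later in this post.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSetup {
    // Stand-in for the fixture's factory method (assumption: XPathDocument-backed).
    public static IXPathNavigable CreateNavigable(string xml) {
        var settings = new XmlReaderSettings { IgnoreWhitespace = false };
        XmlReader reader = XmlReader.Create(new StringReader(xml), settings);
        return new XPathDocument(reader, XmlSpace.Preserve);
    }

    // Assumed behaviour: start on the document root, then select the initial position.
    public static XPathNavigator CreateNavigatorOnSelected(string xml, string xpath) {
        XPathNavigator root = CreateNavigable(xml).CreateNavigator();
        XPathNavigator selected = root.SelectSingleNode(xpath);
        if (selected == null)
            throw new ArgumentException("XPath selects no node: " + xpath);
        return selected;
    }
}
```

Failing loudly when the XPath selects nothing is a design choice of this sketch: a null navigator would otherwise surface later as a less helpful NullReferenceException inside the test.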

NUnit: I am using NUnit to write the unit tests. It is only a matter of preference. You can adapt the tests to work with another testing framework such as the Microsoft Unit Testing Framework. I find NUnit to be simple to use, unobtrusive and very flexible.

CanMoveImpl() is called by actual test methods like the following:

[TestCaseSource("CanMoveToNext_Source")]
public void CanMoveToNext(MoveTestArgs args) {
    CanMoveImpl(args, n => n.MoveToNext());
}

It is a parametrized test. The TestCaseSource attribute tells NUnit which method to call to get the MoveTestArgs instance for each test.

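public IEnumerable<MoveTestArgs> CanMoveToNext_Source() {
    // document root: MoveToNext() fails
    yield return new MoveTestArgs() {
        Xml = @"<root/>",
        InitialPosition = "/", // selects root
        ShouldSucceed = false
    };

    // element without a next sibling: MoveToNext() fails
    yield return new MoveTestArgs() {
        Xml = @"<root><child/></root>",
        InitialPosition = "/root/child",
        ShouldSucceed = false
    };

    // element with a next sibling: MoveToNext() lands on child2
    yield return new MoveTestArgs() {
        Xml = @"<root><child/><child2/></root>",
        InitialPosition = "/root/child",
        NodeType = XPathNodeType.Element,
        LocalName = "child2",
    };

    /* ... */
}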

Method CanMoveToNext_Source() returns the test cases for a given operation. In the above example, we have the test cases for “when positioned on the document root, MoveToNext() should fail”, “when positioned on an element with no next sibling, MoveToNext() should fail” and “when positioned on an element with a next sibling, MoveToNext() should succeed and point to that sibling”.

Each test case is defined by specifying values for the fields of class MoveTestArgs.

public class MoveTestArgs {
    // Xml document
    public string Xml;
    // XPath selecting the starting position
    public string InitialPosition;

    // value to test against - no assertion for a given property when not set
    public string BaseURI;
    public bool? IsEmptyElement;
    public string LocalName;
    public string Name;
    public string NamespaceURI;
    public string NameTable;
    public XPathNodeType? NodeType;
    public string Prefix;
    public string Value;

    // Indicates whether the move should succeed or not
    public bool ShouldSucceed = true;

    // indicates whether the test is inconclusive
    public string Inconclusive;

    // TestCaseSource calls ToString() on each test case argument to create the test case name.
    public override string ToString() {
        if (ShouldSucceed)
            return string.Format("{0} -- {1} -- {2} -- {3}", Xml, InitialPosition, NodeType, LocalName);
        else
            return string.Format("{0} -- {1} -- fails", Xml, InitialPosition);
    }
}

Method ExpectNodeProperties() implements the assertions depending on the configuration of its MoveTestArgs instance:

private void ExpectNodeProperties(XPathNavigator navigator, MoveTestArgs args) {
    if (args.LocalName != null) Expect(navigator.LocalName, EqualTo(args.LocalName), "bad localname");
    if (args.Name != null) Expect(navigator.Name, EqualTo(args.Name), "bad name");
    if (args.Prefix != null) Expect(navigator.Prefix, EqualTo(args.Prefix), "bad prefix");
    if (args.NamespaceURI != null) Expect(navigator.NamespaceURI, EqualTo(args.NamespaceURI), "bad namespace uri");
    if (args.NodeType != null) Expect(navigator.NodeType, EqualTo(args.NodeType), "bad node type");
    if (args.Value != null) Expect(navigator.Value, EqualTo(args.Value), "bad value");
}

Executing our tests

We want our tests to be executed against the Microsoft implementation as well as our own implementation. The most straightforward way of achieving this is to implement our tests in an abstract test fixture. The abstract fixture has a factory method to create an instance of XPathNavigator to test against. For each implementation, we create a subclass of our fixture and override the factory method.

CreateNavigable() returns an IXPathNavigable. In turn, IXPathNavigable lets us create a navigator positioned on the document root thanks to its CreateNavigator() method.

[TestFixture]
public abstract class XPathDocumentTests : AssertionHelper {
    protected abstract IXPathNavigable CreateNavigable(string xml);
    /* tests implementations */
}

public class MsXPathDocumentTests : XPathDocumentTests {
    protected override IXPathNavigable CreateNavigable(string xml) {
        TextReader textReader = new StringReader(xml);
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = false;
        XmlReader reader = XmlReader.Create(textReader, settings);
        return new XPathDocument(reader, XmlSpace.Preserve);
    }
}

We’ll add the test fixture for our own implementation when we have the skeleton available. In the meantime, this lets us verify our expectations against the actual implementation of XPathNavigator.
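To illustrate how a second implementation plugs into the same checks, here is a NUnit-free sketch of the factory pattern. Since our own navigator does not exist yet, XmlDocument (which also implements IXPathNavigable) stands in for it. The class names and the NextSiblingName() check are assumptions made for this example only.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public abstract class NavigableFactory {
    // Each implementation under test overrides this factory method.
    public abstract IXPathNavigable CreateNavigable(string xml);

    // Shared check: from /root/child, MoveToNext() should land on child2.
    public string NextSiblingName() {
        XPathNavigator nav = CreateNavigable("<root><child/><child2/></root>")
            .CreateNavigator()
            .SelectSingleNode("/root/child");
        return nav.MoveToNext() ? nav.LocalName : null;
    }
}

public sealed class XPathDocumentFactory : NavigableFactory {
    public override IXPathNavigable CreateNavigable(string xml) {
        return new XPathDocument(new StringReader(xml), XmlSpace.Preserve);
    }
}

// Stand-in for "our own implementation", which is not written yet.
public sealed class XmlDocumentFactory : NavigableFactory {
    public override IXPathNavigable CreateNavigable(string xml) {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xml);
        return doc;
    }
}
```

Calling NextSiblingName() on both factories should give the same answer, which is exactly the property the abstract fixture is meant to enforce across implementations.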

The next post on the topic will tackle the new implementation’s design. I’ll make the implementation and test available as a source code download at the end of this series of articles.

XPathDocument and whitespaces

timoch — Fri, 24 May 2013 11:57:32 +0000

Writing code is fun. At least it is for me. But sometimes it gets irritating. You know, you’re busy on something, you write the code, you know it’s right but it doesn’t work… You keep your focus on that one piece of code you just wrote and it keeps on not working. Sometimes the reason it doesn’t work is obvious, but sometimes you keep reviewing your code and its surroundings, you debug your way through several variants of your solution and it keeps on not working…

I just had one of those moments…

And then, bang! The solution jumped out at me, and it was so obvious I almost felt ashamed.

I was writing the unit tests in preparation for my next article on creating an XPathNavigator implementation. The code basically boils down to this:

TextReader textReader = new StringReader("<root>    </root>");
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = false;
XmlReader reader = XmlReader.Create(textReader, settings);
XPathDocument doc = new XPathDocument(reader);
var nav = doc.CreateNavigator().SelectSingleNode("/root/text()");
var success = nav.MoveToParent();

I am testing MoveToParent() from a whitespace node. The query "/root/text()" is expected to give me an XPathNavigator located on the whitespace node inside the root element. And nav just keeps on being null. Since I had been busy writing pairs of XML samples and XPath queries to put my test in each situation I needed to cover, I immediately assumed my XPath query was not correct. I just kept on tweaking here and there. nav was still null…

After some time, I decided not to put more effort into it and to come back to it later, once I could take the necessary step back. I posted a question on stackoverflow.com and worked on something else.

I was busy on something completely different when it struck me. One of those “Aha” moments. XPathDocument has a constructor that takes an XmlSpace enum value. By default, if you don’t specify it, XPathDocument will simply skip all non-significant whitespace nodes.

// skips ignorable whitespaces
XPathDocument doc = new XPathDocument(reader);
// skips also
XPathDocument doc = new XPathDocument(reader, XmlSpace.Default);
// keeps non-significant whitespaces
XPathDocument doc = new XPathDocument(reader, XmlSpace.Preserve);

That’s it … annoying.
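The difference is easy to reproduce in isolation. The sketch below assumes a root element that contains only spaces, as in the test described above; the class and method names are made up for the example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class WhitespaceDemo {
    // Returns true when /root/text() selects a node under the given XmlSpace mode.
    public static bool FindsWhitespaceNode(XmlSpace space) {
        var settings = new XmlReaderSettings { IgnoreWhitespace = false };
        using (XmlReader reader = XmlReader.Create(new StringReader("<root>    </root>"), settings)) {
            var doc = new XPathDocument(reader, space);
            return doc.CreateNavigator().SelectSingleNode("/root/text()") != null;
        }
    }
}
```

FindsWhitespaceNode(XmlSpace.Default) should return false, since the whitespace node is dropped while the document is loaded, and FindsWhitespaceNode(XmlSpace.Preserve) should return true.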

So what’s wrong with XPathDocument?

timoch — Fri, 24 May 2013 07:20:40 +0000

This post is the first in a series of posts related to XPathDocument and XPathNavigator. I will highlight the qualities and drawbacks of the standard .Net implementations and go through the design and development of a new implementation that better fits my needs.

First, what is an XPathDocument?

XPathDocument is used when you need to query XML data using XPath. For example, you can get a list of article ids and ordered quantities from an order file, order.xml, whose article elements each carry an id attribute and a quantity child element, using the following code:

class Program {
    static void Main(string[] args) {
        XPathDocument doc = new XPathDocument(@"order.xml");

        XPathNavigator nav = doc.CreateNavigator();
        var articles = nav.Select("//article");
        foreach (XPathNavigator article in articles) {
            Console.WriteLine("{0}:{1}", 
                article.SelectSingleNode("@id").Value, 
                article.SelectSingleNode("quantity").Value);
        }
    }
}
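For readers who want to run the query without a file on disk, here is a self-contained variant. The inline order document is hypothetical: only its element and attribute names (order, article, id, quantity) come from the queries above, while the ids and quantities are invented for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class OrderExample {
    // Hypothetical sample data; values are made up for illustration.
    public const string OrderXml =
        "<order>" +
        "<article id=\"A1\"><quantity>2</quantity></article>" +
        "<article id=\"B7\"><quantity>5</quantity></article>" +
        "</order>";

    // Same query as above, but collecting "id:quantity" lines instead of printing them.
    public static List<string> ListArticles(string xml) {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        var lines = new List<string>();
        foreach (XPathNavigator article in nav.Select("//article")) {
            lines.Add(string.Format("{0}:{1}",
                article.SelectSingleNode("@id").Value,
                article.SelectSingleNode("quantity").Value));
        }
        return lines;
    }
}
```

Calling OrderExample.ListArticles(OrderExample.OrderXml) yields one line per article, in document order.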

XPath, once you get the hang of it, is very powerful and flexible for accessing XML data. It allows for complex queries and computation.

Where’s the catch?

This is great but there is a drawback. As per the documentation, XPathDocument provides a fast, read-only, in-memory representation of an XML document by using the XPath data model. This does not scale well with file size. If your files grow to tens of MB or larger, all of that data ends up loaded in memory. I recently built a mapping utility based on XPath. The starting requirements were to handle lots of small files. It turned out that, once in the field, clients were feeding it a small number of large files instead. Loading these large files in memory caused a lot of issues, from bad responsiveness due to excessive swapping to plain OutOfMemoryException.

The good news is that we can do something about it.

How does XPath work in .Net?

In the above code snippet, you can see that the only reference to XPathDocument is to create it. We then use it only once to create an XPathNavigator. The rest of the XPath querying involves only XPathNavigators.

XPathNavigator class

XPathNavigator is a cursor on an XML data structure. As a cursor, it provides basic operations to move the cursor, to query information about the data it points to and also to clone itself. XPathNavigator can also be used to update the underlying data if the implementation supports it.

The data of the node pointed to by a navigator can be accessed using a set of properties. The most commonly used would be LocalName, NamespaceURI, Prefix but most importantly Value and its variants. NodeType is also important. The type of a node determines the allowed move operations supported.

public abstract string BaseURI { get; }
public virtual bool HasAttributes { get; }
public virtual bool HasChildren { get; }
public virtual string InnerXml { get; set; }
public abstract bool IsEmptyElement { get; }
public abstract string LocalName { get; }
public abstract string Name { get; }
public abstract string NamespaceURI { get; }
public abstract XmlNameTable NameTable { get; }
public abstract XPathNodeType NodeType { get; }
public virtual string OuterXml { get; set; }
public abstract string Prefix { get; }
public virtual IXmlSchemaInfo SchemaInfo { get; }
public override object TypedValue { get; }
public virtual object UnderlyingObject { get; }
public override bool ValueAsBoolean { get; }
public override DateTime ValueAsDateTime { get; }
public override double ValueAsDouble { get; }
public override int ValueAsInt { get; }
public override long ValueAsLong { get; }
public override Type ValueType { get; }
public virtual string XmlLang { get; }
public override XmlSchemaType XmlType { get; }

To move a navigator around, the following methods can be used. Notice that all but one of them return a boolean. It indicates whether the move operation succeeded. True means the navigator now points to the new node; false means it has not moved and still points to the original node. The only method that does not return a boolean is MoveToRoot() because it always succeeds.

An operation may fail for various reasons. For example, MoveToNext() will fail if the current node has no next sibling (e.g. the last element of a sequence) or if the current node is an attribute. MoveToChild() will fail if there is no child of the current node that satisfies the conditions.

public abstract bool MoveTo(XPathNavigator other);
public virtual bool MoveToAttribute(string localName, string namespaceURI);
public virtual bool MoveToChild(XPathNodeType type);
public virtual bool MoveToChild(string localName, string namespaceURI);
public virtual bool MoveToFirst();
public abstract bool MoveToFirstAttribute();
public abstract bool MoveToFirstChild();
public bool MoveToFirstNamespace();
public abstract bool MoveToFirstNamespace(XPathNamespaceScope namespaceScope);
public virtual bool MoveToFollowing(XPathNodeType type);
public virtual bool MoveToFollowing(string localName, string namespaceURI);
public virtual bool MoveToFollowing(XPathNodeType type, XPathNavigator end);
public virtual bool MoveToFollowing(string localName, string namespaceURI, XPathNavigator end);
public abstract bool MoveToId(string id);
public virtual bool MoveToNamespace(string name);
public abstract bool MoveToNext();
public virtual bool MoveToNext(XPathNodeType type);
public virtual bool MoveToNext(string localName, string namespaceURI);
public abstract bool MoveToNextAttribute();
public bool MoveToNextNamespace();
public abstract bool MoveToNextNamespace(XPathNamespaceScope namespaceScope);
internal bool MoveToNonDescendant();
public abstract bool MoveToParent();
public abstract bool MoveToPrevious();
public virtual void MoveToRoot();
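The meaning of the boolean result can be observed with a short sketch: a failed move leaves the navigator where it was. The class name, the TryMoveNext() helper and the sample document used below are assumptions for illustration.

```csharp
using System.IO;
using System.Xml.XPath;

public static class MoveResultDemo {
    // Positions a navigator with xpath, calls MoveToNext() and reports
    // "<result>:<LocalName of the node the navigator now points to>".
    public static string TryMoveNext(string xml, string xpath) {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator().SelectSingleNode(xpath);
        bool moved = nav.MoveToNext();
        return moved + ":" + nav.LocalName;
    }
}
```

With `<root id='1'><a/><b/></root>`, starting from "/root/a" the move succeeds and the navigator ends on b. Starting from "/root/b" (last sibling) or from "/root/@id" (an attribute), the move fails and the navigator stays on b or id respectively.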

XPath queries?

That’s all very good but you might ask ‘what about XPath queries?’. XPath queries can be executed using the following functions:

public virtual object Evaluate(string xpath);
public virtual object Evaluate(XPathExpression expr);
public virtual object Evaluate(string xpath, IXmlNamespaceResolver resolver);
public virtual object Evaluate(XPathExpression expr, XPathNodeIterator context);
public virtual bool Matches(string xpath);
public virtual bool Matches(XPathExpression expr);
public virtual XPathNodeIterator Select(string xpath);
public virtual XPathNodeIterator Select(XPathExpression expr);
public virtual XPathNodeIterator Select(string xpath, IXmlNamespaceResolver resolver);
public virtual XPathNodeIterator SelectAncestors(XPathNodeType type, bool matchSelf);
public virtual XPathNodeIterator SelectAncestors(string name, string namespaceURI, bool matchSelf);
public virtual XPathNodeIterator SelectChildren(XPathNodeType type);
public virtual XPathNodeIterator SelectChildren(string name, string namespaceURI);
public virtual XPathNodeIterator SelectDescendants(XPathNodeType type, bool matchSelf);
public virtual XPathNodeIterator SelectDescendants(string name, string namespaceURI, bool matchSelf);
public virtual XPathNavigator SelectSingleNode(string xpath);
public virtual XPathNavigator SelectSingleNode(XPathExpression expression);
public virtual XPathNavigator SelectSingleNode(string xpath, IXmlNamespaceResolver resolver);

Evaluate() returns a value whose type depends on the XPath query. The result can be a number (returned as a double), a boolean, a string or a node set. Matches() tells you whether the current node satisfies conditions expressed as an XPath expression. The Select() functions return a node iterator over their result, that is, a set of XPathNavigators each pointing to a node in the XPath expression's result set.
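A small sketch may help tell the three families apart. It runs each kind of call against a hypothetical two-article order document; the class name, the sample XML and the chosen queries are assumptions made for the example.

```csharp
using System.IO;
using System.Xml.XPath;

public static class QueryDemo {
    // Hypothetical sample data.
    private const string Xml = "<order><article id='A1'/><article id='B7'/></order>";

    public static string[] Summarize() {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();

        // Evaluate(): the runtime type of the result depends on the expression.
        object count = nav.Evaluate("count(/order/article)");           // boxed double
        object firstId = nav.Evaluate("string(/order/article[1]/@id)"); // string

        // Matches(): tests the navigator's current node against a pattern.
        XPathNavigator second = nav.SelectSingleNode("/order/article[2]");
        bool matches = second.Matches("article[@id='B7']");

        // Select(): iterator of navigators, one per result node.
        XPathNodeIterator all = nav.Select("/order/article");

        return new[] {
            count.GetType().Name + "=" + count,
            firstId.GetType().Name + "=" + firstId,
            "matches=" + matches,
            "selected=" + all.Count
        };
    }
}
```

Because Evaluate() returns object, callers usually need to know (or test) which of the XPath result types an expression produces before casting.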

So how do we solve our problem?

The key element that will help us solve our scaling issue lies in the implementation of the XPath querying methods (Evaluate, Matches and Select). Their implementation is actually expressed in terms of Move() operations and property checks on XPathNavigators.

The following example first uses Select() to find the article nodes of the root element order, then uses a series of Move() operations to do the same.

private static void Example2Select() {
    XPathDocument doc = new XPathDocument(@"order.xml");

    XPathNavigator nav = doc.CreateNavigator();
    var articles = nav.Select("/order/article");
    foreach (XPathNavigator article in articles) {
        Console.WriteLine("node outer xml : {0}", article.OuterXml);
    }
}

private static void Example2Move() {
    XPathDocument doc = new XPathDocument(@"order.xml");

    XPathNavigator nav = doc.CreateNavigator();
    // move to element order, then to its first article child; stop if either is missing
    if (!nav.MoveToChild("order", "") || !nav.MoveToChild("article", ""))
        return;

    do {
        Console.WriteLine("node outer xml : {0}", nav.OuterXml);
    } while (nav.MoveToNext("article", ""));
}

All XPath queries can be expressed as a series of Move() and Clone() operations. This is exactly what Select() does behind the scenes. This is where the design of the XPathNavigator class shines. Select() is implemented exclusively in terms of Move() and Clone() operations. This means that any implementation of XPathNavigator that supports these operations can benefit from the XPath query language.
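As a rough illustration of that idea, the sketch below answers the equivalent of count(//name) using nothing but Clone(), MoveToFirstChild() and MoveToNext(). It is not how the framework's query engine is written; the class and method names are invented for the example.

```csharp
using System.Xml.XPath;

public static class ManualQuery {
    // Counts descendant elements with the given local name, using only
    // Clone(), MoveToFirstChild() and MoveToNext().
    public static int CountDescendants(XPathNavigator start, string localName) {
        XPathNavigator cursor = start.Clone(); // leave the caller's position untouched
        if (!cursor.MoveToFirstChild())
            return 0;

        int count = 0;
        do {
            if (cursor.NodeType == XPathNodeType.Element && cursor.LocalName == localName)
                count++;
            // Recurse from the current child; the recursion works on its own clone.
            count += CountDescendants(cursor, localName);
        } while (cursor.MoveToNext());
        return count;
    }
}
```

Called with a navigator on the document root, CountDescendants(nav, "article") should agree with nav.Select("//article").Count for any navigator implementation that provides the three primitives.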

Did you notice earlier that some of the Move() operations are virtual and others abstract? In the same manner that XPath queries can be expressed as a series of Move() operations, most Move() operations can be expressed as a series of the most basic move operations. For example, the default implementation of MoveToRoot() is simply

while (this.MoveToParent()) {}

Properties of XPathNavigator also follow the same pattern. Virtual properties have a default implementation that relies on the abstract properties.
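To give a feel for how a derived member can fall back on the primitives, here is one way HasChildren and HasAttributes could be written on top of the abstract move operations. This is a sketch of the idea only, not the framework's actual code; the class name is an assumption.

```csharp
using System.Xml.XPath;

public static class DerivedProperties {
    // A node has children if a clone of the navigator can move to a first child.
    public static bool HasChildren(XPathNavigator nav) {
        return nav.Clone().MoveToFirstChild();
    }

    // Same idea for attributes.
    public static bool HasAttributes(XPathNavigator nav) {
        return nav.Clone().MoveToFirstAttribute();
    }
}
```

Working on a clone keeps the caller's navigator in place; an in-place variant would have to move back to the parent after a successful probe.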

This design helps a lot in our case. We can get rid of the default .Net-provided XPathNavigator implementation without changing our usage. We will create a new implementation that will not load all the XML data in memory; instead, it will cache this information to disk. Of course, since disk IO will occur, our implementation will probably be slower. We will see what we can do about it in a later post.

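Below is the limited list of methods and properties that must be implemented in order to support XPath querying. Note that Clone() and IsSamePosition(), as well as the Value property inherited from XPathItem, are also abstract and need an implementation.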

public abstract string BaseURI { get; }
public abstract bool IsEmptyElement { get; }
public abstract string LocalName { get; }
public abstract string Name { get; }
public abstract string NamespaceURI { get; }
public abstract XmlNameTable NameTable { get; }
public abstract XPathNodeType NodeType { get; }
public abstract string Prefix { get; }

public abstract bool MoveTo(XPathNavigator other);
public abstract bool MoveToFirstAttribute();
public abstract bool MoveToFirstChild();
public abstract bool MoveToFirstNamespace(XPathNamespaceScope namespaceScope);
public abstract bool MoveToId(string id);
public abstract bool MoveToNext();
public abstract bool MoveToNextAttribute();
public abstract bool MoveToNextNamespace(XPathNamespaceScope namespaceScope);
public abstract bool MoveToParent();
public abstract bool MoveToPrevious();

As you can see, the minimum interface we need to support is not as big as we could have thought. We still have a lot to do though. We have to design our solution and implement it but more importantly, we need to write tests for it.

Conclusion

In the next post, we will set up a series of unit tests. These tests will be run against both the standard implementation (to ensure we understand the requirements correctly) and our new implementation (to make sure we stick to the requirements).

The design of XPathNavigator is quite clever. Basing the implementation of XPath queries on the abstract primitive Move() and Clone() operations enables implementors to keep their internal representation of the data completely decoupled. An implementation could very well provide an XML-compatible view on a data structure completely unrelated to XML. For instance, it is quite simple to expose the information of a tree of POCOs using Reflection. Another example would be to expose other data formats such as JSON to XPath-only consumers.
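As a taste of what such an implementation involves, here is a deliberately small sketch of a read-only navigator over a plain object tree. Everything in it is an assumption made for illustration: the Item class, the ItemNavigator name, and the decision to model elements only (no attributes, namespaces, IDs or separate text nodes). It is not the disk-backed implementation this series is heading towards, and it does not use Reflection.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical plain object tree; it knows nothing about XML.
public sealed class Item {
    public readonly string Name;
    public readonly string Text;
    public Item Parent;
    public readonly List<Item> Children = new List<Item>();
    public Item(string name, string text = "") { Name = name; Text = text; }
    public Item Add(Item child) { child.Parent = this; Children.Add(child); return this; }
}

public sealed class ItemNavigator : XPathNavigator {
    private readonly Item _root;            // synthetic document root
    private readonly XmlNameTable _names;
    private Item _current;

    public ItemNavigator(Item topElement) {
        _root = new Item("").Add(topElement);
        _names = new NameTable();
        _current = _root;
    }
    private ItemNavigator(ItemNavigator other) {
        _root = other._root; _names = other._names; _current = other._current;
    }

    public override XPathNavigator Clone() { return new ItemNavigator(this); }
    public override bool IsSamePosition(XPathNavigator other) {
        var o = other as ItemNavigator;
        return o != null && o._root == _root && o._current == _current;
    }
    public override bool MoveTo(XPathNavigator other) {
        var o = other as ItemNavigator;
        if (o == null || o._root != _root) return false;
        _current = o._current;
        return true;
    }

    public override XPathNodeType NodeType {
        get { return _current == _root ? XPathNodeType.Root : XPathNodeType.Element; }
    }
    public override string LocalName { get { return _current.Name; } }
    public override string Name { get { return _current.Name; } }
    public override string NamespaceURI { get { return ""; } }
    public override string Prefix { get { return ""; } }
    public override string BaseURI { get { return ""; } }
    public override XmlNameTable NameTable { get { return _names; } }
    public override bool IsEmptyElement {
        get { return _current.Children.Count == 0 && _current.Text.Length == 0; }
    }
    // String value: own text followed by the text of all descendants.
    public override string Value { get { return TextOf(_current); } }
    private static string TextOf(Item item) {
        string text = item.Text;
        foreach (Item child in item.Children) text += TextOf(child);
        return text;
    }

    public override bool MoveToFirstChild() {
        if (_current.Children.Count == 0) return false;
        _current = _current.Children[0];
        return true;
    }
    public override bool MoveToParent() {
        if (_current.Parent == null) return false;
        _current = _current.Parent;
        return true;
    }
    public override bool MoveToNext() { return MoveToSibling(+1); }
    public override bool MoveToPrevious() { return MoveToSibling(-1); }
    private bool MoveToSibling(int offset) {
        if (_current.Parent == null) return false;
        List<Item> siblings = _current.Parent.Children;
        int index = siblings.IndexOf(_current) + offset;
        if (index < 0 || index >= siblings.Count) return false;
        _current = siblings[index];
        return true;
    }

    // Not modelled in this sketch: attributes, namespaces and IDs.
    public override bool MoveToFirstAttribute() { return false; }
    public override bool MoveToNextAttribute() { return false; }
    public override bool MoveToFirstNamespace(XPathNamespaceScope scope) { return false; }
    public override bool MoveToNextNamespace(XPathNamespaceScope scope) { return false; }
    public override bool MoveToId(string id) { return false; }
}
```

Given an order Item with two article children that each hold a quantity Item, `new ItemNavigator(order).Select("/order/article")` should iterate over both articles even though no XML was ever parsed. The sibling lookup through IndexOf() is linear in the number of siblings, which is fine for a sketch but would deserve an index in a real implementation.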