Is there a backwards XML parser for .NET?

In my application, I have a known offset of interest in an XML string, and want to answer questions like "what is my parent element?" without parsing the whole document.

This article mentions a library which appears to be in Objective-C for "backwards" XML parsing. My application doesn't require full XML support, so I'm happy to put up with all the caveats about not being able to parse completely robustly. Is there anything like this for C#/.NET?

Clarification: I'm not asking about parsing solutions or performance tradeoffs in general; I'm interested in particular situations where I am at some point midway through a text stream and just need to know something about the local structure. Imagine a situation where I don't want to fetch the top of the document because accesses have very high latency.


It's not possible to do this without making some significant assumptions about the nature of your text. Most notably, you have to assume that it's well-formed XML, and that it contains neither CDATA sections nor namespaces.

If you start at any position in the middle of a stream and back up until you hit what appears to be the start of an element, you have no way of knowing that the text you're looking at actually is the start of an element. It could be CDATA. And you can't tell that it's not CDATA until you've backtracked through the entire stream looking for <![CDATA[ and haven't found it.
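To make the CDATA problem concrete, here is a minimal sketch of the check you would need before trusting any "<" you find. CDataCheck and IsInsideCData are hypothetical names, not part of any library, and the sketch assumes the CDATA markers never appear inside comments or processing instructions. Note that both searches run backwards and can reach the very start of the text, which is exactly the cost described above.

```csharp
using System;

static class CDataCheck
{
    // Hypothetical helper: true when the nearest "<![CDATA[" before 'offset'
    // has no closing "]]>" between it and 'offset'.
    // Assumption: the markers do not occur inside comments or
    // processing instructions, which this sketch does not track.
    public static bool IsInsideCData(string text, int offset)
    {
        int from = Math.Min(offset, text.Length) - 1;
        if (from < 0) return false;

        // Both searches walk backwards; in the worst case they reach index 0.
        int open = text.LastIndexOf("<![CDATA[", from, StringComparison.Ordinal);
        int close = text.LastIndexOf("]]>", from, StringComparison.Ordinal);

        // Inside CDATA only if an opener exists and is more recent than any closer.
        return open > close;
    }
}
```

With text such as `<root><![CDATA[ <notATag> more`, an offset pointing at "more" is reported as inside CDATA, so the `<notATag>` just before it must not be treated as an element.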

Namespaces present a similar problem. If you find a start tag like <Foo, you can't know for certain that Foo is in no namespace until you've backtracked all the way to the document's root element and ascertained that no ancestor element declares a default namespace (xmlns="..."). If you find <x:Foo, you have to backtrack until you find an enclosing element with an xmlns:x declaration.
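As a rough illustration, the only cheap case is the one where the start tag declares the namespace it uses. The sketch below (LocalNamespace.Resolve is a hypothetical helper, and the regular expressions are a simplification that does not decode entities or guard against "xmlns" appearing inside another attribute's value) returns the URI when the declaration is on the tag itself and null when you would have to keep backtracking.

```csharp
using System;
using System.Text.RegularExpressions;

static class LocalNamespace
{
    // Hypothetical helper: returns the namespace URI only if the start tag
    // itself declares the namespace it uses; returns null when the answer
    // depends on some ancestor that has not been read.
    public static string Resolve(string startTag)
    {
        // Capture the optional prefix of the element name, e.g. "x" in "<x:Foo".
        Match name = Regex.Match(startTag, @"^<(?:([^\s:/>]+):)?[^\s:/>]+");
        if (!name.Success) return null;

        string prefix = name.Groups[1].Value; // empty for an unprefixed element
        string attr = prefix.Length == 0 ? "xmlns" : "xmlns:" + Regex.Escape(prefix);

        // Look for xmlns="..." or xmlns:prefix="..." (single or double quotes).
        Match decl = Regex.Match(
            startTag,
            @"\s" + attr + @"\s*=\s*(?:""([^""]*)""|'([^']*)')");
        if (!decl.Success) return null; // not declared here: keep backtracking

        return decl.Groups[1].Success ? decl.Groups[1].Value : decl.Groups[2].Value;
    }
}
```

A null result is not "no namespace"; it only means this tag alone cannot answer the question.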

If you know for sure that the text is well-formed XML, that it doesn't contain CDATA, and that its use of namespaces is limited (i.e. you can tell what namespace an element is in just by looking at its start tag), then some of what you're trying to do is at least possible.

You can back up to the first start tag you encounter, create a StreamReader whose origin is that position, and use that to create an XPathDocument that's set up to handle document fragments. Note, by the way, that you have no assurance that the XPathDocument won't read all the way to the end of the text the first time you use it unless, again, you have knowledge about the nature of the text and you know that the matching end tag is going to be present.
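A minimal sketch of that idea, assuming the text is already in a string and you have already located the offset of a start tag (FragmentReader and ReadElementAt are hypothetical names). It uses XmlReader.ReadSubtree with XElement.Load rather than XPathDocument so that reading stops at the matching end tag and never touches whatever follows it, such as an unmatched end tag of an ancestor. It will still throw if the element uses a prefix whose declaration lies outside the fragment.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class FragmentReader
{
    // Hypothetical helper: parses the single element whose start tag
    // begins at 'startTagOffset', ignoring everything before it.
    public static XElement ReadElementAt(string text, int startTagOffset)
    {
        var settings = new XmlReaderSettings
        {
            // Allow parsing to begin in the middle of a document.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (var textReader = new StringReader(text.Substring(startTagOffset)))
        using (var reader = XmlReader.Create(textReader, settings))
        {
            // Position the reader on the element at the start of the fragment.
            reader.MoveToContent();

            // The subtree reader reports end-of-input at the matching end tag,
            // so the text after that tag is never parsed.
            using (var subtree = reader.ReadSubtree())
            {
                return XElement.Load(subtree);
            }
        }
    }
}
```

If the matching end tag is missing, the reader will run to the end of the text before failing, which is the caveat mentioned above.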

But this won't handle the specific case you mentioned, i.e. finding the parent element. To find the parent element you'd need to find a start tag that isn't preceded (as you move backwards) by a matching end tag. This isn't terribly difficult to do - every < character you find is going to be the beginning of either a start tag, an end tag, or an empty element, and you can just put end tags on a stack and pop them off when you find their matching start tag. When you hit a start tag and the stack is empty, you're at the start of the parent element.
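The stack approach just described can be sketched roughly as follows. BackwardScanner and FindParentStartTag are hypothetical names, and the sketch leans on the assumptions already stated: well-formed text, no CDATA, the offset falls in character content rather than inside a tag, and no ">" appears inside attribute values.

```csharp
using System;
using System.Collections.Generic;

static class BackwardScanner
{
    // Returns the index of the '<' that opens the start tag of the element
    // enclosing 'offset', or -1 if the scan reaches the beginning of the text.
    public static int FindParentStartTag(string xml, int offset)
    {
        var pendingEndTags = new Stack<string>();
        int pos = Math.Min(offset, xml.Length) - 1;

        while (pos >= 0)
        {
            int lt = xml.LastIndexOf('<', pos);
            if (lt < 0) break;
            int gt = xml.IndexOf('>', lt);
            pos = lt - 1; // continue before this tag on the next iteration
            if (gt < 0) continue; // truncated tag: skip it

            string tag = xml.Substring(lt, gt - lt + 1);

            // Skip declarations, comments and processing instructions.
            if (tag[1] == '?' || tag[1] == '!') continue;

            if (tag[1] == '/')
            {
                // End tag: remember it until its start tag turns up.
                pendingEndTags.Push(tag.Substring(2, tag.Length - 3).Trim());
                continue;
            }

            // Empty element (<foo/>): it opens and closes itself.
            if (tag[tag.Length - 2] == '/') continue;

            // Start tag with nothing pending: this is the enclosing element.
            if (pendingEndTags.Count == 0) return lt;

            string expected = pendingEndTags.Pop();
            if (expected != StartTagName(tag))
                throw new FormatException("Mismatched tags; text is not well-formed.");
        }
        return -1;
    }

    // Extracts the element name from a start tag such as "<entry id='1'>".
    private static string StartTagName(string tag)
    {
        int end = 1;
        while (end < tag.Length && !char.IsWhiteSpace(tag[end])
               && tag[end] != '>' && tag[end] != '/')
        {
            end++;
        }
        return tag.Substring(1, end - 1);
    }
}
```

Called with an offset inside the text of an element, it returns the position of that element's start tag; called with an offset between two sibling elements, it walks back over the earlier siblings and returns the position of their common parent.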

But this too is a process that might result in your backtracking all the way to the stream's origin, especially in the trivial case where the XML you're looking at is the classically moronic XML log format:

<log>
   <entry>...</entry>
   <entry>...</entry>

...repeated ad infinitum


Sounds like XPathDocument might be what you are looking for. This class provides a fast, read-only, in-memory representation of an XML document. It doesn't build up a DOM and is optimized for XPath queries.

XPathDocument can also be used to parse XML fragments. To do so you have to create it from an XmlReader that has its conformance level set to fragment.

The following sample code first selects a set of XML nodes from an XML fragment and then selects the parent of each node based on an XPath expression:

using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        string xml = File.ReadAllText(@"C:\tmp\smplInput.xml");

        XmlReaderSettings xrs = new XmlReaderSettings();
        xrs.ConformanceLevel = ConformanceLevel.Fragment;

        using (TextReader textReader = new StringReader(xml))
        {
            using (XmlReader xmlReader = XmlReader.Create(textReader, xrs))
            {
                // Create a new XPathDocument   
                XPathDocument doc = new XPathDocument(xmlReader);

                // Create navigator   
                XPathNavigator navigator = doc.CreateNavigator();

                // Set up namespace manager for XPath   
                XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
                ns.AddNamespace("w", "http://www.example.com/2010/");

                // Select nodes  
                XPathNodeIterator users = navigator.Select("//w:user", ns);

                while (users.MoveNext())
                {
                    XPathNavigator user = users.Current;
                    XPathNavigator department = user.SelectSingleNode("parent::node()", ns);
                    Console.WriteLine(string.Format("User {0} is in department {1}",
                        user.GetAttribute("name", ns.DefaultNamespace),
                        department.GetAttribute("type", ns.DefaultNamespace)));
                }
            }
        }
    }
}

To try the code you could use the following XML input document:

<?xml version="1.0" encoding="utf-8" ?>
<w:departments xmlns:w="http://www.example.com/2010/">
  <w:department type="A">
    <w:user name="w" />
    <w:user name="x" />
    <w:department type="B">
      <w:user name="x" />
      <w:user name="y" />
    </w:department>
    <w:department type="C">
      <w:user name="x" />
      <w:user name="y" />
      <w:user name="z" />
    </w:department>
  </w:department>
  <w:department type="D">
    <w:user name="w" />
  </w:department>
</w:departments>


Another approach is to parse the XML once and generate an index from it, so that next time you can load the index instead of parsing the XML again. See the article below:

http://xml.sys-con.com/node/453082
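I can't reproduce the article's scheme here, so the following is only my own rough sketch of what such an index could look like: one forward pass that records, for each element, its name, the line and column the reader reports for it, and the index of its parent. ElementEntry and XmlIndexBuilder are hypothetical names, and persisting the list to disk is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

sealed class ElementEntry
{
    public string Name;
    public int Line;        // as reported by the reader, 0 if unavailable
    public int Column;      // as reported by the reader, 0 if unavailable
    public int ParentIndex; // index into the list, -1 for the root element
}

static class XmlIndexBuilder
{
    // Parses the document once and returns one entry per element,
    // in document order.
    public static List<ElementEntry> Build(string xml)
    {
        var entries = new List<ElementEntry>();
        var openElements = new Stack<int>(); // indices of currently open elements

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            var lineInfo = reader as IXmlLineInfo;
            bool hasLines = lineInfo != null && lineInfo.HasLineInfo();

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    entries.Add(new ElementEntry
                    {
                        Name = reader.Name,
                        Line = hasLines ? lineInfo.LineNumber : 0,
                        Column = hasLines ? lineInfo.LinePosition : 0,
                        ParentIndex = openElements.Count > 0 ? openElements.Peek() : -1
                    });

                    // Empty elements never produce an EndElement node.
                    if (!reader.IsEmptyElement) openElements.Push(entries.Count - 1);
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    openElements.Pop();
                }
            }
        }
        return entries;
    }
}
```

Once the list exists, "what is my parent element?" becomes a lookup of ParentIndex, at the price of having read the whole document once.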


CAX from xponentsoftware does exactly what you want.
