What is the relative processing speed of manipulating data with XML or OOP techniques? (i.e. XProc or XSL vs C# or Java)

2023-01-04 17:44 问答作者：

What is the difference in processing speed for executing a process using XML manipulation or using object-oriented representation? In general, is it faster to maximize or minimize the reliance on XML for a process. Let it be assumed that the code is highly optimized in either circumstance.

A simple example of what I am asking is which of the following would execute faster, when called from a C# web application, if the Thing in question were to represent the same qualified entity.

  //                            XSL CODE FRAGMENT
  <NewThings>
     <xsl:for-each select="Things/Thing">
        <xsl:copy-of select="." />
     </xsl:for-each>
  </NewThings>

  //                                C# Code Fragment
  void iterate(List<Thing> things){
      List<开发者_JAVA百科;Thing> newThings = new List<Thing>();
      things.ForEach(t=>newThings.Add(t));
  }

A complex example of might be whether it is faster to manipulate a system of objects and functions in C# or a system of xml documents in an XProc pipeline.

Thanks a lot.

Generally speaking, if you're only going to be using the source document's tree once, you're not going to gain much of anything by deserializing it into some specialized object model. The cost of admission - parsing the XML - is likely to dwarf the cost of using it, and any increase in performance that you get from representing the parsed XML in something more efficient than an XML node tree is going to be marginal.

If you're using the data in the source document over and over again, though, it can make a lot of sense to parse that data into some more efficiently-accessible structure. This is why XSLT has the xsl:key element and key() function: looking an XML node up in a hash table can be so much faster than performing a linear search on a list of XML nodes that it was worth putting the capability into the language.

To address your specific example, iterating over a List<Thing> is going to perform at the same speed as iterating over a List<XmlNode>. What will make the XSLT slower is not the iteration. It's the searching, and what you do with the found nodes. Executing the XPath query Things/Thing iterates through the child elements of the current node, does a string comparison to check each element's name, and if the element matches, it iterates through that element's child nodes and does another string comparison for each. (Actually, I don't know for a fact that it's doing a string comparison. For all I know, the XSLT processor has hashed the names in the source document and the XPath and is doing integer comparisons of hash values.) That's the expensive part of the operation, not the actual iteration over the resulting node set.

Additionally, most anything that you do with the resulting nodes in XSLT is going to involve linear searches through a node set. Accessing an object's property in C# doesn't. Accessing MyThing.MyProperty is going to be faster than getting at it via <xsl:value-of select='MyProperty'/>.

Generally, that doesn't matter, because parsing XML is expensive whether you deserialize it into a custom object model or an XmlDocument. But there's another case in which it may be relevant: if the source document is very large, and you only need a small part of it.

When you use XSLT, you essentially traverse the XML twice. First you create the source document tree in memory, and then the transform processes this tree. If you have to execute some kind of nasty XPath like //*[@some-attribute='some-value'] to find 200 elements in a million-element document, you're basically visiting each of those million nodes twice.

That's a scenario where it can be worth using an XmlReader instead of XSLT (or before you use XSLT). You can implement a method that traverses the stream of XML and tests each element to see if it's of interest, creating a source tree that contains only your interesting nodes. Or, if you want to get really crazy, you can implement a subclass of XmlReader that skips over uninteresting nodes, and pass that as the input to XslCompiledTemplate.Transform(). (I suspect, though, that if you knew enough about how XmlReader works to subclass it you probably wouldn't have needed to ask this question in the first place.) This approach allows you to visit 1,000,200 nodes instead of 2,000,000. It's also a king-hell pain in the ass, but sometimes art demands sacrifice from the artist.

All other things being equal, it's generally fastest to:

read the XML only once (disk I/O is slow)
build a document tree of nodes entirely in memory,
perform the transformations,
and generate the result.

That is, if you can represent the transformations as code operations on the in-node tree rather than having to read them from an XSLT description, that will definitely be faster. Either way, you'll have to generate some code that does the transformations you want, but with XSLT you have the extra step of "read in the transformations from this document and then transform the instructions into code", which tends to be a slow operation.

Your mileage may vary. You'll need to be more specific about the individual circumstances before a more precise answer can be given.

继续阅读：oop performance xml xslt

What is the relative processing speed of manipulating data with XML or OOP techniques? (i.e. XProc or XSL vs C# or Java)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？