Is XslCompiledTransform to blame for slow XML transformation for a large file?
I am very new to XSLT, and the first thing I need to do is parse a 300 MB file (and that's on the small end). The XSLT is not complex at the moment; it just removes nodes that match certain criteria. I have two problems:
- It's too slow. It takes 50 seconds to process 500,000 records, and that's not fast enough.
- It consumes 500 MB of memory, so this will only get worse as the files get bigger.
Is there anything I can do natively in .NET to make it perform better?
I know I could look into SAX-based parsing, or STX (which is mentioned in another post), but I would prefer to stay within .NET.
Thank you!
EDIT: Here's my XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:test="http://schemas....">
  <xsl:output omit-xml-declaration="yes"/>
  <!-- Identity transform: copy every node and attribute through unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: drop any QueryRow whose hit_count column has a Value above 200. -->
  <xsl:template match="test:QueryRow[test:Columns/test:QueryColumn[test:Name='hit_count' and test:Value>200]]"/>
</xsl:stylesheet>
Here's the code I use to do the transform:
XslCompiledTransform compiledTransform = new XslCompiledTransform();
XsltSettings settings = new XsltSettings();
settings.EnableScript = true;
compiledTransform.Load("format.xslt", settings, null);

// Dispose the reader and writer so the file handles are released.
using (XmlReader xmlReader = XmlReader.Create("in.xml"))
using (XmlWriter xmlWriter = XmlWriter.Create("out.xml"))
{
    compiledTransform.Transform(xmlReader, xmlWriter); // this is what takes a long time
}
At the moment I am trying to just read the file in and write it back out, but it seems to read the whole file into memory, so I am trying to find a way to process it as a stream instead.
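For reference, a pure streaming copy would look something like this (a minimal sketch reusing the in.xml/out.xml names from the code above; XmlWriter.WriteNode pulls nodes from the reader one at a time, so the document is never held in memory all at once):

using System.Xml;

// Minimal streaming-copy sketch: copies input to output node by node
// without ever building an in-memory tree.
using (XmlReader reader = XmlReader.Create("in.xml"))
using (XmlWriter writer = XmlWriter.Create("out.xml"))
{
    writer.WriteNode(reader, true);
}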
Try profiling your XSLT. oXygen has a nice profiling capability that can tell you where the hot spots are in your transforms.
You could have some inefficient XPath expressions (e.g. //*), or have logic buried inside your templates (e.g. lots of for-each, if, choose, etc.) that is preventing the XSLT engine from optimizing. Moving some of that logic up into the template match criteria can help the engine optimize and reduce the size of the node sets that you iterate over and evaluate.
The XPath expression you're filtering on doesn't have anything obviously wrong with it, as such. But it's easy to envision it being a problem. If your QueryRow elements all have 20 Column children, each of which has 20 QueryColumn children, the XSLT processor is going to have to examine 400 elements before deciding that a given QueryRow element doesn't match. That's conceivably pretty inefficient, because if it turns out that the element shouldn't be filtered, the XSLT processor then has to visit all 400 elements again to output them all.
The .NET way to implement SAX-like XML parsing is to subclass XmlReader, which you could conceivably do in this case: you basically build an XmlReader that buffers QueryRow elements as it reads their descendants until it determines that they're OK, and then returns them to the caller of the Read method. That's going to be considerably faster than using XSLT to filter the XML, since using an XmlReader doesn't require you to build an in-memory representation of the unfiltered XML document before you can filter it.
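To make that concrete, here is a rough sketch of the buffering idea. It doesn't literally subclass XmlReader; instead it streams the document with a plain XmlReader, buffers each QueryRow subtree via XNode.ReadFrom, applies the same hit_count > 200 test as the stylesheet, and shallow-copies everything else through. The element names and file names come from the question (the test namespace URI is truncated there, so the constant below is a placeholder), and the whole thing should be treated as an untested sketch rather than a drop-in implementation:

using System.Xml;
using System.Xml.Linq;

class QueryRowFilter
{
    // The namespace URI is truncated in the question; substitute the real one.
    const string Ns = "http://schemas....";

    static void Main()
    {
        using (XmlReader reader = XmlReader.Create("in.xml"))
        using (XmlWriter writer = XmlWriter.Create("out.xml"))
        {
            reader.MoveToContent(); // position on the root element
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.LocalName == "QueryRow" &&
                    reader.NamespaceURI == Ns)
                {
                    // Buffer only this QueryRow subtree; ReadFrom advances
                    // the reader past its end tag, so no extra Read() here.
                    XElement row = (XElement)XNode.ReadFrom(reader);
                    if (!ShouldDrop(row))
                        row.WriteTo(writer);
                }
                else
                {
                    WriteShallowNode(reader, writer);
                    reader.Read();
                }
            }
        }
    }

    // Same test as the stylesheet's predicate: a QueryColumn named
    // hit_count whose Value is greater than 200.
    static bool ShouldDrop(XElement row)
    {
        XNamespace ns = Ns;
        foreach (XElement col in row.Descendants(ns + "QueryColumn"))
        {
            if ((string)col.Element(ns + "Name") == "hit_count" &&
                double.TryParse((string)col.Element(ns + "Value"), out double v) &&
                v > 200)
                return true;
        }
        return false;
    }

    // Copies the current node (without its children) from reader to writer.
    static void WriteShallowNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
        }
    }
}

The key property is that only one QueryRow is buffered at a time, so memory use stays flat no matter how large the file gets, and there is no XSLT engine overhead at all.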
You could try checking out Saxon, which I hear is a very good and efficient XSLT processor. But XSLT in general cannot be processed in a streaming manner, even though your particular transform sounds like it could be, so unless the processor is very good at optimizing (as I understand it, Saxon is one of the best, if not the best), your memory-consumption problem may not be solvable.