Improving the performance of XSL
I am using the XSLT 2.0 code below to find the ids of the text nodes that contain the character indices I give as input. The code works correctly, but for huge files it takes a long time. Even for huge files, if the index values are small, the result comes back within a few ms. I am using the Saxon 9 HE (saxon9he) Java processor to execute the XSL.
<xsl:variable name="insert-data" as="element(data)*">
<xsl:for-each-group
select="doc($insert-file)/insert-data/data"
group-by="xsd:integer(@index)">
<xsl:sort select="current-grouping-key()"/>
<data
index="{current-grouping-key()}"
text-id="{generate-id(
$main-root/descendant::text()[
sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()
][1]
)}">
<xsl:copy-of select="current-group()/node()"/>
</data>
</xsl:for-each-group>
</xsl:variable>
In the above solution, if the index value is large, say 270962, the XSL takes 83427 ms to execute. In huge files, if the index values are very large, say 4605415 or 4605431, it takes several minutes. The computation of the variable "insert-data" seems to be what takes the time, even though it is a global variable and is computed only once. Should the XSL be addressed, or the processor? How can I improve the performance of the XSL?
I'd guess the problem is the generation of text-id, i.e. the expression
generate-id( $main-root/descendant::text()[ sum((preceding::text(), .)/string-length(.)) ge current-grouping-key() ][1] )
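You are potentially recalculating a lot of sums here: for every candidate text node, the predicate re-adds the lengths of all preceding text nodes, so the work grows roughly quadratically with the position of the index in the document. That would also explain why small index values return quickly even in huge files. I think the easiest path would be to invert your approach: recurse across the text nodes in the document, accumulate the string length so far, and output a data element each time a new @index is reached. The following example illustrates the approach. Note that each unique @index and each text node is visited only once.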
<xsl:variable name="insert-doc" select="doc($insert-file)"/>
<xsl:variable name="insert-data" as="element(data)*">
<xsl:call-template name="calculate-data"/>
</xsl:variable>
<xsl:key name="index" match="data" use="xsd:integer(@index)"/>
<xsl:template name="calculate-data">
<xsl:param name="text-nodes" select="$main-root//text()"/>
<xsl:param name="previous-lengths" select="0"/>
<xsl:param name="indexes" as="xsd:integer*">
<xsl:perform-sort
select="distinct-values(
$insert-doc/insert-data/data/@index/xsd:integer(.))">
<xsl:sort/>
</xsl:perform-sort>
</xsl:param>
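<!-- Stop as soon as either the text nodes or the indexes are exhausted. -->
<xsl:if test="exists($text-nodes) and exists($indexes)">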
<xsl:variable name="total-lengths"
select="$previous-lengths + string-length($text-nodes[1])"/>
<xsl:choose>
<xsl:when test="$total-lengths ge number($indexes[1])">
<data
index="{$indexes[1]}"
text-id="{generate-id($text-nodes[1])}">
<xsl:copy-of select="key('index', $indexes[1],
$insert-doc)"/>
</data>
<!-- Recurse to the next index; keep the same text node, since it may contain further indexes. -->
<xsl:call-template name="calculate-data">
<xsl:with-param
name="text-nodes"
select="$text-nodes"/>
<xsl:with-param
name="previous-lengths"
select="$previous-lengths"/>
<xsl:with-param
name="indexes"
select="subsequence($indexes, 2)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!-- Recurse to the next text node, carrying the running total forward. -->
<xsl:call-template name="calculate-data">
<xsl:with-param
name="text-nodes"
select="subsequence($text-nodes, 2)"/>
<xsl:with-param
name="previous-lengths"
select="$total-lengths"/>
<xsl:with-param
name="indexes"
select="$indexes"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>
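For completeness, here is a rough sketch of how the resulting $insert-data might be consumed. The question does not show that part, so this is an assumption: it supposes the stylesheet otherwise copies the main document, and that the inserted content should simply follow the matched text node. The lookup uses a plain predicate on @text-id rather than key(), because $insert-data holds parentless elements, and as far as I know key() requires a tree rooted at a document node.

```xml
<!-- Sketch only (not from the question): emits inserted content after the
     text node whose generate-id() was recorded in @text-id. -->
<xsl:template match="text()">
  <!-- Capture the id first; inside the predicate "." is a data element. -->
  <xsl:variable name="id" select="generate-id(.)"/>
  <xsl:copy-of select="."/>
  <!-- Copies the children of every matching data element, in index order. -->
  <xsl:copy-of select="$insert-data[@text-id = $id]/node()"/>
</xsl:template>
```

The predicate scans $insert-data once per text node, which should be fine for a modest number of indexes. With many indexes, wrapping the data elements in a document node and declaring a key on @text-id would likely be faster.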