XML transform element appearing in wrong place in document
I am having some problems with an XML transform and need some help.
The stylesheet should iterate through all suffix elements and place the contents of each, without the suffix tag, directly after the last text node within its first ancestor quote-block element (see desired output). It works when only a single suffix is present, but not when two are present: in that case it places both suffixes next to each other after the last text node of the first (outer) quote-block.
Any ideas? I have tried limiting the selections to ancestor::quote-block[1] in various places but that doesn't have the desired effect.
Source XML
<paragraph>
<para>
<quote-block>
<list prefix-rules="specified">
<item prefix="“B42">
<para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
reached an agreement to negotiate towards a direct contract for coal haulage
by rail (on a DIY basis), which would replace the previous indirect E2E
arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
<quote-para>‘We did the deal with Edison Mission yesterday morning for
LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
pending a contract.</quote-para>
<quote-para><emphasis strength="strong">Enron are now off our hands so
far as Edison are concerned. The Enron flows we have left are to
British Energy’s station at Eggborough; from Immingham, Redcar
and Hull</emphasis>. Also to Enron’s own power station at Wilton
– 250,000 tonnes/year. I think we are stuck Enron [sic] on the
Eggborough traffic until next April when British Energy will,
hopefully take over their own coal procurement. <emphasis
strength="strong">But we have got them out of Fiddlers Ferry and
Ferrybridge – a big step forward</emphasis>.’</quote-para>
<suffix>(Emphasis added.)</suffix>
</quote-block>
</para>
</item>
<item prefix="B43">
<para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
EWS’s intent and, indeed, its success in stopping ECSL from carrying out
indirect supplies to EME, one of the new generating companies.”</para>
</item>
</list>
<suffix>(emphasis in original)</suffix>
</quote-block>
</para>
</paragraph>
Stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://xml.sm.com/schema/cases/report"
xmlns:sm="http://xml.sm.com/functions" xmlns:saxon="http://saxon.sf.net/"
xpath-default-namespace="http://sm.com/schema/cases/report"
exclude-result-prefixes="xs sm" version="2.0">
<xsl:output method="xml" indent="no"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<!-- Match quote-blocks with open or close attributes. -->
<xsl:template match="*[*:quote-block and descendant::*:suffix]">
<xsl:call-template name="process-quote-block"/>
</xsl:template>
<!-- Match inline quote with open or close attributes -->
<xsl:template match="*[*:quote and descendant::*:suffix]">
<xsl:call-template name="process-quote-block"/>
</xsl:template>
<!-- Process the quote block -->
<xsl:template name="process-quote-block">
<xsl:variable name="quoteBlockCopy">
<xsl:copy-of select="."/>
</xsl:variable>
<xsl:apply-templates select="$quoteBlockCopy" mode="append-suffix">
<xsl:with-param name="suffix" select="sm:get-suffix-note(.)"/>
<xsl:with-param name="end-node" select="sm:get-last-text-node($quoteBlockCopy)"/>
</xsl:apply-templates>
</xsl:template>
<!-- Match quote-blocks with open or close attributes. -->
<xsl:template match="*[*:quote-block and descendant::*:suffix][ancestor::*:quote-block[1]]" mode="create-copy">
<xsl:call-template name="process-quote-block"/>
</xsl:template>
<!-- Match inline quote with open or close attributes -->
<xsl:template match="*[*:quote and descendant::*:suffix]" mode="create-copy">
<xsl:call-template name="process-quote-block"/>
</xsl:template>
<!-- This will match all elements. Just copy and pass through the parameters. -->
<xsl:template match="*" mode="append-suffix">
<xsl:param name="suffix"/>
<xsl:param name="end-node"/>
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates mode="append-suffix">
<xsl:with-param name="suffix" select="$suffix"/>
<xsl:with-param name="end-node" select="$end-node"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<!-- Apply the text node to the content. If the node is equal to the last node then append the descendants of suffix -->
<xsl:template match="text()[normalize-space() != '']" mode="append-suffix">
<xsl:param name="suffix"/>
<xsl:param name="end-node"/>
<xsl:choose>
<xsl:when test="count(. | $end-node) = 1">
<xsl:value-of select="."/>
<xsl:apply-templates select="$suffix"/>
</xsl:when>
<xsl:otherwise>
<!-- Or maybe neither. -->
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- Don't copy suffix elements; their content is appended after the last text node instead. -->
<xsl:template match="*:suffix" mode="append-suffix"/>
<xsl:function name="sm:get-suffix-note">
<xsl:param name="node"/>
<xsl:sequence select="$node/descendant::*:suffix/node()"/>
</xsl:function>
<xsl:function name="sm:get-last-text-node">
<!-- Finds last non-empty text() node, ignoring <suffix> elements that are a child of this specific quote-block. -->
<xsl:param name="node"/>
<xsl:sequence
select="reverse($node//text()[not(ancestor::*:suffix) and normalize-space() != ''])[1]"/>
</xsl:function>
</xsl:stylesheet>
Current Output XML
<paragraph>
<para>
<quote-block>
<list prefix-rules="specified">
<item prefix="“B42">
<para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
reached an agreement to negotiate towards a direct contract for coal haulage
by rail (on a DIY basis), which would replace the previous indirect E2E
arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
<quote-para>‘We did the deal with Edison Mission yesterday morning for
LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
pending a contract.</quote-para>
<quote-para><emphasis strength="strong">Enron are now off our hands so
far as Edison are concerned. The Enron flows we have left are to
British Energy’s station at Eggborough; from Immingham, Redcar
and Hull</emphasis>. Also to Enron’s own power station at Wilton
– 250,000 tonnes/year. I think we are stuck Enron [sic] on the
Eggborough traffic until next April when British Energy will,
hopefully take over their own coal procurement. <emphasis
strength="strong">But we have got them out of Fiddlers Ferry and
Ferrybridge – a big step forward</emphasis>.’</quote-para>
</quote-block>
</para>
</item>
<item prefix="B43">
<para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
EWS’s intent and, indeed, its success in stopping ECSL from carrying out
indirect supplies to EME, one of the new generating companies.”(Emphasis
added.)(emphasis in original)</para>
</item>
</list>
</quote-block>
</para>
</paragraph>
Desired Output
<paragraph>
<para>
<quote-block>
<list prefix-rules="specified">
<item prefix="“B42">
<para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
reached an agreement to negotiate towards a direct contract for coal haulage
by rail (on a DIY basis), which would replace the previous indirect E2E
arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
<quote-para>‘We did the deal with Edison Mission yesterday morning for
LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
pending a contract.</quote-para>
<quote-para><emphasis strength="strong">Enron are now off our hands so
far as Edison are concerned. The Enron flows we have left are to
British Energy’s station at Eggborough; from Immingham, Redcar
and Hull</emphasis>. Also to Enron’s own power station at Wilton
– 250,000 tonnes/year. I think we are stuck Enron [sic] on the
Eggborough traffic until next April when British Energy will,
hopefully take over their own coal procurement. <emphasis
strength="strong">But we have got them out of Fiddlers Ferry and
Ferrybridge – a big step forward</emphasis>.’(Emphasis
added.)</quote-para>
</quote-block>
</para>
</item>
<item prefix="B43">
<para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
EWS’s intent and, indeed, its success in stopping ECSL from carrying out
indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para>
</item>
</list>
</quote-block>
</para>
</paragraph>
Man, you've dug yourself into quite a hole here. ;-) Here is what I have come up with:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="xml" encoding="utf-8" indent="no"/>
<!-- key to identify all non-empty, non-suffix text node descendants of
a quote-block. We'll use that to pull out the "last one" later-on -->
<xsl:key
name ="kQbText"
match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]"
use ="generate-id(ancestor::quote-block[1])"
/>
<!-- identity template to copy everything that is not otherwise handled -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<!-- special handling for text nodes that are descendants of quote-blocks -->
<xsl:template match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]">
<xsl:variable name="qb" select="ancestor::quote-block[1]" />
<!-- the text node gets copied regardless -->
<xsl:copy-of select="." />
<!-- if it is the last non-empty text node, append all suffixes -->
<xsl:if test="
generate-id()
=
generate-id( key('kQbText', generate-id($qb))[last()] )
">
<xsl:for-each select="$qb/suffix">
<xsl:value-of select="concat(' ', .)" />
</xsl:for-each>
</xsl:if>
</xsl:template>
<!-- empty text nodes will be removed (all others are copied) -->
<xsl:template match="text()[normalize-space() = '']" />
<!-- suffix nodes will be deleted-->
<xsl:template match="suffix" />
</xsl:stylesheet>
The above results in (indentation and line-breaks added with tidy to make it readable):
<paragraph>
<para>
<quote-block>
<list prefix-rules="specified">
<item prefix="“B42">
<para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June
2000, EME and EWS reached an agreement to negotiate
towards a direct contract for coal haulage by rail (on a
DIY basis), which would replace the previous indirect E2E
arrangements that EME had in place with ECSL. An internal
EWS e-mail noted:
<quote-block>
<quote-para>‘We did the deal with Edison Mission
yesterday morning for LBT-Fiddlers @ £[…]/tonne as
agreed. This rate until 16th September pending a
contract.</quote-para>
<quote-para>
<emphasis strength="strong">Enron are now off our hands
so far as Edison are concerned. The Enron flows we have
left are to British Energy’s station at Eggborough;
from Immingham, Redcar and Hull</emphasis>. Also to
Enron’s own power station at Wilton – 250,000
tonnes/year. I think we are stuck Enron [sic] on the
Eggborough traffic until next April when British Energy
will, hopefully take over their own coal procurement.
<emphasis strength="strong">But we have got them out of
Fiddlers Ferry and Ferrybridge – a big step
forward</emphasis>.’ (Emphasis added.)</quote-para>
</quote-block></para>
</item>
<item prefix="B43">
<para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This
e-mail is evidence of both EWS’s intent and, indeed, its
success in stopping ECSL from carrying out indirect
supplies to EME, one of the new generating companies.”
(emphasis in original)</para>
</item>
</list>
</quote-block>
</para>
</paragraph>
The XSLT code here is XSLT 1.0, but you can run it unaltered in a 2.0 processor.
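For context on why the original stylesheet put both suffixes together: `sm:get-suffix-note` selects every `descendant::*:suffix` of the matched element, and `sm:get-last-text-node` returns a single last text node for the whole copied subtree, so the nested quote-block never gets its own pass. Since the question uses an XSLT 2.0 processor, the same per-quote-block idea can also be written without a key, using the `is` node comparison. This is only a sketch: it assumes the elements are in no namespace (as in the sample source; the original stylesheet sets `xpath-default-namespace`, so adjust if your real documents are namespaced) and that each `suffix` is a direct child of the quote-block it belongs to.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- non-empty text nodes inside a quote-block, excluding suffix content -->
  <xsl:template
    match="quote-block//text()[normalize-space()][not(ancestor::suffix)]">
    <!-- nearest enclosing quote-block of this text node -->
    <xsl:variable name="qb" select="ancestor::quote-block[1]"/>
    <!-- last non-empty, non-suffix text node whose nearest quote-block is $qb -->
    <xsl:variable name="last" select="
      ($qb//text()[normalize-space()]
                  [not(ancestor::suffix)]
                  [ancestor::quote-block[1] is $qb])[last()]"/>
    <xsl:copy-of select="."/>
    <xsl:if test=". is $last">
      <!-- append only the suffixes that are children of this quote-block,
           with no separator between them -->
      <xsl:value-of select="$qb/suffix" separator=""/>
    </xsl:if>
  </xsl:template>

  <!-- suffix elements themselves are dropped -->
  <xsl:template match="suffix"/>
</xsl:stylesheet>
```

Unlike the 1.0 version above, this variant keeps whitespace-only text nodes and adds no space before the suffix text.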
Here is a simple transform that addresses just the problem. As others have noticed, the problem is specified in a very messy way and does not allow a single, unambiguous interpretation.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:key name="kLastNonSufText"
match="*[not(self::suffix)]/text()"
use="generate-id(ancestor::quote-block[1])"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[ancestor::quote-block]">
<xsl:copy-of select="."/>
<xsl:variable name="vQBImmed" select="ancestor::quote-block[1]"/>
<xsl:variable name="vLastText" select=
"key('kLastNonSufText', generate-id($vQBImmed))
[last()]"/>
<xsl:if test="count(.|$vLastText) = 1">
<xsl:copy-of select="($vQBImmed//suffix)[last()]/text()"/>
</xsl:if>
</xsl:template>
<xsl:template match="suffix"/>
</xsl:stylesheet>
When this transformation is applied on the (very unreadable and poorly formatted) provided source XML document:
<paragraph>
<para>
<quote-block>
<list prefix-rules="specified">
<item prefix="“B42">
<para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
reached an agreement to negotiate towards a direct contract for coal haulage
by rail (on a DIY basis), which would replace the previous indirect E2E
arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
<quote-block>
<quote-para>‘We did the deal with Edison Mission yesterday morning for
LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
pending a contract.</quote-para>
<quote-para>
<emphasis strength="strong">Enron are now off our hands so
far as Edison are concerned. The Enron flows we have left are to
British Energy’s station at Eggborough; from Immingham, Redcar
and Hull</emphasis>. Also to Enron’s own power station at Wilton
– 250,000 tonnes/year. I think we are stuck Enron [sic] on the
Eggborough traffic until next April when British Energy will,
hopefully take over their own coal procurement.
<emphasis
strength="strong">But we have got them out of Fiddlers Ferry and
Ferrybridge – a big step forward</emphasis>.’
</quote-para>
<suffix>(Emphasis added.)</suffix>
</quote-block>
</para>
</item>
<item prefix="B43">
<para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
EWS’s intent and, indeed, its success in stopping ECSL from carrying out
indirect supplies to EME, one of the new generating companies.”</para>
</item>
</list>
<suffix>(emphasis in original)</suffix>
</quote-block>
</para>
</paragraph>
the output has the desired suffixes appended to the desired text nodes:
<?xml version="1.0" encoding="UTF-16"?><paragraph><para><quote-block><list prefix-rules="specified"><item prefix="“B42"><para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
reached an agreement to negotiate towards a direct contract for coal haulage
by rail (on a DIY basis), which would replace the previous indirect E2E
arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
<quote-block><quote-para>‘We did the deal with Edison Mission yesterday morning for
LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
pending a contract.</quote-para><quote-para><emphasis strength="strong">Enron are now off our hands so
far as Edison are concerned. The Enron flows we have left are to
British Energy’s station at Eggborough; from Immingham, Redcar
and Hull</emphasis>. Also to Enron’s own power station at Wilton
– 250,000 tonnes/year. I think we are stuck Enron [sic] on the
Eggborough traffic until next April when British Energy will,
hopefully take over their own coal procurement.
<emphasis strength="strong">But we have got them out of Fiddlers Ferry and
Ferrybridge – a big step forward</emphasis>.’
(Emphasis added.)</quote-para></quote-block></para></item><item prefix="B43"><para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
EWS’s intent and, indeed, its success in stopping ECSL from carrying out
indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para></item></list></quote-block></para></paragraph>
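One thing to watch with `($vQBImmed//suffix)[last()]`: it takes the last suffix anywhere below the quote-block, so a quote-block that has no suffix of its own but contains a nested one would pick up the nested block's suffix as well. If suffix is always a direct child of the quote-block it belongs to (an assumption based on the sample document), a slightly stricter variant of the same stylesheet might look like this:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>

  <!-- non-suffix text nodes, indexed by their nearest quote-block ancestor -->
  <xsl:key name="kLastNonSufText"
    match="*[not(self::suffix)]/text()"
    use="generate-id(ancestor::quote-block[1])"/>

  <!-- identity template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()[ancestor::quote-block]">
    <xsl:copy-of select="."/>
    <xsl:variable name="vQBImmed" select="ancestor::quote-block[1]"/>
    <xsl:variable name="vLastText"
      select="key('kLastNonSufText', generate-id($vQBImmed))[last()]"/>
    <xsl:if test="count(.|$vLastText) = 1">
      <!-- only suffix children of this quote-block, not nested ones -->
      <xsl:copy-of select="$vQBImmed/suffix/text()"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="suffix"/>
</xsl:stylesheet>
```

For the sample document the two versions should behave the same, because each quote-block there has exactly one suffix child.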