
XSL - How to match consecutive comma-separated tags

I'm trying to match a series of XML tags that are comma-separated, and then apply an XSLT transformation to the whole group of nodes plus text. For example, given the following partial XML:

<p>Some text here
    <xref id="1">1</xref>,
    <xref id="2">2</xref>,
    <xref id="3">3</xref>.
</p>

I would like to end up with:

<p>Some text here <sup>1,2,3</sup>.</p>

A much messier alternative would also be acceptable at this point:

<p>Some text here <sup>1</sup><sup>,</sup><sup>2</sup><sup>,</sup><sup>3</sup>.</p>

I have the transformation to go from a single xref to a sup:

<xsl:template match="xref">
    <sup><xsl:apply-templates/></sup>
</xsl:template>

But I'm at a loss as to how to match a group of nodes separated by commas.

Thanks.


Update: Thanks to @Flynn1179, who alerted me that the solution wasn't producing exactly the wanted output, I have slightly modified it. It now produces the wanted "good" format.

This XSLT 1.0 transformation:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes"/>

     <xsl:template match="node()|@*">
      <xsl:copy>
       <xsl:apply-templates select="node()[1]|@*"/>
      </xsl:copy>
      <xsl:apply-templates select="following-sibling::node()[1]"/>
     </xsl:template>

     <xsl:template match=
     "xref[not(preceding-sibling::node()[1]
                  [self::text() and starts-with(.,',')]
               )
          ]">

      <xsl:variable name="vBreakText" select=
      "following-sibling::text()[not(starts-with(.,','))][1]"/>

      <xsl:variable name="vPrecedingTheBreak" select=
       "$vBreakText/preceding-sibling::node()"/>

      <xsl:variable name="vFollowing" select=
      ".|following-sibling::node()"/>

      <xsl:variable name="vGroup" select=
      "$vFollowing[count(.|$vPrecedingTheBreak)
                  =
                   count($vPrecedingTheBreak)
                  ]
      "/>

      <sup>
       <xsl:apply-templates select="$vGroup" mode="group"/>
      </sup>
      <xsl:apply-templates select="$vBreakText"/>
     </xsl:template>

     <xsl:template match="text()" mode="group">
       <xsl:value-of select="normalize-space()"/>
     </xsl:template>
</xsl:stylesheet>

when applied on the following XML document (based on the provided one, but made more complex and interesting):

<p>Some text here    
    <xref id="1">1</xref>,    
    <xref id="2">2</xref>,    
    <xref id="3">3</xref>.
    <ttt/>
    <xref id="4">4</xref>,
    <xref id="5">5</xref>,
    <xref id="6">6</xref>.
    <zzz/>
</p>

produces exactly the wanted, correct result:

<p>Some text here        
    <sup>1,2,3</sup>.    
    <ttt/>
    <sup>4,5,6</sup>.    
    <zzz/>
</p>

Explanation:

  1. We use a "fine-grained" identity rule, which processes the document node by node in document order and copies the matched node "as-is".

  2. We override the identity rule with a template that matches any xref element that is the first in a group of xref elements, each of which (except the last one) is followed by an immediate text-node sibling that starts with the ',' character. Here we find the first following text-node sibling that breaks the rule (its starting character isn't ',').

  3. Then we find all the nodes in the group, using the Kayessian (after @Michael Kay) formula for the intersection of two node-sets. This formula is: $ns1[count(.|$ns2) = count($ns2)]

  4. Then we process all nodes in the group in a mode named "group".

  5. Finally, we apply templates (in anonymous mode) to the breaking text node (that is the first node following the group), so that the chain of processing continues.
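Step 3 can be tried in isolation. The following XSLT 1.0 stylesheet is a minimal sketch (not part of the answer above) that assumes the question's sample input, with p as the document element. Here $ns1 is the first xref plus every sibling node after it, and $ns2 is every sibling node before the first following text node that doesn't start with ','; both variable names are only illustrative.

```xml
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- $ns1: the first xref and every sibling node after it -->
    <xsl:variable name="ns1"
         select="/p/xref[1] | /p/xref[1]/following-sibling::node()"/>

    <!-- $ns2: every sibling node before the first following text node
         that does not start with ',' (the "breaking" text node) -->
    <xsl:variable name="ns2"
         select="/p/xref[1]/following-sibling::text()[not(starts-with(., ','))][1]
                 /preceding-sibling::node()"/>

    <!-- Kayessian intersection: a node of $ns1 is also in $ns2 exactly when
         adding it to $ns2 leaves the count unchanged -->
    <xsl:for-each select="$ns1[count(. | $ns2) = count($ns2)]">
      <xsl:value-of select="normalize-space()"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the question's sample input this should print 1,2,3, because only the three xref elements and the two comma text nodes lie in both node-sets. The test works because a node-set union never contains duplicates, so count(.|$ns2) equals count($ns2) exactly when the context node is already a member of $ns2.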


Interesting question. +1.

Here's an XSLT 2.0 solution:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   exclude-result-prefixes="xs"
   version="2.0">
   <xsl:variable name="comma-regex">^\s*,\s*$</xsl:variable>

   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
   </xsl:template>

   <!-- Don't directly process xrefs that are second or later in a comma-separated series.
      Note that, because of the predicate, this template has a higher default priority
      (0.5) than the plain match="xref" template below (0). -->
   <xsl:template match="xref[preceding-sibling::node()[1]/
      self::text()[matches(., $comma-regex)]/
      preceding-sibling::*[1]/self::xref]" />

   <!-- Don't directly process comma text nodes that are in the middle of a series. -->
   <xsl:template match="text()[matches(., $comma-regex) and
      preceding-sibling::*[1]/self::xref and following-sibling::*[1]/self::xref]" />

   <!-- For xrefs that are first (or solitary) in a comma-separated series: -->
   <xsl:template match="xref">
      <sup>
         <xsl:call-template name="process-xref-series">
            <xsl:with-param name="next" select="." />
         </xsl:call-template>
      </sup>
   </xsl:template>

   <xsl:template name="process-xref-series">
      <xsl:param name="next"/>
      <xsl:if test="$next">
         <xsl:value-of select="$next"/>
         <xsl:variable name="followingXref"
            select="$next/following-sibling::node()[1]/
                     self::text()[matches(., $comma-regex)]/
                     following-sibling::*[1]/self::xref"/>
         <xsl:if test="$followingXref">
            <xsl:text>,</xsl:text>
            <xsl:call-template name="process-xref-series">
               <xsl:with-param name="next" select="$followingXref"/>
            </xsl:call-template>
         </xsl:if>         
      </xsl:if>

   </xsl:template>
</xsl:stylesheet>

(This could be simplified if we could make some assumptions about the input.)

Run against the sample input you gave, the result is:

<p>Some text here
   <sup>1,2,3</sup>.
</p>


The second (messier) alternative can be achieved with:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="p/text()[normalize-space() = ',' and preceding-sibling::node()[1][self::xref]]">
  <sup>,</sup>
</xsl:template>

<xsl:template match="xref">
  <sup>
    <xsl:apply-templates/>
  </sup>
</xsl:template>

</xsl:stylesheet>


There's an almost trivial solution to your 'messy alternative':

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="xref">
  <sup>
    <xsl:apply-templates />
  </sup>
</xsl:template>

<xsl:template match="text()[normalize-space(.)=',']">
  <sup>,</sup>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*| node()" />
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

EDIT: I just noticed it's almost a clone of Martin's solution, except without the additional check of a preceding xref element on the commas. His is probably safer :)

And a slightly less trivial solution for your preferred result, although this only works if each p tag contains at most one collection of xref tags. You didn't mention the possibility of more than one collection, and even if there are several, I would have thought it unlikely they'd be within the same containing p tag. If that can happen, it's possible to extend this to allow for it, although it will get a lot more complicated.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="xref[not(preceding-sibling::text()[normalize-space(.)=','])]">
  <sup>
    <xsl:value-of select="." />
    <xsl:for-each select="following-sibling::text() | following-sibling::xref">
      <xsl:if test="following-sibling::text()[substring(.,1,1)='.']">
        <xsl:value-of select="normalize-space(.)" />
      </xsl:if>
    </xsl:for-each>
  </sup>
</xsl:template>

<xsl:template match="xref | text()[normalize-space(.)=',']" />

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*| node()" />
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>


In case you can use XSLT 2.0 (e.g. with Saxon 9, AltovaXML or XQSharp), here is an XSLT 2.0 solution that should produce the first output you asked for:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="p">
  <xsl:for-each-group select="node()" group-adjacent="self::xref or self::text()[normalize-space() = ',']">
    <xsl:choose>
      <xsl:when test="current-grouping-key()">
        <sup>
          <xsl:value-of select="current-group()/normalize-space()" separator=""/>
        </sup>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="current-group()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>
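To see how group-adjacent partitions the children of p with that boolean key, here is a small diagnostic stylesheet. It is a sketch that is not part of the answer above and assumes the same XSLT 2.0 processor; it prints the grouping key and the size of each group instead of building output.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="p">
    <!-- Same boolean key as above: true for xref elements and comma-only
         text nodes, false for everything else -->
    <xsl:for-each-group select="node()"
         group-adjacent="self::xref or self::text()[normalize-space() = ',']">
      <!-- One line per group: the key, then the number of nodes in the group -->
      <xsl:value-of select="current-grouping-key(), count(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Run against the question's sample input, this should print false 1, true 5 and false 1 on separate lines: the leading "Some text here" text node, the five-node run xref, comma, xref, comma, xref, and the trailing "." text node. Only the true group is wrapped in sup by the solution above; the false groups are passed on to the identity template.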