Counting distinct items and parsing comma-delimited values using XSLT

Suppose I have XML like this:

<child_metadata>
    <metadata>
        <attributes>
            <metadata_valuelist value="[SampleItem3]"/>
        </attributes>
    </metadata>
    <metadata>
        <attributes>
            <metadata_valuelist value="[SampleItem1]"/>
        </attributes>
    </metadata>
    <metadata>
        <attributes>
            <metadata_valuelist value="[SampleItem1, SampleItem2]"/>
        </attributes>
    </metadata>
</child_metadata>

What I want to do is count the number of distinct values in the metadata_valuelist elements. The distinct values here are SampleItem1, SampleItem2, and SampleItem3, so I want to get a value of 3. (Although SampleItem1 occurs twice, I only count it once.)

How can I do this in XSLT?

I realize there are two problems here: First, separating the comma-delimited values in the lists, and, second, counting the number of unique values. However, I'm not certain that I could combine solutions to the two problems, which is why I'm asking it as one question.


Another way without extension:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

    <xsl:variable name="all-value" select="/*/*/*/*/@value"/> 

    <xsl:template match="/"> 
        <xsl:variable name="count"> 
            <xsl:apply-templates select="$all-value"/> 
        </xsl:variable> 
        <xsl:value-of select="string-length($count)"/> 
    </xsl:template>  

    <xsl:template match="@value" name="value">
        <xsl:param name="meta" select="translate(.,'[] ','')"/>
        <xsl:choose>
            <xsl:when test="contains($meta,',')">
                <xsl:call-template name="value">
                    <xsl:with-param name="meta" select="substring-before($meta,',')"/>
                </xsl:call-template>
                <xsl:call-template name="value">
                    <xsl:with-param name="meta" select="substring-after($meta,',')"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="count(.|$all-value[contains(translate(.,'[], ','&#xA;&#xA;&#xA;&#xA;'),
                                                          concat('&#xA;',$meta,'&#xA;'))][1])=1">
                    <xsl:value-of select="1"/> 
                </xsl:if> 
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template> 

</xsl:stylesheet> 

Note: this could probably be optimized by using xsl:key instead of xsl:variable.

Edit: updated to match tricky metadata.
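
If an XSLT 2.0 processor happens to be available (an assumption; the question does not say which version is in use), neither recursion nor extension functions are needed. A minimal sketch, assuming the item names themselves contain no commas or square brackets:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Strip the brackets, split each list on commas, trim every token,
         then count the distinct strings across all lists. -->
    <xsl:value-of select="count(distinct-values(
        for $v in //metadata_valuelist/@value,
            $t in tokenize(translate($v, '[]', ''), ',')
        return normalize-space($t)))"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document the tokens are SampleItem3, SampleItem1, SampleItem1 and SampleItem2, so this outputs 3. It will not run on an XSLT 1.0 processor, since tokenize(), distinct-values() and the for expression are XPath 2.0 features.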


This (note: just a single) transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:msxsl="urn:schemas-microsoft-com:xslt"
 >
 <xsl:output method="text"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kValue" match="value" use="."/>

 <xsl:template match="/">
   <xsl:variable name="vRTFPass1">
    <values>
     <xsl:apply-templates/>
    </values>
   </xsl:variable>

   <xsl:variable name="vPass1"
        select="msxsl:node-set($vRTFPass1)"/>

   <xsl:for-each select="$vPass1">
     <xsl:value-of select=
      "count(*/value[generate-id()
                    =
                     generate-id(key('kValue', .)[1])
                    ]
             )
      "/>
   </xsl:for-each>
 </xsl:template>

 <xsl:template match="metadata_valuelist">
  <xsl:call-template name="tokenize">
    <xsl:with-param name="pText" select="translate(@value, '[],', '')"/>
  </xsl:call-template>
 </xsl:template>

 <xsl:template name="tokenize">
    <xsl:param name="pText" />

    <xsl:choose>
      <xsl:when test="not(contains($pText, ' '))">
        <value><xsl:value-of select="$pText"/></value>
      </xsl:when>
      <xsl:otherwise>
        <value>
         <xsl:value-of select="substring-before($pText, ' ')"/>
        </value>
        <xsl:call-template name="tokenize">
         <xsl:with-param name="pText" select=
          "substring-after($pText, ' ')"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<child_metadata>
    <metadata>
        <attributes>
            <metadata_valuelist value="[SampleItem3]"/>
        </attributes>
    </metadata>
    <metadata>
        <attributes>
            <metadata_valuelist value="[SampleItem1]"/>
        </attributes>
    </metadata>
    <metadata>
        <attributes>
            <metadata_valuelist value="[SampleItem1, SampleItem2]"/>
        </attributes>
    </metadata>
</child_metadata>

produces the wanted, correct result:

3

Do note: Because this is an XSLT 1.0 solution, the result of the first pass is a result tree fragment (RTF), which has to be converted to a regular node-set before it can be queried. This is done using your XSLT 1.0 processor's xxx:node-set() extension function; in my case I used msxsl:node-set().
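
For processors that do not offer the msxsl namespace, many XSLT 1.0 implementations provide the EXSLT function exsl:node-set() instead. The following is only a sketch of the same two-pass idea with that function swapped in, assuming your processor supports EXSLT common; it also splits on commas and trims each token rather than splitting on spaces:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <!-- Index the temporary <value> elements by their string value. -->
  <xsl:key name="kValue" match="value" use="."/>

  <xsl:template match="/">
    <!-- Pass 1: build one <value> per list item as a result tree fragment. -->
    <xsl:variable name="vRTFPass1">
      <values>
        <xsl:apply-templates select="//metadata_valuelist"/>
      </values>
    </xsl:variable>

    <!-- Pass 2: convert the fragment to a node-set, then count only the
         first <value> in each key group (Muenchian grouping). -->
    <xsl:for-each select="exsl:node-set($vRTFPass1)">
      <xsl:value-of select="count(values/value[generate-id() = generate-id(key('kValue', .)[1])])"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="metadata_valuelist">
    <xsl:call-template name="tokenize">
      <xsl:with-param name="pText" select="translate(@value, '[]', '')"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emit the text before the first comma, then recurse on the rest. -->
  <xsl:template name="tokenize">
    <xsl:param name="pText"/>
    <xsl:choose>
      <xsl:when test="contains($pText, ',')">
        <value><xsl:value-of select="normalize-space(substring-before($pText, ','))"/></value>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="pText" select="substring-after($pText, ',')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <value><xsl:value-of select="normalize-space($pText)"/></value>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:for-each over the converted node-set is what makes key() search the temporary tree rather than the source document, because key() only looks in the document that contains the current context node.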


You probably want to think about doing this in two stages; first, do a transform that breaks down these value attributes, then it's fairly trivial to count them.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@value">
    <xsl:call-template name="breakdown">
      <xsl:with-param name="itemlist" select="substring-before(substring-after(.,'['),']')" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="breakdown">
    <xsl:param name="itemlist" />
    <xsl:choose>
      <xsl:when test="contains($itemlist,',')">
        <xsl:element name="value">
          <xsl:value-of select="normalize-space(substring-before($itemlist,','))" />
        </xsl:element>
        <xsl:call-template name="breakdown">
          <xsl:with-param name="itemlist" select="substring-after($itemlist,',')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="value">
          <xsl:value-of select="normalize-space($itemlist)" />
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Aside from the 'catch all' template at the bottom, this picks up any value attributes in the format you gave, and breaks them down into separate elements (as sub-elements of the 'metadata_valuelist' element) like this:

...
<metadata_valuelist>
  <value>SampleItem1</value>
  <value>SampleItem2</value>
</metadata_valuelist>
...

The substring-before/substring-after select you see near the top strips off the '[' and ']' before passing the list to the 'breakdown' template. This template checks whether there is a comma in its 'itemlist' parameter. If there is, it outputs the text before the comma as the content of a 'value' element, then recursively calls itself with the rest of the list. If there is no comma in the parameter, it just outputs the entire content of the parameter as a 'value' element.

Then just run this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:key name="itemvalue" match="value" use="text()" />

  <xsl:template match="/">
    <xsl:value-of select="count(//value[generate-id(.) = generate-id(key('itemvalue',.)[1])])" />
  </xsl:template>
</xsl:stylesheet>

on the XML you get from the first transform, and it'll just spit out a single value as text output that tells you how many distinct values you have.

EDIT: I should probably point out, this solution makes a few assumptions about your input:

  • There are no attributes named 'value' anywhere else in the document; if there are, you can modify the @value match to pick out these ones specifically.
  • There are no elements named 'value' anywhere else in the document; as the first transform creates them, the second will not be able to distinguish between the two. If there are, you can replace the two <xsl:element name="value"> lines with an element name that's not already used.
  • The content of the @value attribute always begins with '[' and ends with ']', and there are no ']' characters within the list; if there are, the 'substring-before' function will drop everything after the first ']', rather than just the ']' at the end.
  • There are no commas in the names of the items you want to count, e.g. [SampleItem1, "Sample2,3"]. If there are, '"Sample2' and '3"' would be treated as separate items.
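
To illustrate the first point above, the match pattern in the first transform can be narrowed so that only value attributes on metadata_valuelist elements are split. This is a sketch of the replacement template only, intended to be dropped in place of the match="@value" template; the 'breakdown' template and the catch-all template stay as they are:

```xml
<!-- Only split @value attributes that sit on metadata_valuelist elements.
     Any other value attribute falls through to the catch-all template
     and is copied unchanged. -->
<xsl:template match="metadata_valuelist/@value">
  <xsl:call-template name="breakdown">
    <xsl:with-param name="itemlist" select="substring-before(substring-after(., '['), ']')"/>
  </xsl:call-template>
</xsl:template>
```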