Counting distinct items and parsing comma-delimited values using XSLT
Suppose I have XML like this:
<child_metadata>
<metadata>
<attributes>
<metadata_valuelist value="[SampleItem3]"/>
</attributes>
</metadata>
<metadata>
<attributes>
<metadata_valuelist value="[SampleItem1]"/>
</attributes>
</metadata>
<metadata>
<attributes>
<metadata_valuelist value="[SampleItem1, SampleItem2]"/>
</attributes>
</metadata>
</child_metadata>
What I want to do is count the number of distinct values that are in the metadata_valuelists. There are the following distinct values: SampleItem1, SampleItem2, and SampleItem3. So, I want to get a value of 3. (Although SampleItem1 occurs twice, I only count it once.)
How can I do this in XSLT?
I realize there are two problems here: First, separating the comma-delimited values in the lists, and, second, counting the number of unique values. However, I'm not certain that I could combine solutions to the two problems, which is why I'm asking it as one question.
Another way, without extension functions:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="all-value" select="/*/*/*/*/@value"/>
<xsl:template match="/">
<xsl:variable name="count">
<xsl:apply-templates select="$all-value"/>
</xsl:variable>
<xsl:value-of select="string-length($count)"/>
</xsl:template>
<xsl:template match="@value" name="value">
<xsl:param name="meta" select="translate(.,'[] ','')"/>
<xsl:choose>
<xsl:when test="contains($meta,',')">
<xsl:call-template name="value">
<xsl:with-param name="meta" select="substring-before($meta,',')"/>
</xsl:call-template>
<xsl:call-template name="value">
<xsl:with-param name="meta" select="substring-after($meta,',')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:if test="count(.|$all-value[contains(translate(.,'[] ','&#xA;&#xA;&#xA;'),
concat('&#xA;',$meta,'&#xA;'))][1])=1">
<xsl:value-of select="1"/>
</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Note: this could probably be optimized by using xsl:key instead of xsl:variable.
Edit: Match tricky metadata.
This (note: just a single) transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
>
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kValue" match="value" use="."/>
<xsl:template match="/">
<xsl:variable name="vRTFPass1">
<values>
<xsl:apply-templates/>
</values>
</xsl:variable>
<xsl:variable name="vPass1"
select="msxsl:node-set($vRTFPass1)"/>
<xsl:for-each select="$vPass1">
<xsl:value-of select=
"count(*/value[generate-id()
=
generate-id(key('kValue', .)[1])
]
)
"/>
</xsl:for-each>
</xsl:template>
<xsl:template match="metadata_valuelist">
<xsl:call-template name="tokenize">
<xsl:with-param name="pText" select="translate(@value, '[],', '')"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="tokenize">
<xsl:param name="pText" />
<xsl:choose>
<xsl:when test="not(contains($pText, ' '))">
<value><xsl:value-of select="$pText"/></value>
</xsl:when>
<xsl:otherwise>
<value>
<xsl:value-of select="substring-before($pText, ' ')"/>
</value>
<xsl:call-template name="tokenize">
<xsl:with-param name="pText" select=
"substring-after($pText, ' ')"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<child_metadata>
<metadata>
<attributes>
<metadata_valuelist value="[SampleItem3]"/>
</attributes>
</metadata>
<metadata>
<attributes>
<metadata_valuelist value="[SampleItem1]"/>
</attributes>
</metadata>
<metadata>
<attributes>
<metadata_valuelist value="[SampleItem1, SampleItem2]"/>
</attributes>
</metadata>
</child_metadata>
produces the wanted, correct result:
3
Do note: Because this is an XSLT 1.0 solution, it is necessary to convert the results of the first pass from the infamous RTF type to a regular tree. This is done using your XSLT 1.0 processor's xxx:node-set() function -- in my case I used msxsl:node-set().
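If your processor does not provide msxsl:node-set(), the same two-pass idea can be sketched with the EXSLT exsl:node-set() function, which processors such as xsltproc (libxslt) support. This is a condensed variant, not the stylesheet above verbatim: it assumes EXSLT support, and its tokenizer strips brackets and spaces and then splits on commas.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>
  <xsl:key name="kValue" match="value" use="."/>

  <xsl:template match="/">
    <!-- Pass 1: build a result tree fragment with one <value> per list item -->
    <xsl:variable name="vRTFPass1">
      <values>
        <xsl:apply-templates select="//metadata_valuelist"/>
      </values>
    </xsl:variable>
    <!-- Pass 2: convert the RTF to a node-set (EXSLT, assumed available)
         and count the first <value> of each Muenchian group -->
    <xsl:for-each select="exsl:node-set($vRTFPass1)">
      <xsl:value-of select="count(values/value[generate-id() = generate-id(key('kValue', .)[1])])"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="metadata_valuelist">
    <xsl:call-template name="tokenize">
      <!-- Remove '[', ']' and spaces, leaving a plain comma-separated list -->
      <xsl:with-param name="pText" select="translate(@value, '[] ', '')"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="tokenize">
    <xsl:param name="pText"/>
    <xsl:choose>
      <xsl:when test="contains($pText, ',')">
        <value><xsl:value-of select="substring-before($pText, ',')"/></value>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="pText" select="substring-after($pText, ',')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <value><xsl:value-of select="$pText"/></value>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Only the namespace declaration and the node-set call differ in substance from the msxsl version; the key lookup still runs against the temporary tree because xsl:for-each makes it the context document.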
You probably want to think about doing this in two stages; first, do a transform that breaks down these value attributes, then it's fairly trivial to count them.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@value">
<xsl:call-template name="breakdown">
<xsl:with-param name="itemlist" select="substring-before(substring-after(.,'['),']')" />
</xsl:call-template>
</xsl:template>
<xsl:template name="breakdown">
<xsl:param name="itemlist" />
<xsl:choose>
<xsl:when test="contains($itemlist,',')">
<xsl:element name="value">
<xsl:value-of select="normalize-space(substring-before($itemlist,','))" />
</xsl:element>
<xsl:call-template name="breakdown">
<xsl:with-param name="itemlist" select="substring-after($itemlist,',')" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:element name="value">
<xsl:value-of select="normalize-space($itemlist)" />
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Aside from the 'catch all' template at the bottom, this picks up any value attributes in the format you gave, and breaks them down into separate elements (as sub-elements of the 'metadata_valuelist' element) like this:
...
<metadata_valuelist>
<value>SampleItem1</value>
<value>SampleItem2</value>
</metadata_valuelist>
...
The 'substring-before/substring-after' select you see near the top strips off the '[' and ']' before passing the list to the 'breakdown' template. This template checks whether there's a comma in its 'itemlist' parameter; if there is, it outputs the text before the comma as the content of a 'value' element, then recursively calls itself with the rest of the list. If there is no comma in the parameter, it just outputs the entire content of the parameter as a 'value' element.
Then just run this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:key name="itemvalue" match="value" use="text()" />
<xsl:template match="/">
<xsl:value-of select="count(//value[generate-id(.) = generate-id(key('itemvalue',.)[1])])" />
</xsl:template>
</xsl:stylesheet>
on the XML you get from the first transform, and it'll just spit out a single value as text output that tells you how many distinct values you have.
EDIT: I should probably point out, this solution makes a few assumptions about your input:
- There are no attributes named 'value' anywhere else in the document; if there are, you can modify the @value match to pick out these ones specifically.
- There are no elements named 'value' anywhere else in the document; as the first transform creates them, the second will not be able to distinguish between the two. If there are, you can replace the two
<xsl:element name="value">
lines with an element name that's not already used.
- The content of the @value attribute always begins with '[' and ends with ']', and there are no ']' characters within the list; if there are, the 'substring-before' function will drop everything after the first ']', rather than just the ']' at the end.
- There are no commas in the names of the items you want to count, e.g. [SampleItem1, "Sample2,3"]. If there are, '"Sample2' and '3"' would be treated as separate items.
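For comparison, if an XSLT 2.0 processor is available (an assumption; all of the answers above target XSLT 1.0), splitting and de-duplicating can be sketched in a single expression with the standard tokenize() and distinct-values() functions. The same caveat about commas inside item names applies.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Strip '[', ']' and spaces, split each list on commas,
         then count the distinct strings across all lists -->
    <xsl:value-of select="count(distinct-values(
        //metadata_valuelist/@value/tokenize(translate(., '[] ', ''), ',')))"/>
  </xsl:template>
</xsl:stylesheet>
```

On the sample document in the question this should output 3. No intermediate tree or node-set() extension is needed, because XSLT 2.0 sequences can hold the tokenized strings directly.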