XSLT 1.0 word count with HTML
I am looking to call a template that will trim down a field to 30 words. However, this field contains HTML and the HTML should not count as a word.
Try this, although admittedly the translate call's a bit ugly:
<xsl:template match="field">
<xsl:value-of select="string-length(translate(normalize-space(.),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',''))+1" />
</xsl:template>
This of course requires that the string in the translate call includes every character that could appear in the field, other than spaces. It works by first calling normalize-space(.) to take only the text content of the field (the markup is dropped) and collapse runs of whitespace into single spaces. It then removes everything except spaces, counts the length of the resulting string and adds one. It does mean that <p>My<b>text</b> test</p> will count as 2, as Mytext is considered to be one word.
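For reference, here is a minimal sketch of the same idea as a complete stylesheet, with the steps split into variables. The variable names ($wordchars, $text, $spaces) and the sample input in the comment are only assumptions for illustration; any character missing from $wordchars, such as punctuation, would survive the translate call and inflate the count.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sample input assumed for this sketch:
     <field><p>My<b>text</b> test</p></field> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Characters treated as part of a word; extend this list as needed. -->
  <xsl:variable name="wordchars"
                select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'" />

  <xsl:template match="field">
    <!-- Text content only, whitespace collapsed: "Mytext test" -->
    <xsl:variable name="text" select="normalize-space(.)" />
    <!-- Delete every word character so that only the separating spaces remain -->
    <xsl:variable name="spaces" select="translate($text, $wordchars, '')" />
    <!-- One more word than there are separators -->
    <xsl:value-of select="string-length($spaces) + 1" />
  </xsl:template>
</xsl:stylesheet>
```

With the sample input from the comment, the single remaining space gives a result of 2.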
If you need a more robust solution, it's a little more convoluted:
<xsl:template match="field">
<xsl:call-template name="countwords">
<xsl:with-param name="text" select="normalize-space(.)" />
</xsl:call-template>
</xsl:template>
<xsl:template name="countwords">
<xsl:param name="count" select="0" />
<xsl:param name="text" />
<xsl:choose>
<xsl:when test="contains($text,' ')">
<xsl:call-template name="countwords">
<xsl:with-param name="count" select="$count + 1" />
<xsl:with-param name="text" select="substring-after($text,' ')" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise><xsl:value-of select="$count + 1" /></xsl:otherwise>
</xsl:choose>
</xsl:template>
This passes the result of normalize-space(.) into a recursive named template that calls itself whenever there's a space in $text, incrementing its count parameter and chopping off the first word each time using the substring-after($text,' ') call. If there's no space, it treats $text as a single word and just returns $count + 1 (+1 for the current word).
Bear in mind that this will include ALL text content within the field, including text within inner elements.
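One edge case worth noting: because the final branch always returns $count + 1, an empty field is reported as one word. If that matters, a guard in the calling template is one possible way around it. This is only a sketch, and it assumes it lives in the same stylesheet as the countwords template above, with the xsl prefix bound to the XSLT namespace:

```xml
<xsl:template match="field">
  <!-- Normalise once and reuse the result -->
  <xsl:variable name="text" select="normalize-space(.)" />
  <xsl:choose>
    <!-- Nothing but whitespace (or nothing at all): report zero words -->
    <xsl:when test="$text = ''">0</xsl:when>
    <xsl:otherwise>
      <xsl:call-template name="countwords">
        <xsl:with-param name="text" select="$text" />
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```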
EDIT: Note to self: read the question properly, just noticed you needed more than just a word count. That's significantly more complicated to do if you want to include any xml tags, but a slight modification of the above is all it takes to spit out each word rather than simply count them:
<xsl:template name="countwords">
<xsl:param name="count" select="0" />
<xsl:param name="text" />
<xsl:choose>
<xsl:when test="$count = 30" />
<xsl:when test="contains($text,' ')">
<xsl:if test="$count != 0"><xsl:text> </xsl:text></xsl:if>
<xsl:value-of select="substring-before($text,' ')" />
<xsl:call-template name="countwords">
<xsl:with-param name="count" select="$count + 1" />
<xsl:with-param name="text" select="substring-after($text,' ')" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise><xsl:if test="$count != 0"><xsl:text> </xsl:text></xsl:if><xsl:value-of select="$text" /></xsl:otherwise>
</xsl:choose>
</xsl:template>
There's an extra <xsl:when> clause to simply stop recursing when count hits 30, and the recursive clause outputs each word, adding a space before it unless it is the first word.
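If the limit should not be hard-coded, the same recursion can take it as a parameter. The following is a sketch rather than a drop-in replacement: the template name truncatewords and the limit parameter are made up for this example, and it assumes $text has already been passed through normalize-space.

```xml
<xsl:template name="truncatewords">
  <xsl:param name="text" />
  <xsl:param name="limit" select="30" />
  <xsl:param name="count" select="0" />
  <xsl:choose>
    <!-- Enough words emitted: stop recursing and output nothing more -->
    <xsl:when test="$count &gt;= $limit" />
    <!-- More words follow: emit the first one and recurse on the rest -->
    <xsl:when test="contains($text, ' ')">
      <xsl:if test="$count != 0"><xsl:text> </xsl:text></xsl:if>
      <xsl:value-of select="substring-before($text, ' ')" />
      <xsl:call-template name="truncatewords">
        <xsl:with-param name="text" select="substring-after($text, ' ')" />
        <xsl:with-param name="limit" select="$limit" />
        <xsl:with-param name="count" select="$count + 1" />
      </xsl:call-template>
    </xsl:when>
    <!-- Last word of the text -->
    <xsl:otherwise>
      <xsl:if test="$count != 0"><xsl:text> </xsl:text></xsl:if>
      <xsl:value-of select="$text" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Called with a limit of 3 and the text "a b c d e", this emits "a b c". Like the version above, it works on the text content only, so any markup inside the field is lost.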
EDIT: Ok, here's a solution that keeps the escaped XML content:
<xsl:template match="field">
<xsl:call-template name="countwords">
<xsl:with-param name="text" select="." />
</xsl:call-template>
</xsl:template>
<xsl:template name="countwords">
<xsl:param name="count" select="0" />
<xsl:param name="text" />
<xsl:choose>
<xsl:when test="starts-with($text, '&lt;')">
<xsl:value-of select="concat(substring-before($text,'>'),'>')" />
<xsl:call-template name="countwords">
<xsl:with-param name="count">
<xsl:choose>
<xsl:when test="starts-with(substring-after($text,'>'),' ')"><xsl:value-of select="$count + 1" /></xsl:when>
<xsl:otherwise><xsl:value-of select="$count" /></xsl:otherwise>
</xsl:choose>
</xsl:with-param>
<xsl:with-param name="text" select="substring-after($text,'>')" />
</xsl:call-template>
</xsl:when>
<xsl:when test="(contains($text, '&lt;') and contains($text, ' ') and string-length(substring-before($text,' ')) &lt; string-length(substring-before($text,'&lt;'))) or (contains($text,' ') and not(contains($text,'&lt;')))">
<xsl:choose>
<xsl:when test="$count &lt; 29"><xsl:value-of select="concat(substring-before($text, ' '),' ')" /></xsl:when>
<xsl:when test="$count = 29"><xsl:value-of select="substring-before($text, ' ')" /></xsl:when>
</xsl:choose>
<xsl:call-template name="countwords">
<xsl:with-param name="count">
<xsl:choose>
<xsl:when test="normalize-space(substring-before($text, ' ')) = ''"><xsl:value-of select="$count" /></xsl:when>
<xsl:otherwise><xsl:value-of select="$count + 1" /></xsl:otherwise>
</xsl:choose>
</xsl:with-param>
<xsl:with-param name="text" select="substring-after($text,' ')" />
</xsl:call-template>
</xsl:when>
<xsl:when test="(contains($text, '&lt;') and contains($text, ' ') and string-length(substring-before($text,' ')) > string-length(substring-before($text,'&lt;'))) or contains($text,'&lt;')">
<xsl:if test="$count &lt; 30">
<xsl:value-of select="substring-before($text, '&lt;')" />
</xsl:if>
<xsl:call-template name="countwords">
<xsl:with-param name="count" select="$count" />
<xsl:with-param name="text" select="concat('&lt;',substring-after($text,'&lt;'))" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:if test="$count &lt; 30">
<xsl:value-of select="$text" />
</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
If you need any of it explained better, let me know, I'd rather not go into detail unless you need it!
Here's a slightly different approach:
If you can clean your input so that you get a normalised string of the text you want to word count, you can compare the string-length of the string with spaces to the string-length of the string with spaces removed. The difference is the number of separators between words, so the word count is that difference plus one (or zero for an empty string).
The word count function (template) will look something like this:
<xsl:template name="wordCount">
<xsl:param name="input"/>
<xsl:param name="sep" select="'‒–—―'"/>
<!-- translate() deletes any character that has no counterpart in its third argument, so pad that argument with at least as many spaces as there are separators -->
<xsl:variable name="big" select="normalize-space(translate($input, $sep, '                '))"/>
<xsl:variable name="small" select="translate($big, ' ', '')"/>
<!-- number of spaces, plus one unless the text is empty -->
<xsl:value-of select="string-length($big) - string-length($small) + number($big != '')"/>
</xsl:template>
The $sep parameter allows you to define a list of any character(s) (as well as white-space) that you want to count as a word separator.
You can then use the content of xsl:with-param when you call the template to build the string you want (I'll leave that as an exercise for the reader):
<xsl:call-template name="wordCount">
<xsl:with-param name="input">
<!-- templates etc to output text from html -->
</xsl:with-param>
</xsl:call-template>
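One possible way to fill in that placeholder, assuming the field holds real (unescaped) HTML elements and the current node is the field, is to emit every descendant text node followed by a space. This is a sketch of one option, not the only one:

```xml
<xsl:call-template name="wordCount">
  <xsl:with-param name="input">
    <!-- Output each text node inside the field, separated by spaces,
         so that element boundaries act as word breaks -->
    <xsl:for-each select=".//text()">
      <xsl:value-of select="." />
      <xsl:text> </xsl:text>
    </xsl:for-each>
  </xsl:with-param>
</xsl:call-template>
```

The trade-off is that inline markup now splits words: My<b>text</b> counts as two words rather than one. If inline elements should not break words, using <xsl:value-of select="." /> as the parameter content keeps them joined instead.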