开发者

XSLT 1.0 word count with HTML

I am looking to call a template that will trim down a field to 30 words. However, this field contains HTML and the HTML shou开发者_开发问答ld not count as a word.


Try this, although admittedly the translate call's a bit ugly:

<xsl:template match="field">
  <xsl:value-of select="string-length(translate(normalize-space(.),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',''))+1" />
</xsl:template>

This of course requires that the string in the translate call includes all characters that could appear in the field, other than spaces. It works by first calling normalize-space(.) to strip out both double-spaces and all but the text content. It then removes everything except spaces, counts the length of the resulting string and adds one. It does mean if you have <p>My<b>text</b> test</p> this will count as 2, as it will consider Mytext to be one word.

If you need a more robust solution, it's a little more convoluted:

<xsl:template match="field">
  <xsl:call-template name="countwords">
    <xsl:with-param name="text" select="normalize-space(.)" />
  </xsl:call-template>
</xsl:template>

<xsl:template name="countwords">
  <xsl:param name="count" select="0" />
  <xsl:param name="text" />
  <xsl:choose>
    <xsl:when test="contains($text,' ')">
      <xsl:call-template name="countwords">
        <xsl:with-param name="count" select="$count + 1" />
        <xsl:with-param name="text" select="substring-after($text,' ')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="$count + 1" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

This passes the result of normalize-space(.) into a recursive named template that calls itself when there's a space in $text, incrementing it's count parameter, and chopping off the first word each time using the substring-after($text,' ') call. If there's no space, then it treats $text as a single word, and just returns $count + 1 (+1 for the current word).

Bear in mind that this will include ALL text content within the field, including those within inner elements.

EDIT: Note to self: read the question properly, just noticed you needed more than just a word count. That's significantly more complicated to do if you want to include any xml tags, but a slight modification of the above is all it takes to spit out each word rather than simply count them:

<xsl:template name="countwords">
  <xsl:param name="count" select="0" />
  <xsl:param name="text" />
  <xsl:choose>
    <xsl:when test="$count = 30" />
    <xsl:when test="contains($text,' ')">
      <xsl:if test="$count != 0"><xsl:text>&#32;</xsl:text></xsl:if>
      <xsl:value-of select="substring-before($text,' ')" />
      <xsl:call-template name="countwords">
        <xsl:with-param name="count" select="$count + 1" />
        <xsl:with-param name="text" select="substring-after($text,' ')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="$text" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

There's an extra <xsl:when clause to simply stop recursing when count hits 30, and the recursive clause outputs the text, after adding a space at the beginning if it wasn't the first word.

EDIT: Ok, here's a solution that keeps the escaped XML content:

<xsl:template match="field">
  <xsl:call-template name="countwords">
    <xsl:with-param name="text" select="." />
  </xsl:call-template>
</xsl:template>

<xsl:template name="countwords">
  <xsl:param name="count" select="0" />
  <xsl:param name="text" />
  <xsl:choose>
    <xsl:when test="starts-with($text, '&lt;')">
      <xsl:value-of select="concat(substring-before($text,'&gt;'),'&gt;')" />
      <xsl:call-template name="countwords">
        <xsl:with-param name="count">
          <xsl:choose>
            <xsl:when test="starts-with(substring-after($text,'&gt;'),' ')"><xsl:value-of select="$count + 1" /></xsl:when>
            <xsl:otherwise><xsl:value-of select="$count" /></xsl:otherwise>
          </xsl:choose>
        </xsl:with-param>
        <xsl:with-param name="text" select="substring-after($text,'&gt;')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="(contains($text, '&lt;') and contains($text, ' ') and string-length(substring-before($text,' ')) &lt; string-length(substring-before($text,'&lt;'))) or (contains($text,' ') and not(contains($text,'&lt;')))">
      <xsl:choose>
        <xsl:when test="$count &lt; 29"><xsl:value-of select="concat(substring-before($text, ' '),'&#32;')" /></xsl:when>
        <xsl:when test="$count = 29"><xsl:value-of select="substring-before($text, ' ')" /></xsl:when>
      </xsl:choose>
      <xsl:call-template name="countwords">
        <xsl:with-param name="count">
          <xsl:choose>
            <xsl:when test="normalize-space(substring-before($text, ' ')) = ''"><xsl:value-of select="$count" /></xsl:when>
            <xsl:otherwise><xsl:value-of select="$count + 1" /></xsl:otherwise>
          </xsl:choose>
        </xsl:with-param>
        <xsl:with-param name="text" select="substring-after($text,' ')" />
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="(contains($text, '&lt;') and contains($text, ' ') and string-length(substring-before($text,' ')) &gt; string-length(substring-before($text,'&lt;'))) or contains($text,'&lt;')">
      <xsl:if test="$count &lt; 30">
        <xsl:value-of select="substring-before($text, '&lt;')" />
      </xsl:if>
      <xsl:call-template name="countwords">
        <xsl:with-param name="count" select="$count" />
        <xsl:with-param name="text" select="concat('&lt;',substring-after($text,'&lt;'))" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:if test="$count &lt; 30">
        <xsl:value-of select="$text" />
      </xsl:if>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

If you need any of it explained better, let me know, I'd rather not go into detail unless you need it!


Here's a slightly different approach:

If you can clean your input so that you get a normalised string of the text you want to word count, you can compare the string-length of the string with spaces to the string-length of the string with spaces removed. The difference should be your word count.

The word count function (template) will look something like this:

<xsl:template name="wordCount">
    <xsl:param name="input" required="yes"/>
    <xsl:param name="sep" select="'‒–—―'"/>
    <xsl:variable name="big"><xsl:value-of select="normalize-space(translate($input, $sep, ' '))"/></xsl:variable>
    <xsl:variable name="small"><xsl:value-of select="translate($big, ' ', '')"/></xsl:variable>
    <xsl:value-of select="string-length($big)-string-length($small)"/>
</xsl:template>

The $sep parameter allows you to define a list of any character(s) (as well as white-space) that you want to count as a word separator.

You can then use a sequence constructor when you call the template to build the string you want (I'll leave that as an exercise for the reader):

<xsl:call-template name="wordCount">
    <xsl:with-param name="input">
        <!-- templates etc to output text from html -->
    </xsl:with-param>
</xsl:call-template>
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜