How to use XSLT 1.0 or XPath to manipulate an HTML string

My problem: the code snippet below (inside the <xsl:choose>) is meant to strip <p>, <div> and <br> tags out of a string using a combination of substring-before() and substring(), but it doesn't do so reliably.

The string I'm trying to format is an attribute of a SharePoint SPS 2003 list item: text entered via a rich text editor. What I ideally need is a catch-all <xsl:when> test that always grabs just the text before the first line break (effectively the first paragraph). I thought that:

<xsl:when test="contains(Story, '&#x0a;')='True'">

would do that, but it doesn't always work: although the rich text editor inserts <br> and <p> tags, they are apparently not always accompanied by a &#x0a; (line feed) character.

Please help - this is driving me nuts. Code:

<xsl:choose>
  <xsl:when test="contains(Story, '&#x0a;')">
    <div>PTAG_OPEN_OR_BR<xsl:value-of select="substring-before(Story,'&#x0a;')" disable-output-escaping="yes"/></div>
  </xsl:when>
  <xsl:when test="contains(Story, '&#x0a;') and contains(Story, 'div>')">
    <div>DTAG<xsl:value-of select="substring-before(substring-after(substring-before(Story, '/div>'), 'div>'),'&#x0a;')" disable-output-escaping="yes"/></div>
  </xsl:when>
  <xsl:when test="contains(Story, '&#x0a;')!='True' and contains(Story, 'br>')">
    <div>BRTAG<xsl:value-of select="substring(Story, 1, string-length(substring-before(Story, 'br>')-1))" disable-output-escaping="yes"/></div>
  </xsl:when>            
  <xsl:otherwise>
    <div>NO_TAG<xsl:value-of select="substring(Story, 1, 150)" disable-output-escaping="yes"/></div>
  </xsl:otherwise>
</xsl:choose>

EDIT:

Will try out your suggestion, Tomalak. Thank you.

EDIT: 12/11/09

I've only just had a chance to try this out. Thanks for your help, Tomalak. I have one question about rendering this as HTML rather than XML: when I call the template removeMarkup, I get the following error message:

Exception: System.Xml.XmlException Message: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 120, position 58.

I'm not sure, but I believe this is because you can't have XSLT tags inside attributes. Is there any way around this?

Thanks, Tim
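If the call to removeMarkup was placed inside an attribute value, as suspected above, the stylesheet itself is not well-formed XML: a literal < may not appear in an attribute value, which is what the XmlException reports. (Line 120 of the stylesheet isn't shown, so this is an assumption.) The usual XSLT 1.0 way to fill an attribute from a template call is xsl:attribute. A minimal sketch; the title attribute and the summary template, a stand-in for removeMarkup, are hypothetical:

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="html" />

  <xsl:template match="Story">
    <!-- Not well-formed, so it is commented out here:
         <div title="<xsl:call-template name='summary'/>">...</div> -->
    <div>
      <!-- xsl:attribute must come before any content is added to <div> -->
      <xsl:attribute name="title">
        <xsl:call-template name="summary">
          <xsl:with-param name="text" select="." />
        </xsl:call-template>
      </xsl:attribute>
      <xsl:value-of select="." />
    </div>
  </xsl:template>

  <!-- hypothetical stand-in for removeMarkup: the first 150 characters -->
  <xsl:template name="summary">
    <xsl:param name="text" select="''" />
    <xsl:value-of select="substring($text, 1, 150)" />
  </xsl:template>
</xsl:stylesheet>
```

Alternatively, store the template's result in an xsl:variable first and reference it through an attribute value template such as title="{$summary}"; an attribute value template accepts XPath expressions, not XSLT instructions.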


A <p> or <br> is very probably represented by a <p> or <br> by the editor, not by &#x0a;. ;-)

Line break characters are not required anywhere in HTML, so if the editor decides not to include any line breaks, it's still fine. Relying on line breaks is an error on your part, IMHO.

Apart from that, without sample XML it is anybody's guess what XPath might do the trick for you.
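For illustration, here is the question's BRTAG branch rewritten to look for the tag itself rather than for &#x0a;. It is a minimal sketch assuming Story holds escaped HTML, as in the question; lower-casing through translate() is only a precaution in case the editor writes <BR> rather than <br>. It also drops the -1 that the original branch applied to a string inside string-length(); no offset is needed, because substring-before() already stops just before the tag.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="html" />

  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'" />

  <xsl:template match="Story">
    <!-- lower-cased copy; translate() keeps the length, so positions line up -->
    <xsl:variable name="story" select="translate(., $upper, $lower)" />
    <xsl:choose>
      <!-- test for the tag itself, not for a line feed character -->
      <xsl:when test="contains($story, '&lt;br')">
        <div>BRTAG<xsl:value-of select="substring(., 1, string-length(substring-before($story, '&lt;br')))" disable-output-escaping="yes"/></div>
      </xsl:when>
      <xsl:otherwise>
        <div>NO_TAG<xsl:value-of select="substring(., 1, 150)" disable-output-escaping="yes"/></div>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The text before the <br> can still contain opening <div> or <p> tags, so this alone does not give clean output; removing the markup first avoids that.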

EDIT:

I suggest a template that removes any HTML markup from a string (by recursive string processing). Then you can take the first meaningful bit of text from the result and print it out.

With this input:

<test>
  <Story>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</Story>
  <Story>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</Story>
  <Story>The quick brown fox jumped over the lazy dog.&lt;br&gt;The quick brown fox jumped over the lazy dog.</Story>
  <Story>The quick brown fox jumped over the lazy dog.</Story>
</test>

and this stylesheet:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" />

  <xsl:template match="Story">
    <xsl:copy>
      <original>
        <xsl:value-of select="." />
      </original>
      <processed>
        <xsl:variable name="result">
          <xsl:call-template name="removeMarkup">
            <xsl:with-param name="html" select="." />
          </xsl:call-template>
        </xsl:variable>
        <!-- select the bit of text before the '<>' delimiter -->
        <xsl:value-of select="substring-before($result, '&lt;&gt;')" />
      </processed>
    </xsl:copy>
  </xsl:template>

  <!-- this template removes all HTML markup (tags) from a string -->
  <xsl:template name="removeMarkup">
    <xsl:param name="html"  select="''" />
    <xsl:param name="inTag" select="false()" />

    <!-- if we are in a tag, we look for the next '>', otherwise for '<' -->    
    <xsl:variable name="lookFor">
      <xsl:choose>
        <xsl:when test="$inTag">&gt;</xsl:when>
        <xsl:otherwise>&lt;</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- split the input at the current delimiter char -->
    <xsl:variable name="head" select="substring-before(concat($html, '&lt;'), $lookFor)" />
    <xsl:variable name="tail" select="substring-after($html, $lookFor)" />

    <xsl:if test="not($inTag)">
      <xsl:value-of select="$head" />
      <!-- now add a unique delimiter after each run of actual text -->
      <xsl:if test="translate(normalize-space($head), ' ', '') != ''">
        <xsl:value-of select="'&lt;&gt;'" /> <!-- '<>' as a delimiter -->
      </xsl:if>
    </xsl:if>

    <!-- remove markup for the rest of the string -->
    <xsl:if test="$tail != ''">
      <xsl:call-template name="removeMarkup">
        <xsl:with-param name="html"  select="$tail" />
        <xsl:with-param name="inTag" select="not($inTag)" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

the following result is produced:

<Story>
  <original>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</original>
  <processed>The quick brown fox jumped over the lazy dog</processed>
</Story>
<Story>
  <original>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</original>
  <processed>The quick brown fox jumped over the lazy dog</processed>
</Story>
<Story>
  <original>The quick brown fox jumped over the lazy dog.&lt;br&gt;The quick brown fox jumped over the lazy dog.</original>
  <processed>The quick brown fox jumped over the lazy dog.</processed>
</Story>
<Story>
  <original>The quick brown fox jumped over the lazy dog.</original>
  <processed>The quick brown fox jumped over the lazy dog.</processed>
</Story>

Disclaimer: As with all string processing over HTML input, this is not 100% foolproof, and certain malformed input can break it.
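For instance, > is legal inside a quoted attribute value, even in well-formed HTML, but removeMarkup ends the tag at the first > it meets. With the hypothetical input below, processed comes out as 0">text (preceded by a space) instead of text:

```xml
<test>
  <!-- hypothetical Story: a '>' inside a quoted attribute value -->
  <Story>&lt;p title="1 &gt; 0"&gt;text&lt;/p&gt;</Story>
</test>
```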


contains() returns a boolean value, so contains(Story, '&#x0a;')='True' implies a type conversion. XPath 1.0 (section 3.4) settles which way it goes: when one operand of = or != is a boolean, the other is converted with boolean(), and every non-empty string, 'False' included, converts to true(). The test therefore only works by coincidence; a processor that compared strings instead would fail it, because string(true()) returns 'true', not 'True'.

Either way, your test is redundant; just use the boolean value returned by contains():

<xsl:when test="contains(Story, '&#x0a;')">
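The conversion rule can be checked directly. A minimal sketch that runs against any input document; per XPath 1.0 section 3.4 (when one operand of = or != is a boolean, the other is converted with boolean()), a conforming XSLT 1.0 processor should print true, true, false and true, one per line:

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- boolean = string: 'True' is converted with boolean(), giving true() -->
    <xsl:value-of select="contains('abc', 'b') = 'True'" />
    <xsl:text>&#10;</xsl:text>
    <!-- 'False' is a non-empty string too, so this is also true -->
    <xsl:value-of select="contains('abc', 'b') = 'False'" />
    <xsl:text>&#10;</xsl:text>
    <!-- ...and it does NOT detect a missing substring: this is false -->
    <xsl:value-of select="contains('abc', 'z') = 'False'" />
    <xsl:text>&#10;</xsl:text>
    <!-- the plain boolean, which is all a test attribute needs -->
    <xsl:value-of select="contains('abc', 'b')" />
  </xsl:template>
</xsl:stylesheet>
```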