How to escape special characters when transforming HTML to text using XSLT?

Sample HTML:

<html>
<head>
    <title>My Headline</title>
    <meta name="targetUrl" value="xyz.html?sym=abc"/>
    <meta name="summary" value="A & B"/>
</head>
    <body>
        abc abc, pqr, xyz, rst tsd, prrrr, qqqqqqq, oooooo, opop opop, rtrttrt rtrtrtrt
    </body>
</html>

This is just a sample HTML file; the real files can contain any special characters, and I don't have control over the HTML files. I tried the following XSL, but it doesn't work:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no" omit-xml-declaration="yes"/>
<xsl:strip-space elements="xsl:text"/>
<xsl:variable name="delimiter" select="'|'"/>
<xsl:variable name="fieldNames" select="'yes'"/>
        <xsl:template match="/">
                <xsl:if test="$fieldNames = 'yes'">
                        <xsl:text>title</xsl:text>
                        <xsl:value-of select="$delimiter"/>
                        <xsl:text>targetURL</xsl:text>
                        <xsl:value-of select="$delimiter"/>
                        <xsl:text>summary-r</xsl:text>
                        <xsl:value-of select="$delimiter"/>
                        <xsl:text>body</xsl:text>
                        <xsl:text>&#xA;</xsl:text>
                </xsl:if>
                <xsl:value-of select="normalize-space(html/head/title)" disable-output-escaping="yes" />
                <xsl:value-of select="$delimiter"/>
                <xsl:value-of select="html/head/meta[@name='targetURL']/@value" disable-output-escaping="yes" />
                <xsl:value-of select="$delimiter"/>
                <xsl:value-of select="html/head/meta[@name='summary-r']/@value" disable-output-escaping="yes" />
                <xsl:value-of select="$delimiter"/>
                <xsl:value-of select="normalize-space(html/body)" disable-output-escaping="yes" />
        </xsl:template>
</xsl:stylesheet>

Any help is appreciated.


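XSLT only works on well-formed XML input, not on arbitrary HTML. The sample is not well-formed XML: the bare & in value="A & B" would have to be written as &amp;, so the XML parser rejects the document before the stylesheet ever runs. You therefore need to run an HTML-to-XML conversion before the XSLT transformation. There are plenty of tools to do this, e.g. John Cowan's TagSoup.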
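To illustrate what such a conversion step does, here is a minimal sketch in Python using only the standard library. It is not TagSoup and not a full HTML-to-XML converter: the names html_to_xml, XmlBuilder and VOID_TAGS are made up for this example, and it assumes simple, properly nested markup like the sample above. It parses the HTML leniently and serializes the result as well-formed XML, so the bare & comes out as &amp;.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Trimmed copy of the sample HTML from the question (note the bare & in "A & B").
SAMPLE = """<html>
<head>
    <title>My Headline</title>
    <meta name="targetUrl" value="xyz.html?sym=abc"/>
    <meta name="summary" value="A & B"/>
</head>
    <body>
        abc abc, pqr, xyz
    </body>
</html>"""

# Elements that never have content; an assumption covering only common cases.
VOID_TAGS = {"meta", "br", "hr", "img", "input", "link"}


class XmlBuilder(HTMLParser):
    """Rebuilds leniently parsed HTML as an ElementTree (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        attrib = {name: (value or "") for name, value in attrs}
        if self.stack:
            element = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            element = ET.Element(tag, attrib)
            if self.root is None:
                self.root = element
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close elements up to the matching open tag; ignore stray end tags.
        if any(element.tag == tag for element in self.stack):
            while self.stack.pop().tag != tag:
                pass

    def handle_data(self, data):
        if not self.stack:
            return  # text outside the root element is dropped
        parent = self.stack[-1]
        if len(parent) == 0:
            parent.text = (parent.text or "") + data
        else:
            last_child = parent[-1]
            last_child.tail = (last_child.tail or "") + data


def html_to_xml(html_text):
    """Return html_text re-serialized as well-formed XML."""
    builder = XmlBuilder()
    builder.feed(html_text)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml(SAMPLE))
```

The serialized output can be parsed by an XML parser, whereas the original sample cannot, and it is this cleaned-up XML that would be handed to the XSLT processor. For real-world HTML a dedicated tool such as TagSoup is the safer choice, since this sketch does not repair badly nested tags.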
