XSLT Transform with Character 8221

2023-02-06 17:35

I'm transforming an XML document using javax.xml.transform.Transformer and XSLT. The document contains the characters “ and ” (Java integer codes 8220 and 8221). These are not the normal quotation marks.

When I transform the document, these characters are transformed into unreadable characters. Now my struggle is how to convert them back into something that people can read. I tried reading the document with DOMReader and SAXReader using the encodings utf-8, utf-16, ascii, etc. No luck.

Your help is very much appreciated. Max.

These are the Unicode characters U+201C and U+201D. Are you transforming to HTML? If so, and your XSLT specifies HTML output, I'd expect it to output &ldquo; and &rdquo;, as they're character entity references: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references Quote from the XSLT spec:

"The html output method may output a character using a character entity reference, if one is defined for it in the version of HTML that the output method is using."

http://www.w3.org/TR/xslt#section-HTML-Output-Method
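On the numbers in the question: 8220 and 8221 are just the decimal forms of the code points U+201C and U+201D, and UTF-8 is one way of writing them as bytes. A minimal Java sketch to check this (the class and method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

class QuoteCodePoints {
    // Formats a code point in the conventional U+XXXX notation.
    static String notation(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Returns the UTF-8 bytes of a single code point as upper-case hex pairs.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(notation(8220) + " -> " + utf8Hex(8220)); // U+201C -> E2 80 9C
        System.out.println(notation(8221) + " -> " + utf8Hex(8221)); // U+201D -> E2 80 9D
    }
}
```

Each quote takes three bytes in UTF-8, so reading those bytes with a single-byte encoding is one common way to end up with unreadable output.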

This input:

<p> “ and ” </p>

With this stylesheet (just identity rule):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output:

<p> “ and ” </p>
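For the Java side, here is a rough sketch of running that identity stylesheet through javax.xml.transform. It assumes the processor bundled with the JDK; the class name IdentityTransformDemo is hypothetical. The point is to write the result as bytes and decode them with the same encoding the stylesheet declares, rather than relying on the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityTransformDemo {
    // The identity stylesheet from above, as a Java string.
    static final String IDENTITY_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" encoding=\"utf-8\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@* | node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms the XML and decodes the serialized bytes as UTF-8,
    // matching the encoding declared in xsl:output.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(IDENTITY_XSLT)));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // \u201C and \u201D are the left and right curly quotes.
        System.out.println(transform("<p> \u201C and \u201D </p>"));
    }
}
```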

Only Xalan, with the html serialization method, outputs:

<p> &ldquo; and &rdquo; </p>

So, if you want proper rendering, you need to output a proper HTML document...

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" encoding="utf-8"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="/">
        <html>
            <head>
                <title>Test</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Output:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>Test</title>
    </head>
    <body>
        <p> “ and ” </p>
    </body>
</html>

Note the proper charset encoding declaration.
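The same HTML stylesheet can be tried from Java. This is only a sketch, assuming the JDK's built-in processor, and HtmlOutputDemo is a made-up name. Since the html method is allowed to write either the raw character or an entity reference, the check below accepts any of those forms; a browser renders them identically.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlOutputDemo {
    // The HTML stylesheet from above, as a Java string.
    static final String HTML_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" encoding=\"utf-8\"/>"
        + "<xsl:template match=\"@* | node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/\">"
        + "<html><head><title>Test</title></head>"
        + "<body><xsl:apply-templates/></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Serializes with the html method and decodes the bytes as UTF-8.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(HTML_XSLT)));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    // True if the left curly quote survived in any of its equivalent HTML forms.
    static boolean hasLeftQuote(String html) {
        String lower = html.toLowerCase();
        return html.contains("\u201C") || lower.contains("&ldquo;")
            || lower.contains("&#8220;") || lower.contains("&#x201c;");
    }

    public static void main(String[] args) {
        String html = transform("<p> \u201C and \u201D </p>");
        System.out.println(html);
        System.out.println("left quote present: " + hasLeftQuote(html));
    }
}
```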

You need to understand that an XSL transformation is applied not to the XML document per se but to a tree representation of the document(s). Text nodes hold the same character values regardless of how those characters were written in the input document: once the tree is built, they are the same. During the transformation you just create another tree, which is then serialized.
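To illustrate the tree point, here is a small Java sketch (TreeTextDemo is a hypothetical name) that parses the same quote written two ways, once as a literal character and once as a numeric character reference, and compares the resulting text nodes:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TreeTextDemo {
    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String literal = textOf("<p>\u201C</p>");     // quote written as a character
        String reference = textOf("<p>&#8220;</p>");  // quote written as a reference
        System.out.println(literal.equals(reference)); // true
    }
}
```

Whatever form the input used, the difference is gone by the time the stylesheet sees the tree; only the serializer decides how the character is written out again.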

Some characters, like the ones you mentioned, require special treatment depending on the destination format you choose. When serializing to an XML document they may be "escaped", and when serializing to HTML they may not be. This is why the first answer gives you a workaround.
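How a character is written also depends on the output encoding. As a rough sketch, assuming the JDK's built-in identity transformer (OutputEncodingDemo is a made-up name): with UTF-8 the quote can be written as is, while with US-ASCII the serializer has to fall back to a character reference, typically something like &#8220;.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputEncodingDemo {
    // Copies the XML unchanged, serializing it with the given output encoding,
    // then decodes the bytes with that same encoding.
    static String serialize(String xml, String encoding) {
        try {
            // newTransformer() without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(bytes));
            return new String(bytes.toByteArray(), Charset.forName(encoding));
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p> \u201C and \u201D </p>";
        System.out.println(serialize(xml, "UTF-8"));    // quotes kept as characters
        System.out.println(serialize(xml, "US-ASCII")); // quotes as character references
    }
}
```

Both results parse back to the same text, so the ASCII form is a safe choice when you cannot control how the consumer decodes the file.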

However, the difference between these two methods with regard to escaping does not come from the "disable-output-escaping" attribute (XSLT 1.0): that attribute defaults to "no" on xsl:value-of and xsl:text under both the XML and the HTML output method, and processors are not required to support it.

To try fixing your issue without changing the whole serialization method, you could set the attribute explicitly when copying a value that might contain "special" characters:

<xsl:value-of select="/my/node/text()" disable-output-escaping="yes"/>
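Before relying on this, it is worth checking what the attribute changes in your processor. The sketch below assumes the JDK's built-in processor; the class name and the small stylesheet are made up for the test. It runs the same value-of with the attribute set to "no" and to "yes". In XSLT 1.0 the attribute governs escaping of markup characters such as & and <, so the curly quote should come out the same in both runs.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DisableOutputEscapingDemo {
    // Builds a tiny stylesheet whose value-of uses the given attribute value.
    static String stylesheet(String disableOutputEscaping) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" encoding=\"utf-8\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/\">"
            + "<out><xsl:value-of select=\"/p\" disable-output-escaping=\""
            + disableOutputEscaping + "\"/></out>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Runs the stylesheet on the XML and decodes the result as UTF-8.
    static String transform(String xml, String disableOutputEscaping) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheet(disableOutputEscaping))));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>a &amp; \u201C</p>";
        // With "no" the ampersand is escaped as &amp; in the output.
        System.out.println(transform(xml, "no"));
        // With "yes" a supporting processor writes the ampersand unescaped.
        System.out.println(transform(xml, "yes"));
    }
}
```

Note that unescaped output can stop being well-formed XML, which is one reason to prefer fixing the output encoding or method first.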

P.S. In XSLT 2.0 the preferred way to do this kind of thing is the xsl:character-map declaration.
