
replace special characters xml file

I receive an XML file each day and run it through an XSLT process; however, an occasional special character causes this to break. I am looking for a utility that will clean the XML and replace special characters with the correct HTML numeric encoding. I just need a utility or an idea.

Update from comments

The XML will sometimes include a special character such as ¢ rather than &#162;, so I need a way to change the special character to the numeric reference.


If your XSLT code can't handle this input XML, then either the input isn't actually XML, or you're presenting it incorrectly to the XSLT processor. The most likely explanation is that the encoding of the file is not what the XML declaration at the start of the file says it is; or perhaps there isn't an XML declaration, so the processor is assuming UTF-8, but it's actually iso-8859-1. The solution may be as simple as adding an XML declaration to the start of the file to declare the encoding as iso-8859-1.
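As a rough illustration of that fix, here is a minimal Python sketch that decodes the file with its real encoding and writes it back as UTF-8 with a matching declaration. The function name `redeclare_encoding` is made up for this example, and iso-8859-1 as the actual encoding is only an assumption; substitute whatever your feed really uses.

```python
import re

def redeclare_encoding(raw: bytes, actual_encoding: str = "iso-8859-1") -> bytes:
    """Return UTF-8 XML bytes whose declaration matches the bytes.

    `actual_encoding` is an assumption about the source file, not something
    this function detects.
    """
    text = raw.decode(actual_encoding)
    # Drop an existing XML declaration, if any, so it cannot contradict
    # the new one.
    text = re.sub(r'^\s*<\?xml[^>]*\?>\s*', '', text, count=1)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + text).encode("utf-8")
```

For example, `redeclare_encoding(open("broken.xml", "rb").read())` gives bytes you can write to a new file and hand to the XSLT processor.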


"Special" characters (Unicode characters not in ASCII) are valid XML, so you should really fix your parser. If that doesn't work, pipe your file through the following filter:

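#!/usr/bin/env python3
# Replace every non-ASCII character on stdin with a numeric character reference.

import sys

# Read raw bytes so the decoding step is explicit.
text = sys.stdin.buffer.read().decode('UTF-8')
for c in text:
    # Compare the code point, not the character itself, against 128.
    sys.stdout.write('&#%04d;' % ord(c) if ord(c) >= 128 else c)

Replace UTF-8 with the document's encoding. Save the above code to xmlentities, and call it like this:

python3 xmlentities <broken.xml >fixed.xml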
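The same replacement can probably be done without a loop, using Python's built-in `xmlcharrefreplace` error handler. This is a minimal Python 3 sketch; the helper name `to_char_refs` is made up for illustration.

```python
def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    # Encoding to ASCII with this handler turns e.g. U+00A2 into "&#162;".
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_char_refs("<t>\u00a2</t>"))  # prints <t>&#162;</t>
```

Characters that are already ASCII, including existing `&` and `<`, pass through unchanged, so this only touches the non-ASCII characters.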


I can't reproduce this problem.

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

With this input:

<t>¢</t>

Output:

<?xml version="1.0" encoding="UTF-16"?>
<t>¢</t>