How to display non-ASCII characters from an XML output

I get this output in an XML element:

&#163;111.00

It should be £111.00.

How can I sort this out so that all Unicode characters are displayed rather than their codes? I am using the Linux tool wget to fetch the XML file from the Internet. Perhaps some sort of converter would help?

I am viewing the file in PuTTY. I am parsing the file, and I want to clean the input before parsing.

I am using xml_grep2 to get the elements I want and then cat filename | while read .....


OK, I'm going to close this question now.

After parsing the file with xml_grep2 I was able to get clean output, but I was still seeing this à character in the file. I changed the PuTTY character set setting from ISO-8859 to UTF-8 to resolve that.
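
Both observations can be reproduced in a few lines of Python, used here only for illustration since the thread itself relies on xml_grep2 and PuTTY. This is a minimal sketch that assumes the element held a numeric character reference such as &#163;; the element name price is made up. It shows that an XML parser resolves the reference by itself, and that a stray extra character appears when UTF-8 bytes are interpreted as ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; the real feed's element names are not known.
doc = "<price>&#163;111.00</price>"

# An XML parser resolves numeric character references by itself.
text = ET.fromstring(doc).text
print(text)

# The same string as UTF-8 bytes, as it would be sent to the terminal.
utf8_bytes = text.encode("utf-8")

# A terminal set to ISO-8859-1 treats every byte as one character,
# so the two-byte UTF-8 sequence for the pound sign becomes two characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)
```

With this input the second print shows one extra character in front of the pound sign. Which stray character appears in practice depends on the bytes involved, so it need not match the one reported above.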


You can use HTML::Entities to replace the entities with the literal characters. I don't know how good its coverage is, though. If you are not comfortable with Perl, there are bound to be similar tools for other languages. http://metacpan.org/pod/HTML::Entities

sh$ echo '&#163;111.00' | perl -CSD -MHTML::Entities -pe 'decode_entities($_)'
£111.00

This won't work if the HTML::Entities module is not installed. If you need to install it, there are numerous tutorials about CPAN on the Internet.

Edit: Added a usage example. The -CSD option might not be necessary on your system, but on OS X at least, I got garbage output without it.
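
For anyone who would rather not use Perl, a rough Python equivalent might look like the sketch below. It relies on html.unescape from the standard library, which handles both named (&pound;) and numeric (&#163;) references; the function name decode_line is made up for this example. Like decode_entities, it also turns &lt; and &amp; into literal characters, so running it over a whole XML file before parsing can break the markup. It is probably safer on text that has already been extracted.

```python
import html
import sys


def decode_line(line: str) -> str:
    """Replace HTML/XML character references in one line with literal characters."""
    return html.unescape(line)


if __name__ == "__main__":
    # Filter stdin to stdout line by line, much like the Perl one-liner with -pe.
    for line in sys.stdin:
        sys.stdout.write(decode_line(line))
```

Usage, assuming the script is saved as decode.py (a hypothetical name):

sh$ echo '&#163;111.00' | python3 decode.py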
