How to display non-ASCII characters from XML output
I get this output in an XML element:
&#163;111.00
It should be £111.00.
How can I sort this out so that all Unicode characters are displayed rather than the code? I am using the Linux tool wget to fetch the XML file from the Internet. Perhaps some sort of converter?
I am viewing the file in PuTTY. I am parsing the file and I want to clean the input before parsing.
I am using xml_grep2 to get the elements I want and then cat filename | while read .....
OK, I'm going to close this question now.
After parsing the file with xml_grep2 I was able to get clean output, but I was still seeing this à character in the file. I changed the PuTTY character set setting from ISO-8859 to UTF-8 to resolve that.
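For what it's worth, a stray accented character in front of the pound sign is what a UTF-8 £ typically looks like when the terminal decodes it as ISO-8859-1: the character is stored as two bytes, and each byte gets drawn as its own character. A small sketch of that, assuming `od` and `iconv` are available; the variable names are only for illustration:

```shell
# The pound sign (U+00A3) encoded as UTF-8, written as octal escapes
pound_utf8=$(printf '\302\243')

# Hex dump of the encoded character: two bytes, c2 a3
printf '%s' "$pound_utf8" | od -An -tx1

# Treat those two bytes as ISO-8859-1 and convert them to UTF-8.
# This mimics what a terminal set to ISO-8859-1 shows: two characters
# (U+00C2 followed by U+00A3) instead of a single pound sign.
misread=$(printf '%s' "$pound_utf8" | iconv -f ISO-8859-1 -t UTF-8)
printf '%s\n' "$misread"
```

Setting the terminal to UTF-8, as described above, makes it read the two bytes as one character again.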
You can use HTML::Entities to replace the entities with the literal characters. I don't know how good its coverage is, though. There are bound to be similar tools for other languages if you are not comfortable with Perl. http://metacpan.org/pod/HTML::Entities
sh$ echo '&#163;111.00' | perl -CSD -MHTML::Entities -pe 'decode_entities($_)'
£111.00
This won't work if the HTML::Entities module is not installed. If you need to install it, there are numerous tutorials about the CPAN on the Internet.
Edit: Add usage example. The -CSD option might not be necessary on your system, but on OS X at least, I got garbage output without it.