Data loss when converting UTF-8 XML to Latin-1?
If I convert a UTF-8-encoded XML document (which has an XML prolog declaring the encoding to be UTF-8) to Latin-1 using xmllint, will there be any data loss?
xmllint --encode iso-8859-1 --output test-latin1.xml test-utf8.xml
(the data will eventually be displayed as ISO-8859-1-encoded HTML)
There will be a problem if the original XML file contains any Unicode characters outside Latin-1. But I suspect xmllint will detect that and refuse to do the translation.
The only case I can think of where you might get interesting conversions is if the file contains accented characters: Unicode has multiple ways of representing them, which might all be mapped to the single representation in Latin-1.
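To make the accented-character case concrete, here is a minimal Python sketch (not xmllint itself, and the helper name `latin1_encodable` is made up for illustration). It shows that "é" can be stored either as one code point or as "e" plus a combining accent, and that Python's ISO-8859-1 codec only accepts the second form after NFC normalization. Whether xmllint normalizes on its own is not something this demonstrates.

```python
import unicodedata


def latin1_encodable(text):
    """Return True if every character in text can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


precomposed = "\u00e9"   # é as a single code point (U+00E9)
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT (U+0301)

print(latin1_encodable(precomposed))
print(latin1_encodable(decomposed))
# NFC normalization composes the pair into the single code point first
print(latin1_encodable(unicodedata.normalize("NFC", decomposed)))
```

So if the source file uses decomposed sequences, normalizing it to NFC before converting is one way to avoid surprises.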
Whether there is data loss depends on the contents of the file. If all characters in it belong to the ISO-8859-1 subset, it'll be OK. If it contains other characters, e.g. from the Cyrillic alphabet or Old Italian, you will lose them. xmllint indicates that (with an error code).
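If you want to check the contents yourself before converting, a rough Python sketch like the following lists the characters that fall outside ISO-8859-1. The function name `non_latin1_chars` and the sample string are illustrative assumptions, not part of xmllint.

```python
from typing import List


def non_latin1_chars(text: str) -> List[str]:
    """Return the distinct characters ISO-8859-1 cannot represent,
    in order of first appearance."""
    found = []
    for ch in text:
        # ISO-8859-1 covers exactly the code points U+0000 to U+00FF
        if ord(ch) > 0xFF and ch not in found:
            found.append(ch)
    return found


sample = "caf\u00e9 \u041f\u0440\u0438\u0432\u0435\u0442"  # "café Привет"
print(non_latin1_chars(sample))
```

To run it on the real document, you could read the file with `open("test-utf8.xml", encoding="utf-8").read()` and pass the result in; an empty list means the conversion should not lose anything.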
I converted it back to UTF-8 and the file seems to be identical to the original, so it looks like it's OK.
xmllint --encode utf-8 --output test-utf8-post.xml test-latin1.xml