UnicodeEncodeError while writing data to an xml file
My aim is to write an XML file with few tags whose values are in the regional language. I'm using Python to do this and using IDLE (Pythong GUI) for programming.
While I try to write the words in an xmls file it gives the following error:
UnicodeEncodeError: 'ascii' codec can't encode charact开发者_运维百科ers in position 0-4: ordinal not in range(128)
For now, I'm not using any xml writer library; instead, I'm opening a file "test.xml" and writing the data into it. This error is encountered by the line:
f.write(data)
If I replace the above write statement with print statement then it prints the data properly on the Python shell.
I'm reading the data from an Excel file which is not in the UTF-8, 16, or 32 encoding formats. It's in some other format. cp1252 is reading the data properly.
Any help in getting this data written to an XML file would be highly appreciated.
You should .decode
your incoming cp1252
to get Unicode strings, and .encode
them in utf-8
(by far the preferred encoding for XML) at the time you write, i.e.
f.write(unicodedata.encode('utf-8'))
where unicodedata
is obtained by .decode('cp1252')
on the incoming bytestrings.
It's possible to put lipstick on it by using the codecs
module of the standard Python library to open the input and output files each with their proper encodings in lieu of plain open
, but what I show is the underlying mechanism (and it's often, though not invariably, clearer and more explicit to apply it directly, rather than indirectly via codecs
-- a matter of style and taste).
What does matter is the general principle: translate your input strings to unicode as soon as you can right after you obtain them, use unicode throughout your processing, translate them back to byte strings at late as you can just before you output them. This gives you the simplest, most straightforward life!-)
精彩评论