UnicodeDecodeError in Python with codecs module
I have a text file which comprises Unicode strings such as "aBiyukÙwa", "varcasÙva", etc. When I try to decode them in the Python interpreter using the following code, it works fine and decodes to u'aBiyuk\xd9wa':
"aBiyukÙwa".decode("utf-8")
But when I read it from a file in a Python program using the codecs module, as in the following code, it throws a UnicodeDecodeError.
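import codecs
file = codecs.open('/home/abehl/TokenOutput.wx', 'r', 'utf-8')
for row in file:
    pass  # loop body not shown in the question; reading the rows is what raises the error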
Following is the error message:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd9 in position 8: invalid continuation byte
Any ideas what is causing this strange behavior?
Your file is not encoded in UTF-8. Find out which encoding it actually uses, and pass that to codecs.open instead of 'utf-8'.
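As a rough illustration of the point above, here is a minimal sketch. It assumes the file was saved as Latin-1 (ISO-8859-1), where the byte 0xd9 from the error message happens to be "Ù"; that is only a guess, and the real encoding of the file still needs to be confirmed. The sample bytes, the helper try_decode, and the temporary file are made up for the example and are not from the question.

```python
import io
import os
import tempfile

# Hypothetical sample: the bytes the file might contain IF it was saved as Latin-1.
# 0xd9 is "Ù" in Latin-1. In UTF-8 it starts a two-byte sequence, and the
# following byte ("w") is not a valid continuation byte.
raw = b"aBiyuk\xd9wa\n"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_result = try_decode(raw, "utf-8")      # fails, same kind of error as in the question
latin1_result = try_decode(raw, "latin-1")  # succeeds under the Latin-1 assumption

# Reading a file with the assumed encoding (a temporary file stands in for the real one).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

with io.open(path, "r", encoding="latin-1") as f:
    rows = [row.rstrip("\n") for row in f]

os.remove(path)
```

Keep in mind that Latin-1 maps every possible byte to some character, so a successful decode does not prove the guess is right. Check that the decoded text actually looks like what you expect.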