Python encoding error
What does one do with this kind of error? You are reading lines from a file. You don't know the encoding.
What does "byte 0xed" mean? What does "position 3792" mean?
I'll try to answer this myself and repost but I'm slightly annoyed that I'm spending as long as I am figuring this out. Is there a clobber/ignore and continue method for getting past unknown encodings? I just want to read a text file!
Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    for x in fin:
  File "/bns/rma/local/lib/python3.1/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 3792: ordinal not in range(128)
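0xed is the value of a single byte in the input (237 in decimal). ASCII only defines byte values 0 through 127 (0x00 to 0x7f), so the ascii codec rejects anything above that, hence "ordinal not in range(128)".
Position 3792 is the zero-based offset of that byte within the chunk of bytes being decoded, which for the first chunk read is also the offset from the start of the file. It counts bytes, not letters.
Which character the byte stands for depends on the file's real encoding. In Latin-1 (ISO-8859-1) the byte 0xed is í. In UTF-8 it would instead be the first byte of a three-byte sequence.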
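To see what the two numbers in the message refer to, here is a small sketch that reproduces the error on a made-up byte string (the sample data is an assumption, not the poster's file). The exception object carries the offset of the offending byte in its start attribute.

```python
# Hypothetical sample: three ASCII letters followed by the byte 0xed.
data = b"caf\xed"

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    position = exc.start        # zero-based byte offset of the bad byte
    bad_byte = data[exc.start]  # the byte value itself, as an int
    reason = exc.reason

# The same bytes decode without error if we assume Latin-1,
# where 0xed happens to be the letter í.
latin1_text = data.decode("latin-1")

print(position, hex(bad_byte), reason, latin1_text)
```

Here the reported position is 3 because the bad byte is the fourth byte of the input.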
The file is being decoded with the ascii codec (probably picked up as the default from your locale), but the file is not ASCII-encoded. Try a Unicode-aware codec instead (utf_8
maybe?) or, if you know the encoding used to write the file, choose the appropriate one from the full list of available codecs.
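As a rough sketch of that advice, you can pass the encoding to open() and fall back to another guess if decoding fails. The helper name read_lines and the candidate list are assumptions for illustration. Note that latin-1 accepts every byte value, so putting it last means the loop always succeeds, though the result may be wrong text if the file is in some other encoding.

```python
import os
import tempfile

def read_lines(path, encodings=("utf-8", "latin-1")):
    """Return the file's lines, decoded with the first encoding that works."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as fin:
                return fin.readlines()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode " + path)

# Demo on a throwaway file that contains the Latin-1 byte 0xed.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xed\n")
lines = read_lines(tmp.name)
os.remove(tmp.name)
print(lines)
```

In the demo, utf-8 fails on the lone 0xed byte, so the lines come back decoded as latin-1.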
I think I found the way to be dumb :) :
# fin must be opened in binary mode ('rb') so that each line is a bytes object
fin = (x.decode('ascii', 'ignore') for x in fin)
for x in fin:
    print(x)
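The second argument is the error handler: 'ignore' silently drops the bytes that can't be decoded, and 'replace' substitutes the replacement character U+FFFD for each of them. This at least follows the "garbage in, garbage out" idiom that I am seeking.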
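The same clobber-and-continue behaviour is also available without decoding each line by hand, since open() accepts an errors argument in text mode. A minimal sketch, with a hypothetical helper name read_lossy and a made-up sample file:

```python
import os
import tempfile

def read_lossy(path, errors="replace"):
    """Read a file as ASCII, handling undecodable bytes per `errors`."""
    with open(path, encoding="ascii", errors=errors) as fin:
        return fin.read()

# Demo on a throwaway file with one non-ASCII byte in the middle.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xed ole\n")
replaced = read_lossy(tmp.name, "replace")  # bad byte becomes U+FFFD
ignored = read_lossy(tmp.name, "ignore")    # bad byte is dropped
os.remove(tmp.name)

print(repr(replaced), repr(ignored))
```

Either handler loses information, so this only makes sense when you truly don't care about the non-ASCII content.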