Python encoding error

What does one do with this kind of error? You are reading lines from a file. You don't know the encoding.

What does "byte 0xed" mean? What does "position 3792" mean?

I'll try to answer this myself and repost but I'm slightly annoyed that I'm spending as long as I am figuring this out. Is there a clobber/ignore and continue method for getting past unknown encodings? I just want to read a text file!

Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    for x in fin:
  File "/bns/rma/local/lib/python3.1/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 3792: ordinal not in range(128)


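0xed is the value of the byte that could not be decoded. If the file is Latin-1 (ISO-8859-1) or Windows-1252, that byte is í; in UTF-8 it would be the first byte of a three-byte sequence. "Position 3792" is the zero-based byte offset of that byte within the chunk of data being decoded, not a count of letters.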
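To see where those two numbers come from, here is a minimal sketch that reproduces the same kind of error on a tiny, made-up byte string (the sample word is an assumption, not the asker's data). The exception object carries the offset in its start attribute.

```python
# Hypothetical sample: the Latin-1 encoding of "así", where í is the single byte 0xed.
data = "así".encode("latin-1")  # b'as\xed'

try:
    data.decode("ascii")
    err = None
except UnicodeDecodeError as exc:
    err = exc

# err.start is the zero-based offset of the first byte the codec rejected.
offset = err.start
bad_byte = data[offset]
print(err.reason, hex(bad_byte), offset)
```

In the traceback from the question the same fields would read 0xed and 3792.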

You are using the ascii codec to decode the file, but the file is not ASCII-encoded. Try a codec that covers non-ASCII characters instead (utf_8 maybe?), or, if you know the encoding used to write the file, choose the matching one from the full list of available codecs and pass it to open() through the encoding argument.
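If you really don't know the encoding, one rough approach is to try a short list of candidates and keep the first one that decodes without error. The sketch below assumes the candidate list and the helper name read_text, both made up for illustration; a clean decode is only a guess, not proof that the encoding is right.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# Demo on a throwaway file containing the byte 0xed.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("así\n".encode("latin-1"))

text, used = read_text(tmp.name)
os.remove(tmp.name)
print(used, repr(text))
```

Note that latin-1 maps every possible byte to a character, so with it last in the list the function always returns something. Put the stricter codecs first.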


I think I found the way to be dumb :) :

# fin must be opened in binary mode, e.g. open(path, 'rb'), so each line is bytes
fin = (x.decode('ascii', 'ignore') for x in fin)

for x in fin:
    print(x)

The second argument is the error handler: 'ignore' drops the bytes that can't be decoded, and 'replace' substitutes U+FFFD for them. This at least follows the "garbage in, garbage out" idiom that I am seeking.
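The same effect is available without decoding by hand, because open() in text mode also accepts an errors argument. A minimal sketch, assuming a throwaway file with one bad byte in it:

```python
import os
import tempfile

# Throwaway file with one non-ASCII byte (0xed) in it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"as\xed\n")
path = tmp.name

# 'ignore' silently drops the bad byte.
with open(path, encoding="ascii", errors="ignore") as fin:
    ignored = [line for line in fin]

# 'replace' keeps a U+FFFD marker where the bad byte was.
with open(path, encoding="ascii", errors="replace") as fin:
    replaced = [line for line in fin]

os.remove(path)
print(ignored, replaced)
```

'replace' is usually the safer of the two, since the marker shows where data was lost.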
