
Standard Python libraries and Unicode

I have been reading left, right and centre about Unicode and Python. I think I understand what encoding/decoding is, yet as soon as I try to use a standard library method that manipulates a file name, I get the infamous:

 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19:
 ordinal not in range(128)

In this case \xe9 stands for 'é', and it doesn't matter whether I call os.path.join() or shutil.copy(); both throw the same error. From what I understand, it has to do with Python's default encoding. I tried to change it by adding this at the top of the file:

# -*- coding: utf-8 -*- 

Nothing changes. If I type:

sys.setdefaultencoding('utf-8')

it tells me:

ImportError: cannot import name setdefaultencoding

What I really don't understand is why it works when I type it in the terminal, '\xe9' and all. Could someone please explain to me why this is happening/how to get around it?

Thank you


Filenames on *nix are byte strings, so they cannot be manipulated as unicode directly. The filename must first be encoded to match the charset of the filesystem and then used.
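A minimal sketch of that idea, written to run on both Python 2 and 3. The helper name to_fs_bytes is hypothetical, and falling back to UTF-8 when the filesystem encoding is unknown is an assumption, not something the standard library does for you:

```python
# -*- coding: utf-8 -*-
import os
import sys


def to_fs_bytes(name, encoding=None):
    """Hypothetical helper: return `name` as bytes in the filesystem charset."""
    # Assumption: fall back to UTF-8 if Python cannot report an encoding.
    encoding = encoding or sys.getfilesystemencoding() or "utf-8"
    if isinstance(name, bytes):
        return name  # already encoded, leave untouched
    return name.encode(encoding)


# Encode every component explicitly so no implicit 'ascii' encoding happens.
directory = to_fs_bytes(u"/tmp")
filename = to_fs_bytes(u"caf\xe9.txt", "utf-8")  # UTF-8 chosen for the example
path = os.path.join(directory, filename)
```

Because both components are already bytes, os.path.join() never has to guess an encoding.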


You should manually decode the filename with the correct encoding (latin1?) before calling os.path.join.

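By the way, # -*- coding: utf-8 -*- only declares the encoding of the source file, i.e. how the string literals in your .py file are read. It does not change Python's default encoding.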

effbot has some good information on this.
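For illustration, decoding before joining might look like the sketch below. The byte string and the choice of latin-1 are assumptions made for the example; use whatever encoding your filenames were actually written in:

```python
import os

# Assumed input: a filename as raw bytes, here 'é' stored as Latin-1 (0xE9).
raw_name = b"caf\xe9.txt"

# Decode explicitly with the known encoding instead of relying on 'ascii'.
name = raw_name.decode("latin-1")

# Both components are now text, so joining needs no implicit conversion.
path = os.path.join(u"docs", name)
```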


You should not touch the default encoding. It is best practice, and highly recommended, to leave it as 'ascii' and convert your data properly to UTF-8 on the output side.
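A small sketch of what that looks like in practice. It first reproduces the error from the question by encoding with 'ascii' explicitly, then does the conversion to UTF-8 by hand at the point of output. The variable names are only illustrative:

```python
text = u"caf\xe9"  # text containing 'é'

# Encoding with 'ascii' fails for any character above 127, which is what
# happens implicitly when the default encoding gets involved.
error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

# Explicit conversion on the output side works without touching the default.
output_bytes = text.encode("utf-8")
```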
