Standard Python libraries and Unicode
I have been reading left, right and centre about Unicode and Python. I think I understand what encoding/decoding is, yet as soon as I try to use a standard library method manipulating a file name, I get the infamous:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19:
ordinal not in range(128)
In this case \xe9 stands for 'é', and it doesn't matter whether I call os.path.join() or shutil.copy(); it throws the same error. From what I understand, it has to do with Python's default encoding. I tried to change it with:
# -*- coding: utf-8 -*-
Nothing changes. If I type:
sys.setdefaultencoding('utf-8')
it tells me:
ImportError: cannot import name setdefaultencoding
What I really don't understand is why it works when I type it in the terminal, '\xe9' and all. Could someone please explain to me why this is happening/how to get around it?
Thank you
Filenames on *nix are stored as byte strings and cannot be manipulated as Unicode directly. The filename must be encoded to match the charset of the filesystem before it is used.
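A minimal sketch of that encoding step, assuming the filesystem charset can be taken from sys.getfilesystemencoding(). The helper name encode_filename and the sample name are made up for illustration; the two explicit encodings only show how the resulting bytes differ.

```python
import sys


def encode_filename(name, encoding=None):
    """Return name as a byte string in the given (or the filesystem's) charset."""
    if encoding is None:
        # Assumption: fall back to UTF-8 if Python cannot report an encoding.
        encoding = sys.getfilesystemencoding() or "utf-8"
    if isinstance(name, bytes):
        return name  # already encoded, leave untouched
    return name.encode(encoding)


# The same text name yields different bytes depending on the charset.
utf8_name = encode_filename(u"caf\xe9.txt", "utf-8")
latin1_name = encode_filename(u"caf\xe9.txt", "latin-1")
```

Which charset is right depends on how the files were created, so the value reported by Python is a starting point rather than a guarantee.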
You should manually decode the filename with the correct encoding (latin1?) before calling os.path.join.
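A rough sketch of decoding before joining, so that both parts of the path are text and no implicit ASCII conversion is triggered. The helper to_text, the sample bytes, and the choice of UTF-8 are assumptions; substitute latin1 or whatever encoding your filenames really use.

```python
import os.path


def to_text(name, encoding="utf-8"):
    """Return name as a text string, decoding it first if it is a byte string."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


# Hypothetical byte filename, as it might come back from os.listdir().
raw_name = b"caf\xc3\xa9.txt"
directory = u"backup"

# Both arguments are now text, so os.path.join never mixes bytes and text.
joined = os.path.join(directory, to_text(raw_name))
```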
By the way, # -*- coding: utf-8 -*- only declares the encoding of the string literals in your .py file; it does not change Python's default encoding.
effbot has some good information on this.
You should not touch the default encoding. It is best practice, and strongly recommended, to keep it as 'ascii' and convert your data properly to UTF-8 on the output side.
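One way to do the conversion on the output side, sketched with io.open, which behaves the same way on Python 2 and 3. The temporary directory and the file name out.txt are placeholders for this example.

```python
import io
import os
import tempfile

# Keep the data as text inside the program.
text = u"r\xe9sum\xe9"

# Placeholder location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Encode to UTF-8 only at the boundary, when writing to disk.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Read the raw bytes back to see what was actually written.
with io.open(path, "rb") as handle:
    raw = handle.read()
```

This leaves the default encoding alone and makes the one place where text becomes bytes explicit.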