开发者

Python os.stat and unicode file names

In my Django application, a user has uploaded a file with a unicode character in the name.

When I'm downloading files, I'm calling :

os.path.exists(media)

to test that the file is there. This, in turn, seems to call

st = os.stat(path)

Which then blows up with the e开发者_运维技巧rror :

UnicodeEncodeError: 'ascii' codec can't encode character u'\xcf' in position 92: ordinal not in range(128)

What can I do about this? Is there an option to path.exists to handle it?

Update : Actually, all I had to do was encode the argument to exists, ie.

os.path.exists(media.encode('utf-8')

Thanks everyone who answered.


I'm assuming you're in Unix. If not, please remember to say which OS you're in.

Make sure your locale is set to UTF-8. All modern Linux systems do this by default, usually by setting the environment variable LANG to "en_US.UTF-8", or another language. Also, make sure your filenames are encoded in UTF-8.

With that set, there's no need to mess with encodings to access files in any language, even in Python 2.x.

[~/test] echo $LANG
en_US.UTF-8
[~/test] echo testing > 漢字
[~/test] python2.6
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.stat("漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> os.stat(u"漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> open("漢字").read()
'testing\n'
>>> open(u"漢字").read()
'testing\n'

If this doesn't work, run "locale"; if the values are "C" instead of en_US.UTF-8, you may not have the locale installed correctly.

If you're in Windows, I think Unicode filenames should always just work (at least for the os/posix modules), since the Unicode file API in Windows is supported transparently.


None of these solutions worked for me. However, I did find the (a?) solution. There is yet another place in Apache settings where one has to add the locale setting if one uses WSGI. Official docs are here. Add the following two lines to /etc/apache2/envvars (on Ubuntu):

export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'

Then restart the server. This solved my problem.


It is easy to get this kind of error when running service (E.g: gunicorn) from Upstart.

To fix that, set env in upstart file:

env LANG=en_US.UTF-8
env LC_CTYPE=en_US.UTF-8
env LC_ALL=en_US.UTF-8


Encode to the filesystem encoding before calling. See the locale module.


Change your http server to use UTF-8 locale. For example, I use apache2 on CentOS. I changed /etc/sysconfig/httpd locale setting by HTTPD_LANG:

# CentOS use /etc/sysconfig/httpd to config environment variables.
#
# By default, the httpd process is started in the C locale; to
# change the locale in which the server runs, the HTTPD_LANG
# variable can be set.
#
# HTTPD_LANG=C
HTTPD_LANG=en_US.UTF-8  # you can change to your locale.
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜