Python unicode: why does it work on one machine but sometimes fail on another?

Posted: 2023-01-31 18:53

I find Unicode in Python really troublesome. Why doesn't Python use UTF-8 for all strings? I am in China, so I have to use Chinese strings that can't be represented in ASCII. I use u'' to denote such a string. It works well on my Ubuntu machine, but on another Ubuntu machine (a VPS provided by linode.com) it sometimes fails. The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

The code I am using is:

self.talk(user.record["fullname"] + u"准备好了")

The famous UnicodeDecodeError shows up when you do string manipulation like the one you just did:

user.record["fullname"] + u" 准备好了"

What you're doing here is concatenating a str with a unicode, so Python implicitly coerces the str to unicode before concatenating. The coercion is done like this:

unicode(user.record["fullname"]) + u" 准备好了"
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         Problem

And there is the problem: unicode(something) decodes the string using the default encoding, which is ASCII in Python 2.*. If your string user.record["fullname"] happens to contain a non-ASCII character, it raises the famous UnicodeDecodeError.
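The failure can be reproduced without the original application. The sketch below assumes the name is stored as UTF-8 bytes (the value 陈 is a made-up example, not taken from the question) and decodes it explicitly with the ASCII codec, which is roughly what the implicit coercion does behind the scenes. It should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical value: the character 陈 encoded as UTF-8 bytes (0xe9 0x99 0x88).
fullname_bytes = b"\xe9\x99\x88"

try:
    # Python 2 effectively performs this decode when evaluating  str + unicode.
    fullname_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The first byte, 0xe9, is outside the ASCII range, which matches the byte and position reported in the question. A name made only of ASCII characters would decode fine, which would explain why the error appears only some of the time.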

So how can you solve it?

# Decode the str to unicode using the right encoding.
# I used utf-8 here because it is usually the right one, but it may not be (another problem!)
a = user.record["fullname"].decode('utf-8')

self.talk(a + u" 准备好了")

PS: In Python 3 the default encoding is UTF-8. Also, you can't concatenate a Unicode string with a byte string (bytes in Python 3) any more, so there is no implicit coercion.
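As a rough illustration of the Python 3 behaviour, the sketch below tries to mix bytes and str and then does the decode explicitly. It is written for Python 3 only, and the byte value is again a hypothetical name, not data from the question.

```python
# Python 3 only: str is Unicode text, bytes is raw data, and the two never mix.
raw = b"\xe9\x99\x88"          # hypothetical UTF-8 bytes for a name
suffix = " 准备好了"

try:
    raw + suffix               # no implicit decoding is attempted
except TypeError as exc:
    mixing_error = exc
    print(type(exc).__name__)

# The explicit decode does not depend on any default encoding.
message = raw.decode("utf-8") + suffix
```

The mistake is reported as a TypeError for every input, including pure ASCII, so it tends to surface during development instead of on one particular server.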

You need to decode all non-Unicode strings as early as possible. Try to ensure you have no UTF-8 bytestrings stored anywhere in memory, and you have only unicode objects. For example, make sure that the elements of user.record are all converted to unicode on creation, so you don't get any errors like this one. Or just use Python 3 where it's hard to mix them.
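One way to follow the "decode early" advice is to convert a record once, at the point where it is created. This is only a sketch: the helper names to_text and decode_record are made up for illustration, the record is assumed to be a plain dict, and UTF-8 is assumed to be the encoding of the incoming bytes.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings if necessary."""
    if isinstance(value, bytes):   # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


def decode_record(record, encoding="utf-8"):
    """Return a copy of record with every byte-string value decoded."""
    return {key: to_text(val, encoding) for key, val in record.items()}


# Hypothetical record as it might arrive from a database or a socket.
raw_record = {"fullname": b"\xe9\x99\x88", "age": 30}
record = decode_record(raw_record)
```

After this step record["fullname"] is already text, so concatenating it with u" 准备好了" needs no implicit coercion. Non-string values such as the age are passed through unchanged.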

That's because in Python 2.x the default encoding is ASCII unless it's changed manually. Here is a crude hack to include in your script before any other code:

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

This will change the default Python encoding to UTF-8.

It took me a long time, but I found it.

Look at the output of printenv, especially LANG:

LANG=en_CA <- server 2 (not working)

LANG=en_US.UTF-8 <- server 1 (working; on Linode, coincidentally)

Set the new locale:

sudo update-locale LANG=en_US.UTF-8 LANGUAGE

Log out, log back in, and Bob's your uncle :)
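To compare two machines from inside Python as well as from the shell, a small report like the one below may help. It is a sketch, not a definitive checklist: the function name encoding_report is made up, and which of these settings actually matters depends on where the bytes in your program come from.

```python
import locale
import os
import sys


def encoding_report():
    """Collect encoding-related settings that can differ between machines."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "default_encoding": sys.getdefaultencoding(),
        "preferred_encoding": locale.getpreferredencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


report = encoding_report()
for name in sorted(report):
    print("%s = %s" % (name, report[name]))
```

Run it on both servers and compare the output line by line. A difference in LANG will typically show up in the preferred and stdout encodings as well.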
