
Cannot convert ascii to utf-8 in python

I have the Polish word "wąż", which means "snake", but I get it from a web service in ASCII, so:

snake_in_polish_in_ascii="w\xc4\x85\xc5\xbc"

Here are the results of my attempts:

print str(snake_in_polish_in_ascii)  # this prints w─ů┼╝

snake_in_polish_in_ascii.decode('utf-8')
print str(snake_in_polish_in_ascii)  # this prints w─ů┼╝ too

And this code:

print str(snake_in_polish_in_ascii.encode('utf-8'))

raises an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)

I'm using Wing IDE on Windows XP with Polish regional settings.

At the top of the file I have:

# -*- coding: utf-8 -*-

I can't find a way to resolve this. Why can't I get "wąż" in the output?


This expression:

snake_in_polish_in_ascii.decode('utf-8')

does not change the string in place; it returns a new unicode object and leaves the original untouched. Try this instead:

print snake_in_polish_in_ascii.decode('utf-8')
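
A minimal sketch of the same point, written so that it should run on both Python 2.6+ and Python 3 (the `b""` prefix and the names `snake_bytes` and `snake_text` are assumptions added for illustration, not part of the question). It shows that `decode()` hands back a new object rather than modifying the byte string:

```python
# -*- coding: utf-8 -*-
# The b"" prefix marks a byte string on both Python 2.6+ and Python 3.
snake_bytes = b"w\xc4\x85\xc5\xbc"

# decode() returns a new text object; snake_bytes itself is not modified.
snake_text = snake_bytes.decode("utf-8")

print(len(snake_bytes))                 # 5 bytes: 'w' plus two 2-byte sequences
print(len(snake_text))                  # 3 characters
print(snake_text == u"w\u0105\u017c")   # True: U+0105 is ą, U+017C is ż
```

The result has to be kept (assigned or printed directly); calling `decode()` on its own line and discarding the return value has no visible effect.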

The reason you see w─ů┼╝ when you run print snake_in_polish_in_ascii is that your terminal uses the cp852 encoding (Central and Eastern Europe), so the raw UTF-8 bytes are displayed as cp852 characters. You can reproduce this with:

>>> print snake_in_polish_in_ascii.decode("cp852")
w─ů┼╝
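
The mismatch can also be checked without relying on what the console displays. The sketch below compares code points instead of printing the characters; the variable names are hypothetical, and it assumes only the standard `cp852` and `utf-8` codecs:

```python
# -*- coding: utf-8 -*-
raw = b"w\xc4\x85\xc5\xbc"

# The same four bytes read through two different code pages.
as_cp852 = raw.decode("cp852")   # what a cp852 console shows
as_utf8 = raw.decode("utf-8")    # what the web service meant

# cp852 maps 0xC4, 0x85, 0xC5, 0xBC to box-drawing characters and "ů".
print(as_cp852 == u"w\u2500\u016f\u253c\u255d")   # True
print(as_utf8 == u"w\u0105\u017c")                # True

# cp852 does contain ą and ż, so the decoded text can be re-encoded
# for such a console without loss.
console_bytes = as_utf8.encode("cp852")
print(console_bytes.decode("cp852") == as_utf8)   # True
```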


>>> i="w\xc4\x85\xc5\xbc"
>>> print i.decode('utf-8')
wąż
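
As for the UnicodeDecodeError in the question: in Python 2, calling `.encode('utf-8')` on a byte string first decodes it implicitly with the `ascii` codec, and that hidden step is what fails. A small sketch that makes the step explicit (it should behave the same on Python 2.6+ and Python 3; `raw` and `error_message` are names chosen here for illustration):

```python
# -*- coding: utf-8 -*-
raw = b"w\xc4\x85\xc5\xbc"

error_message = None
try:
    # This is the decode that Python 2 performs silently before encoding.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

# The message names byte 0xc4 at position 1, as in the question.
print(error_message)
```

So the fix is to decode with the codec the bytes were actually written in (UTF-8 here), never to encode a byte string.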


Example:

snake_in_polish_in_ascii = 'w\xc4\x85\xc5\xbc'
# The bytes are UTF-8, so decode them as UTF-8, then encode for the console (cp852 assumed here).
print snake_in_polish_in_ascii.decode('utf-8').encode('cp852')


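In Python 3, source files are treated as UTF-8 by default. In Python 2 the default source encoding is ASCII, which is why the # -*- coding: utf-8 -*- line is needed there. That declaration only controls how literals in the source file are read; it does not change the encoding of the console or of the data returned by the web service.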
