Cannot convert ascii to utf-8 in python
I have the Polish word "wąż", which means "snake",
but I receive it from a web service in ASCII, like this:
snake_in_polish_in_ascii="w\xc4\x85\xc5\xbc"
Here are the results of my attempts:
print str(snake_in_polish_in_ascii)  # this prints w─ů┼╝
snake_in_polish_in_ascii.decode('utf-8')
print str(snake_in_polish_in_ascii)  # this prints w─ů┼╝ too
And this code:
print str(snake_in_polish_in_ascii.encode('utf-8'))
raises an exception:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)
I'm using Wing IDE on Windows XP with Polish regional settings.
At the top of the file I have:
# -*- coding: utf-8 -*-
I can't find a way to resolve this. Why can't I get "wąż" in the output?
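In Python 2, calling .encode('utf-8') on a byte string (str) first decodes it implicitly with the default ASCII codec, and it is that implicit decode that fails on byte 0xc4. A minimal Python 3 sketch reproduces the failing step; since Python 3 bytes have no encode method, the ASCII decode is written out explicitly here. The variable names are illustrative only.

```python
raw = b"w\xc4\x85\xc5\xbc"  # the bytes from the web service

error_message = None
try:
    # Python 2's str.encode('utf-8') performed this ASCII decode implicitly
    # before encoding; here the step is written out explicitly.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)
```

The message matches the one in the question: the first non-ASCII byte (0xc4, at index 1) stops the ASCII codec.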
This expression:
snake_in_polish_in_ascii.decode('utf-8')
doesn't change the string in place; it returns a new unicode object. Try this instead:
print snake_in_polish_in_ascii.decode('utf-8')
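The same point as a Python 3 sketch, where the byte string is a bytes object and decode() returns a str: the original value is untouched unless the result is assigned to a name. The names raw and snake are illustrative.

```python
raw = b"w\xc4\x85\xc5\xbc"

# decode() builds and returns a new str; it does not modify `raw`.
raw.decode("utf-8")          # result is discarded
print(raw)                   # b'w\xc4\x85\xc5\xbc' (unchanged)

# Keep the decoded text by binding it to a name (or pass it straight to print()).
snake = raw.decode("utf-8")
print(len(raw), len(snake))  # 5 3  (five bytes, three characters)
```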
As for why print snake_in_polish_in_ascii
shows w─ů┼╝:
your terminal uses the cp852 encoding (Central and Eastern Europe), so it displays each raw UTF-8 byte as a cp852 character. Try this to see:
>>> print snake_in_polish_in_ascii.decode("cp852")
w─ů┼╝
>>> i="w\xc4\x85\xc5\xbc"
>>> print i.decode('utf-8')
wąż
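The same comparison as a Python 3 sketch: decoding the five bytes as UTF-8 gives the three intended characters, while decoding them as cp852 reproduces the garbled console output. ascii() is used only so the printed lines stay plain ASCII whatever the console encoding is.

```python
raw = b"w\xc4\x85\xc5\xbc"

# Correct interpretation: the bytes are UTF-8.
as_utf8 = raw.decode("utf-8")

# What a cp852 console effectively does when handed the raw bytes.
as_cp852 = raw.decode("cp852")

print(ascii(as_utf8))   # 'w\u0105\u017c'            (wąż)
print(ascii(as_cp852))  # 'w\u2500\u016f\u253c\u255d' (w─ů┼╝)
```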
Example:
snake_in_polish_in_ascii = 'w\xc4\x85\xc5\xbc'
print snake_in_polish_in_ascii.decode('cp1252').encode('utf-8')
By default, Python 3 treats source files as UTF-8 encoded (PEP 3120), even though Python's standard library itself sticks to ASCII. Python 2 defaults to ASCII, which is why a # -*- coding: utf-8 -*- line is needed there.
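A small Python 3 sketch of what the source-encoding default means in practice: a non-ASCII literal can be written directly in a UTF-8 source file, the same text can be spelled with ASCII-only \u escapes that do not depend on the file's encoding, and encoding it to UTF-8 yields exactly the bytes the web service returned.

```python
# Python 3 reads this file as UTF-8 unless a coding declaration says otherwise,
# so the literal below may contain non-ASCII characters directly.
snake = "wąż"

# The same text written with ASCII-only escapes (independent of source encoding).
snake_escaped = "w\u0105\u017c"

print(snake == snake_escaped)  # True
print(snake.encode("utf-8"))   # b'w\xc4\x85\xc5\xbc'
```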