Encoding in Python: what type is the variable?
Python file:
# -*- coding: UTF-8 -*-
a = 'Köppler'
print a
print a.__class__.__name__
mydict = {}
mydict['name'] = a
print mydict
print mydict['name']
Output:
Köppler
str
{'name': 'K\xc3\xb6ppler'}
Köppler
The name itself seems to stay the same, but when I print the dictionary I get this strange escaped string. What am I looking at? Is that the UTF-8 representation?
The reason for that behavior is that the __repr__ function in Python 2 escapes non-ASCII characters. As the link shows, this is fixed in Python 3.
Yes, that's the UTF-8 representation of ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS). It consists of a 0xC3 octet followed by a 0xB6 octet. UTF-8 is a very elegant encoding, I think, and worth reading up on. The history of its design (on a placemat in a diner) is described here by Rob Pike.
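The two octets are easy to check yourself. This is only a minimal sketch; the variable names are made up for illustration, and it is written so that it should run on Python 3 as well as Python 2.

```python
# U+00F6 is written as an escape so the result does not depend on
# the encoding of this source file.
o_umlaut = u'\u00f6'

# Encoding the single character to UTF-8 yields a two-byte sequence.
encoded = o_umlaut.encode('utf-8')

# bytearray yields integers on both Python 2 and Python 3.
octets = [hex(b) for b in bytearray(encoded)]
print(octets)

# Decoding the two bytes gives back the single character.
decoded = encoded.decode('utf-8')
print(decoded == o_umlaut)
```

The list printed first should contain the 0xc3 and 0xb6 values, in that order.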
As far as I know, there are two functions in Python for displaying objects: str() and repr(). print uses str() internally; however, a dict's str() apparently uses repr() for its keys and values.
As has already been mentioned, repr() escapes non-ASCII characters.
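To see that a dict's str() is built from the repr() of its contents, something like the following can be tried. It is a rough sketch rather than a description of the dict internals; the UTF-8 bytes are constructed explicitly (an assumption made here so the snippet does not depend on the file encoding and also runs on Python 3, where the repr carries a b prefix).

```python
# The UTF-8 bytes of the name from the question, built explicitly.
name_bytes = u'K\u00f6ppler'.encode('utf-8')
mydict = {'name': name_bytes}

# str() of the dict, which is what print uses.
dict_text = str(mydict)

# The same text assembled by hand from repr() of the key and the value.
expected = '{' + repr('name') + ': ' + repr(name_bytes) + '}'
print(dict_text == expected)

# The escaped form \xc3\xb6 comes from repr() of the byte string.
print('\\xc3\\xb6' in repr(name_bytes))
```

Both comparisons should come out true, which matches the escaped output seen when printing the dictionary.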
It seems you are using Python 2.x, where you have to specify that the object is actually a unicode string and not a plain byte string. You declared the source file as UTF-8, so you actually typed 2 bytes for your ö, and since it is a regular string, you got the 2 escaped bytes.
Try specifying a unicode literal: a = u'Köppler'. You may need to encode it before printing, depending on your console encoding: print a.encode('utf-8')
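The difference between the two kinds of string shows up in their lengths. A small sketch, with illustrative names; it sticks to escapes and printed numbers so that neither the file encoding nor the console encoding gets in the way. On Python 3 every plain string literal is already a text string, so the u prefix is optional there.

```python
# Text string: ö is a single character.
text = u'K\u00f6ppler'

# Byte string: the same name encoded as UTF-8, where ö takes two bytes.
data = text.encode('utf-8')

print(len(text))
print(len(data))

# Decoding the bytes recovers the original text string.
print(data.decode('utf-8') == text)
```

The byte string is one element longer than the text string, because ö is one character but two UTF-8 bytes.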