开发者

encoding in python: what type is the variable

Python file

# -*- coding: UTF-8 -*-
a = 'Köppler'
print a
print a.__class__.__name__
mydict = {}
mydict['name'] = a
print mydict
print mydict['name']

Output:

Köppler
s开发者_StackOverflowtr
{'name': 'K\xc3\xb6ppler'}
Köppler

It seems that the name remains the same, but only when printing a dictionary I get this strange escaped character string. What am I looking at then? Is that the UTF-8 representation?


The reason for that behavior is that the __repr__ function in Python 2 escapes non-ASCII unicode characters. As the link shows, this is fixed in Python 3.


Yes, that's the UTF-8 representation of ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS). It consists of a 0xC3 octet followed by a 0xB6 octet. UTF-8 is a very elegant encoding, I think, and worth reading up on. The history of its design (on a placemat in a diner) is described here by Rob Pike.


As far as I'm concerned there are two methods in Python for displaying objects: str() and repr(). Str() is used internally inside print, however Apparently dict's str() uses repr() for keys and values.

As it has been mentioned: repr() escapes unicode characters.


It seems you are using python 2.x, where you have to specify that the object is actually a unicode string and not a plain ascii. You specified that the code is utf-8, thus you actually typed 2 bytes for your ö, and as it is a regular string, you got the 2 escaped chars. Try to specify the unicode a= u'Köppler'. You may need to encode it before printing, depending on your consol encoding: print a.encode('utf-8')

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜