What is the proper way to use str.decode and unicode.encode?
What is the proper way to use str.decode and unicode.encode?
E.g.:
print str.decode
print unicode.encode
Ignacio's example is correct but depends on your console being able to display Unicode characters, which on Windows it usually can't. Here's the same thing with only safe string escapes (reprs):
>>> '\xe3\x81\x82'.decode('utf-8') # three top-bit-set bytes, representing one character
u'\u3042' # Hiragana letter A
>>> u'\u3042'.encode('shift-jis')
'\x82\xa0' # only requires two bytes in the Shift-JIS encoding
>>> unicode('\x82\xa0', 'shift-jis') # alternative way of doing a decode
u'\u3042'
When you're writing to e.g. a file or via a web server, or you're on another operating system where the console supports UTF-8, it's a bit easier:
print 'あ'.decode('utf-8')
print repr(u'あ'.encode('shift-jis'))
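The decode-then-encode round trip above can also be wrapped in a small function. This is only a sketch: the helper name `transcode` is made up for illustration, and the literals use the explicit `b'...'` and `u'...'` prefixes so the snippet should run unchanged on Python 2.6+ and Python 3.3+.

```python
def transcode(raw, source_encoding, target_encoding):
    """Decode bytes from one encoding, then encode the text into another."""
    text = raw.decode(source_encoding)      # bytes -> Unicode text
    return text.encode(target_encoding)     # Unicode text -> bytes


utf8_bytes = b'\xe3\x81\x82'                # Hiragana letter A, UTF-8 encoded
sjis_bytes = transcode(utf8_bytes, 'utf-8', 'shift-jis')

# repr keeps the output console-safe, as in the examples above
print(repr(sjis_bytes))
```

The point of going through Unicode text in the middle is that the two byte encodings never need to know about each other: decode once on the way in, encode once on the way out.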
>>> unicode.encode(u"abcd","utf8")
'abcd' #unicode string u"abcd" got encoded to UTF-8 encoded string "abcd"
>>> str.decode("abcd","utf8")
u'abcd' #UTF-8 string "abcd" got decoded to python's unicode object u"abcd"
>>>
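Calling the method on the type, as in the session above, gives the same result as calling it on the value itself, which is the more common spelling. A small sketch of that equivalence; `text_type` and `bytes_type` are names introduced here (not part of the original answer) so that the snippet runs on both Python 2 and Python 3.

```python
text_type = type(u"")     # unicode on Python 2, str on Python 3
bytes_type = type(b"")    # str on Python 2, bytes on Python 3

# Encoding: method on the value vs. method looked up on the type
encoded = u"abcd".encode("utf8")
same_encoded = text_type.encode(u"abcd", "utf8")

# Decoding: method on the value vs. method looked up on the type
decoded = b"abcd".decode("utf8")
same_decoded = bytes_type.decode(b"abcd", "utf8")

print(encoded == same_encoded)
print(decoded == same_decoded)
```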