unicode class in Python
help(unicode) prints something like:
class unicode(basestring)
| unicode(string [, encoding[, errors]]) -> object
...
But you can pass something other than a basestring as the argument: unicode(1) returns u'1'. What happens in that call? int doesn't have a __unicode__ method to call.
If __unicode__ exists it is called; otherwise it falls back to __str__:
class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)

unicode(A(1))  # prints "A.unicode"
unicode(B(1))  # prints "B.str"
Same as unicode(str(1)).
>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'
If you really want to see the gritty bits underneath, open up the Python interpreter source code.
Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, with constructor .tp_new=unicode_new.
Since the optional arguments encoding or errors are not given, and a unicode object is being constructed (as opposed to a unicode subclass), Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.
Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. If not, it falls back to Py_TYPE(v)->tp_str (a.k.a. __str__) or Py_TYPE(v)->tp_repr (a.k.a. __repr__). It then passes the result to PyUnicode_FromEncodedObject.
Objects/unicodeobject.c#PyUnicode_FromEncodedObject finds that it was given a string, and passes it on to PyUnicode_Decode, which returns a unicode object.
Finally, PyObject_Unicode returns to unicode_new, which returns this unicode object.
In short, unicode() will automatically stringify your object if it needs to. This is Python working as expected.
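The lookup order described above can be modeled in a few lines of Python. This is only a rough sketch, not the interpreter's actual code: unicode_like, WithUnicode and OnlyStr are hypothetical names made up for illustration, and bytes stands in for the Python 2 str type so that the snippet also runs on Python 3.

```python
def unicode_like(obj, default_encoding="ascii"):
    """Hypothetical model of the lookup order; not the real implementation."""
    cls = type(obj)
    # Step 1: prefer __unicode__ when the type defines it.
    to_unicode = getattr(cls, "__unicode__", None)
    if to_unicode is not None:
        result = to_unicode(obj)
    else:
        # Step 2: otherwise fall back to __str__ (tp_str at the C level).
        result = cls.__str__(obj)
    # Step 3: text passes through unchanged, byte strings are decoded.
    if isinstance(result, type(u"")):
        return result
    if isinstance(result, bytes):
        return result.decode(default_encoding, "strict")
    raise TypeError("expected a string, got %s" % type(result).__name__)


class WithUnicode(object):
    def __unicode__(self):
        return u"from __unicode__"

    def __str__(self):
        return "from __str__"


class OnlyStr(object):
    def __str__(self):
        return "from __str__"


print(unicode_like(WithUnicode()))  # from __unicode__
print(unicode_like(OnlyStr()))      # from __str__
print(unicode_like(1))              # 1
```

The last call mirrors unicode(1) from the question: int has no __unicode__, so the model ends up in the __str__ branch.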
If there is no __unicode__ method, the __str__ method will be called instead. Regardless of which of these methods is called, if a unicode is returned, it will be passed on as-is. If a str is returned, it will be decoded using the default encoding, as returned by sys.getdefaultencoding(), which should almost always be 'ascii'. If some other kind of object is returned, a TypeError will be raised.
(It is possible, by reloading the sys module, to change the default encoding by calling sys.setdefaultencoding(); this is basically always a bad idea.)
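One practical consequence of the 'ascii' default: if __str__ hands back non-ASCII bytes, the implicit decode fails. The snippet below is a small illustration of just that decode step, assuming UTF-8 input; the byte string raw is a made-up example, and a bytes literal is used in place of a Python 2 str so the code runs on Python 3 as well.

```python
# UTF-8 bytes for "café"; stands in for what a Python 2 __str__ might return.
raw = b"caf\xc3\xa9"

try:
    raw.decode("ascii")  # what the implicit conversion does with the usual default
    decoded_with_ascii = True
except UnicodeDecodeError:
    decoded_with_ascii = False

print(decoded_with_ascii)                    # False
print(raw.decode("utf-8") == u"caf\xe9")     # True
print(b"plain".decode("ascii") == u"plain")  # True
```

Decoding explicitly with the encoding you know the bytes are in, or returning a unicode object from __unicode__, avoids relying on the default encoding at all.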