unicode class in Python

help(unicode) prints something like:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
...

but you can pass something other than a basestring as the argument: unicode(1) returns u'1'. What happens in that call? int doesn't have a __unicode__ method to call.


If __unicode__ exists, it is called; otherwise unicode() falls back to __str__:

class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)


unicode(A(1)) # prints "A.unicode"
unicode(B(1)) # prints "B.str"


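unicode(1) is the same as unicode(str(1)): with no __unicode__ method available, __str__ is called and its result is converted to unicode.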

>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'

If you really want to see the gritty bits underneath, open up the Python interpreter source code.

Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, with constructor .tp_new=unicode_new.

Since the optional arguments encoding or errors are not given, and a unicode object is being constructed (as opposed to a unicode subclass), Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.

Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. If not, it falls back to Py_TYPE(v)->tp_str (a.k.a. __str__) or Py_TYPE(v)->tp_repr (a.k.a. __repr__). If the result is not already a unicode object, it then passes the result to PyUnicode_FromEncodedObject.

Objects/unicodeobject.c#PyUnicode_FromEncodedObject finds that it was given a string, and passes it on to PyUnicode_Decode, which returns a unicode object.

Finally, PyObject_Unicode returns to unicode_new, which returns this unicode object.
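
The call chain above can be modelled roughly in pure Python. This is only a sketch of the lookup order, not the interpreter's actual code: py2_unicode_sketch, TEXT and BYTES are hypothetical names, and old-style classes and the repr fallback are ignored. It is written so that it also runs on Python 3, where no built-in type defines __unicode__.

```python
# Rough model of unicode(obj) called without encoding/errors arguments.
# All names here are hypothetical; the real logic lives in C.
TEXT = type(u"")    # unicode on Python 2, str on Python 3
BYTES = type(b"")   # str on Python 2, bytes on Python 3


def py2_unicode_sketch(obj, default_encoding="ascii"):
    cls = type(obj)
    # Special methods are looked up on the type, not on the instance.
    method = getattr(cls, "__unicode__", None)
    if method is None:
        # Fallback: the tp_str slot, i.e. __str__.
        method = cls.__str__
    result = method(obj)
    if isinstance(result, TEXT):
        # Already text: passed on as-is.
        return result
    if isinstance(result, BYTES):
        # Byte string: decoded strictly with the default encoding.
        return result.decode(default_encoding, "strict")
    raise TypeError("expected a string, got %s" % type(result).__name__)


class WithUnicode(object):
    def __unicode__(self):
        return u"from __unicode__"

    def __str__(self):
        return "from __str__"


class OnlyStr(object):
    def __str__(self):
        return "from __str__"


with_unicode_result = py2_unicode_sketch(WithUnicode())
only_str_result = py2_unicode_sketch(OnlyStr())
int_result = py2_unicode_sketch(1)
```

Here with_unicode_result comes from __unicode__, while only_str_result and int_result come from __str__, mirroring the A/B example earlier in the thread.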

In short, unicode() will automatically stringify your object if it needs to. This is Python working as expected.


If there is no __unicode__ method, the __str__ method will be called instead. Regardless of which of these methods is called, if a unicode is returned, it will be passed on as-is. If a str is returned, it will be decoded using the default encoding, as returned by sys.getdefaultencoding(), which should almost always be 'ascii'. If some other kind of object is returned, a TypeError will be raised.
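
To see why the default encoding matters, here is a small sketch of the decoding step on its own. The byte string is made-up sample data standing in for what a __str__ method might return on Python 2, and the 'ascii' codec is passed explicitly because that is the usual Python 2 default (on Python 3, sys.getdefaultencoding() reports something else).

```python
# UTF-8 encoded bytes for u"caf\xe9" (sample data, not from the question).
raw = b"caf\xc3\xa9"

# Decoding with 'ascii', as the implicit conversion would, fails on
# the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding explicitly with the right codec works.
text = raw.decode("utf-8")
```

This is one reason to define __unicode__ and return a unicode object directly whenever the text may contain non-ASCII characters, rather than relying on the implicit decode.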

(It is possible, by reloading the sys module, to change the default encoding by calling sys.setdefaultencoding(); this is basically always a bad idea.)
