unicode class in Python

2022-12-19 03:11

help(unicode) prints something like:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
...

But you can pass something other than a basestring as the argument: unicode(1) returns u'1'. What happens in that call? int doesn't have a __unicode__ method to be called.

If __unicode__ exists, it is called; otherwise it falls back to __str__:

class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)


unicode(A(1)) # prints "A.unicode"
unicode(B(1)) # prints "B.str"

It is the same as unicode(str(1)): int has no __unicode__, so the result of its __str__ is used.

>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'

If you really want to see the gritty bits underneath, open up the Python interpreter source code.

Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, with constructor .tp_new=unicode_new.

Since neither of the optional arguments encoding and errors is given, and a unicode object (rather than an instance of a unicode subclass) is being constructed, Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.
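The branching in unicode_new can be sketched as a small routing function. This is a hypothetical model, not CPython's code: the name unicode_new_route is made up, it only reports which C function would handle the call instead of performing the conversion, and it ignores the unicode-subclass path entirely. It runs on Python 2.7 and Python 3.

```python
# Hypothetical model of the dispatch in Python 2's unicode_new
# (Objects/unicodeobject.c). It only reports which C function would
# handle the call; it does not perform the conversion itself.
_MISSING = object()  # sentinel: distinguishes "no argument" from None


def unicode_new_route(string=_MISSING, encoding=None, errors=None):
    """Return the name of the step a unicode(...) call is routed to."""
    if string is _MISSING:
        # unicode() with no arguments just builds an empty unicode object.
        return "empty unicode object"
    if encoding is None and errors is None:
        # Plain unicode(x): generic conversion of an arbitrary object.
        return "PyObject_Unicode"
    # unicode(x, encoding[, errors]): x must be decodable data.
    return "PyUnicode_FromEncodedObject"


print(unicode_new_route(1))                # PyObject_Unicode
print(unicode_new_route(b"abc", "utf-8"))  # PyUnicode_FromEncodedObject
```

Passing either encoding or errors is enough to switch to the decoding path, which is why unicode(1) works while unicode(1, 'utf-8') does not.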

Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. If not, it falls back to Py_TYPE(v)->tp_str (a.k.a. __str__) or Py_TYPE(v)->tp_repr (a.k.a. __repr__). Unless the result is already a unicode object, it then passes the result to PyUnicode_FromEncodedObject.
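The lookup order can be modeled in pure Python. This is a hypothetical sketch rather than CPython's implementation: pick_conversion is an invented name, it skips the fast paths for unicode and str arguments, ignores old-style (classic) class instances, and stops before the decoding step. It uses only features common to Python 2.7 and Python 3.

```python
# Hypothetical pure-Python model of the lookup step in Python 2's
# PyObject_Unicode; not CPython's actual code.


def pick_conversion(obj):
    """Return (name_of_method_used, raw_result) for obj.

    Order: __unicode__ if the type defines it, otherwise __str__
    (object's default __str__ in turn delegates to __repr__).
    """
    cls = type(obj)
    # Special methods are looked up on the type, not on the instance.
    method = getattr(cls, "__unicode__", None)
    if method is not None:
        return "__unicode__", method(obj)
    return "__str__", cls.__str__(obj)


class B(int):
    def __str__(self):
        return "B.str"


print(pick_conversion(B(1)))  # ('__str__', 'B.str')
```

A plain int takes the same __str__ branch, which is why unicode(1) works even though int has no __unicode__.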

Objects/unicodeobject.c#PyUnicode_FromEncodedObject finds that it was given a byte string (str) and passes it on to PyUnicode_Decode, which returns a unicode object.

Finally, PyObject_Unicode returns to unicode_new, which returns this unicode object.

In short, unicode() will automatically stringify your object if it needs to. This is Python working as expected.

If there is no __unicode__ method, the __str__ method will be called instead. Regardless of which of these methods is called, if a unicode object is returned, it will be passed on as-is. If a str is returned, it will be decoded using the default encoding, as returned by sys.getdefaultencoding(), which should almost always be 'ascii'. If some other kind of object is returned, a TypeError will be raised.
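The three rules for the returned value can be sketched in a few lines. This is a hypothetical model, not CPython's code: coerce_to_text is an invented name, 'ascii' is hard-coded as a stand-in for sys.getdefaultencoding(), and type(u"") / type(b"") are used so the same code runs on Python 2 (unicode/str) and Python 3 (str/bytes).

```python
# Hypothetical sketch of how the value returned by __unicode__/__str__
# is turned into text; not CPython's actual code.
TEXT = type(u"")   # 'unicode' on Python 2, 'str' on Python 3
BYTES = type(b"")  # 'str' on Python 2, 'bytes' on Python 3


def coerce_to_text(result, encoding="ascii"):
    """Apply the pass-through / decode / reject rules to result."""
    if isinstance(result, TEXT):
        # A unicode result is passed on as-is.
        return result
    if isinstance(result, BYTES):
        # A byte string is decoded; 'ascii' stands in for the default encoding.
        return result.decode(encoding)
    # Any other type is rejected.
    raise TypeError("expected a text or byte string, got %s"
                    % type(result).__name__)
```

With the default 'ascii', a byte string containing non-ASCII bytes such as b"caf\xc3\xa9" raises UnicodeDecodeError, which is the usual way this implicit decoding fails in Python 2 code.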

(It is possible to change the default encoding by reloading the sys module and calling sys.setdefaultencoding(); this is basically always a bad idea.)
