
Are Python's bytes objects also known as strings?

This is a section from Dive Into Python 3 regarding strings:

In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
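Here is a minimal sketch of the two conversions the passage describes, using only the built-in `str.encode` and `bytes.decode`. The sample text `"café"` is an arbitrary choice. It shows that the same string yields different bytes depending on the encoding you pick.

```python
# One piece of text: a str is a sequence of Unicode code points.
text = "café"

# Encoding turns characters into bytes; the result depends on the encoding.
utf8_bytes = text.encode("utf-8")     # 'é' becomes two bytes: 0xC3 0xA9
cp1252_bytes = text.encode("cp1252")  # 'é' becomes one byte: 0xE9

print(utf8_bytes)    # b'caf\xc3\xa9'
print(cp1252_bytes)  # b'caf\xe9'

# Decoding turns bytes back into characters, given the matching encoding.
assert utf8_bytes.decode("utf-8") == text
assert cp1252_bytes.decode("cp1252") == text

# The str has 4 characters; only its byte representations differ in length.
print(len(text), len(utf8_bytes), len(cp1252_bytes))  # 4 5 4
```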

Earlier today I used the hashlib module and read the help text for md5 that says:

Return a new MD5 hash object; optionally initialized with a string.

Well, it doesn't accept a string - it accepts a bytes object.
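You can confirm this in Python 3 by calling `hashlib.md5` with each type. The input `"hello"` is arbitrary. The exact wording of the `TypeError` message may differ between Python versions, so the sketch only prints it.

```python
import hashlib

# Passing bytes works: the hash is computed over raw bytes.
digest = hashlib.md5(b"hello").hexdigest()
print(digest)  # 5d41402abc4b2a76b9719d911017c592

# Passing a str fails, because a str has no single byte representation.
try:
    hashlib.md5("hello")
except TypeError as exc:
    print("TypeError:", exc)

# The fix is to choose an encoding explicitly, then hash the resulting bytes.
print(hashlib.md5("hello".encode("utf-8")).hexdigest())
```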

Maybe I'm reading too much into this, but wouldn't it make more sense if the help text stated a bytes should be used instead? Or are people using the same name for strings and bytes?


In Python 2, str was used both for strings of characters and for bytes. In fact, until Python 2.6 there wasn't even a bytes type (and in 2.6 and 2.7, bytes is just another name for str).

The mentioned inconsistencies in the hashlib documentation are an artifact of this history.


Probably the help text is left over from Python 2.

This is one of the bigger changes from Python 2 to Python 3:

    Python2          Python3

    str              bytes
    unicode          str
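A short Python 3 sketch of the right-hand column of that table. The values `"abc"` and `b"abc"` are arbitrary. It shows that text and binary data are separate types that never convert implicitly.

```python
# In Python 3, text and binary data are distinct types.
text = "abc"    # str: Unicode text (what Python 2 called unicode)
data = b"abc"   # bytes: raw 8-bit values (what Python 2 called str)

print(type(text).__name__, type(data).__name__)  # str bytes

# They never compare equal, even when they look alike.
print(text == data)  # False

# Mixing them is an error rather than a silent conversion.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Converting between them always names an encoding.
assert data.decode("ascii") == text
assert text.encode("ascii") == data
```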

Python 2.6+ starts preparing for the change by making bytes a synonym of str.

You should report it to the developers (unless it has already been fixed; I only have 3.1.2 here). I think the wording could be improved.

