Are Python's bytes objects also known as strings?
This is a section from Dive Into Python 3 regarding strings:
In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
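To make the quoted passage concrete, here is a minimal Python 3 sketch of moving between a string and bytes in the two encodings it mentions. The sample word and the variable names are only illustrative assumptions, not taken from the book.

```python
# A str is a sequence of Unicode characters; it has no encoding of its own.
text = "caf\u00e9"  # "café", four characters

# Encoding turns the characters into bytes in a chosen character encoding.
utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes in UTF-8
cp1252_bytes = text.encode("cp1252")   # the same letter takes one byte in CP-1252

# Decoding turns bytes back into a str, given the encoding they were written in.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_cp1252 = cp1252_bytes.decode("cp1252")
```

The same four characters produce different byte sequences of different lengths, which is why asking whether a string "is utf-8" only makes sense for the bytes, not for the string.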
Earlier today I used the hashlib module and read the help text for md5 that says:
Return a new MD5 hash object; optionally initialized with a string.
Well, it doesn't accept a string - it accepts a bytes object.
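A small Python 3 sketch of the behaviour described here: passing a str to hashlib.md5 is rejected, while the encoded bytes are accepted. The input "hello" and the choice of UTF-8 are assumptions made for the example.

```python
import hashlib

text = "hello"

# Passing a str directly raises TypeError in Python 3.
try:
    hashlib.md5(text)
    accepted_str = True
except TypeError:
    accepted_str = False

# Encoding the str to bytes first works as expected.
digest = hashlib.md5(text.encode("utf-8")).hexdigest()
print(accepted_str, digest)
```

The digest depends on the encoding you pick, so the same string encoded differently can hash to a different value.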
Maybe I'm reading too much into this, but wouldn't it make more sense if the help text stated a bytes should be used instead? Or are people using the same name for strings and bytes?
In Python 2, str was used both for strings of characters and for bytes. In fact, until Python 2.6, there wasn't even a bytes type (and in 2.6 and 2.7, bytes is str).
The mentioned inconsistencies in the hashlib documentation are an artifact of this history.
The help text is probably left over from Python 2.
This is one of the bigger changes from Python 2 to 3:
Python 2    Python 3
str         bytes
unicode     str
Python 2.6+ starts to prepare for the change by making bytes a synonym of str.
You should report it to the developers (unless it has already been fixed; I only have 3.1.2 here). I think the wording should probably be improved.