bytes vs bytearray in Python 2.6 and 3
I'm experimenting with bytes vs bytearray in Python 2.6. I don't understand the reason for some differences.
A bytes iterator returns strings:
for i in bytes(b"hi"):
    print(type(i))
Gives:
<type 'str'>
<type 'str'>
But a bytearray iterator returns ints:
for i in bytearray(b"hi"):
    print(type(i))
Gives:
<type 'int'>
<type 'int'>
Why the difference?
I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?
For (at least) Python 3.7
According to the docs:
bytes objects are immutable sequences of single bytes
bytearray objects are a mutable counterpart to bytes objects.
And that's pretty much it as far as bytes vs bytearray goes. In fact, they're fairly interchangeable and designed to be flexible enough to be mixed in operations without throwing errors. The official documentation even has a whole section dedicated to the operations that the bytes and bytearray APIs share.
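To illustrate what "mixed in operations" can look like, here is a minimal sketch assuming Python 3. The variable names are only for illustration.

```python
# Sketch for Python 3: bytes and bytearray can be mixed in common operations.
immutable = b"ab"
mutable = bytearray(b"cd")

# Equality compares the contents, not the types.
same_content = (b"hi" == bytearray(b"hi"))

# Concatenation works in either order; the result takes the type of the left operand.
left_bytes = immutable + mutable        # a bytes object
left_bytearray = mutable + immutable    # a bytearray object

# Methods such as join() accept either type as an argument.
joined = b"-".join([immutable, mutable])

print(same_content, type(left_bytes).__name__, type(left_bytearray).__name__, joined)
```

So the practical difference comes down to whether you need to modify the data in place.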
Some clues as to why from the docs:
Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
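As a rough sketch of those string-like methods (assuming Python 3, with a made-up request line as sample data), parsing an ASCII-based protocol line might look like this:

```python
# Hypothetical sample data: the first line of an HTTP request, as raw bytes.
request_line = b"GET /index.html HTTP/1.1"

# split(), startswith() and lower() work on bytes much like they do on str,
# but the arguments and results are bytes, not text.
method, path, version = request_line.split(b" ")
is_get = request_line.startswith(b"GET")
lowered = method.lower()

# The same methods exist on bytearray and return a bytearray.
mutable_upper = bytearray(request_line).upper()

print(method, path, version, is_get, lowered, mutable_upper)
```

These methods only make sense when the data really is ASCII compatible; on arbitrary binary data the results are not meaningful.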
In Python 2.6 bytes is merely an alias for str.
This "pseudo type" was introduced to [partially] prepare programs [and programmers!] for conversion to, and compatibility with, Python 3.0, where there is a strict distinction in semantics and use between str (which is systematically Unicode) and bytes (which are arrays of octets, for storing data, but not text).
Similarly, the b prefix for string literals has no effect in 2.6, but it is a useful marker in the program: it explicitly flags the programmer's intent to have the string as a data string rather than a text string. This information can then be used by the 2to3 converter or similar utilities when the program is ported to Py3k.
You may want to check this SO Question for additional info.
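A quick way to see the alias for yourself is the small check below. It is a sketch intended to run unchanged on Python 2.6+ and Python 3; the variable names are just for illustration.

```python
import sys

# True on Python 2.6/2.7, where bytes is just another name for str;
# False on Python 3, where they are distinct types.
bytes_is_str = bytes is str

# The b prefix produces a str on Python 2 and a bytes object on Python 3.
literal_type_name = type(b"hi").__name__

print(sys.version_info[0], bytes_is_str, literal_type_name)
```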
TL;DR
python2.6+ bytes = python2.6+ str = python3.x bytes != python3.x str
python2.6+ bytearray = python3.x bytearray
python2.x unicode = python3.x str
Long Answer
bytes and str have changed meaning in Python since Python 3.x.
First, to answer your question briefly: in Python 2.6, bytes(b"hi") is an immutable array of bytes (8 bits, or octets). So the type of each element is simply bytes, which is the same as str in Python 2.6+ (however, this is not the case in Python 3.x).
bytearray(b"hi") is again an array of bytes, but a mutable one. When you ask for the type of an element, it's an int, because Python represents each element of a bytearray as an integer in the range 0-255 (all possible values for an 8-bit integer). An element of a bytes object, however, is represented as the one-character string for that byte.
For example, in Python 2.6+:
>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0] # Python shows the int value of the 8 bits 0110 1000
104
>>> bs[0] # Python shows the ASCII character for the 8 bits 0110 1000
'h'
>>> chr(barr[0]) # chr converts 104 to its corresponding ASCII character
'h'
>>> bs[0]==chr(barr[0]) # compares the first character of bs with the character for the first int of barr
True
Now Python 3.x is an entirely different story. As you might have suspected, it is odd that a str literal would mean a byte in Python 2.6+. Well, this answer explains that.
In Python 3.x, a str is Unicode text (in Python 2 it was just an array of bytes; note that Unicode text and bytes are two completely different things). bytearray is a mutable array of bytes, while bytes is an immutable array of bytes. They both have almost the same methods. Now if I run the same code again in Python 3.x, here is the result:
>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0]
104
>>> bs[0]
104
>>> bs[0]==barr[0] # bytes and bytearray are same thing in python 3.x
True
bytes and bytearray are the same thing in Python 3.x, except for their mutability.
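The mutability difference can be seen in a short sketch like the following (the variable names are only for illustration):

```python
frozen = bytes(b"hi")
editable = bytearray(b"hi")

# bytearray supports in-place changes.
editable[0] = ord("H")   # replace the first byte with the code for 'H'
editable += b"!"         # append more bytes in place

# bytes does not: item assignment raises TypeError.
try:
    frozen[0] = ord("H")
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(editable, frozen, assignment_failed)
```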
What happened to str, you might ask? str in Python 3 became what unicode was in Python 2, and the unicode type was subsequently removed from Python 3 as it was redundant.
I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?
It depends on what you are trying to do. Are you dealing with bytes, or with the ASCII representation of bytes?
If you are dealing with bytes, then my advice is to use bytearray in Python 2, which is the same in Python 3. But you lose immutability, if that matters to you.
If you are dealing with ASCII or text, then represent your string as u'hi' in Python 2, which has the same meaning in Python 3 (the u prefix is accepted by Python 3.3 and later). The u prefix has a special meaning in Python 2: it instructs Python 2 to treat a string literal as the unicode type. In Python 3 it has no effect, because all string literals in Python 3 are Unicode by default (the type is, confusingly, called str in Python 3 and unicode in Python 2).
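A minimal sketch of keeping text and bytes apart, assuming Python 2.6+ or Python 3.3+ (the u prefix is a syntax error on Python 3.0 to 3.2):

```python
# Text: unicode on Python 2, str on Python 3.
text = u"hi"

# Convert explicitly at the boundary between text and binary data.
data = text.encode("ascii")          # text -> bytes
round_trip = data.decode("ascii")    # bytes -> text

# For raw byte handling, bytearray gives integer elements on both versions.
byte_values = list(bytearray(data))

print(repr(data), repr(round_trip), byte_values)
```

Making the encode and decode steps explicit is what keeps the code from relying on Python 2's implicit str/unicode conversions.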
I am not sure since which version, but bytes is actually a str, which you can see if you do type(bytes(b"hi")) -> <type 'str'>.
bytearray is a mutable array of bytes, one constructor of which takes a string.
I tried it on Python 3.0.
In Python 3.0, a bytes iterator returns ints, not strings as Python 2.6 did:
for i in bytes(b"hi"):
    print(type(i))
Gives:
<class 'int'>
<class 'int'>
A bytearray iterator also returns ints (<class 'int'>).
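If you want iteration that behaves the same on Python 2.6+ and Python 3, one possible approach is sketched below. Both helper names are hypothetical, not part of the standard library.

```python
def byte_values(data):
    """Return the integer value of every byte in data (hypothetical helper).

    Wrapping in bytearray gives int elements on both Python 2.6+ and Python 3.
    """
    return list(bytearray(data))


def single_bytes(data):
    """Return each byte as a length-1 slice of data (hypothetical helper).

    Slicing, unlike indexing, keeps the original type on both versions.
    """
    return [data[i:i + 1] for i in range(len(data))]


print(byte_values(b"hi"))
print(single_bytes(b"hi"))
```

Indexing or iterating a bytes object directly is the part that differs between the versions, so this sketch avoids both.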