Strange result in Python
Could someone explain this strange result in Python 2.6.6?
>>> a = "xx"
>>> b = "xx"
>>> a.__hash__() == b.__hash__()
True
>>> a is b
True # ok.. was just to be sure
>>> a = "x" * 2
>>> b = "x" * 2
>>> a.__hash__() == b.__hash__()
True
>>> a is b
True # yeah.. looks ok so far !
>>> n = 2
>>> a = "x" * n
>>> b = "x" * n
>>> a.__hash__() == b.__hash__()
True # still okay..
>>> a is b
False # hey! What the F... ?
The is operator tells you whether two variables refer to the same object in memory. It is rarely useful and often confused with the == operator, which tells you whether two objects have equal values, i.e. whether they "look the same".
It is particularly confusing with short string literals, because the Python compiler interns these for efficiency. In other words, when you write "xx", the compiler creates one string object in memory and makes every "xx" literal point to it. This explains why your first two a is b comparisons are True. Notice that you can get the id of a string by calling id on it; on CPython, this is the object's address in memory:
>>> a = "xx"
>>> b = "xx"
>>> id(a)
38646080
>>> id(b)
38646080
>>> a is b
True
>>> a = "x"*10000
>>> b = "x"*10000
>>> id(a)
38938560
>>> id(b)
38993504
>>> a is b
False
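Here is a minimal sketch of the same comparison in a script. It assumes CPython, and the helper name identity_report is hypothetical. For two live objects, is gives the same answer as comparing their id() values, and two equal strings built at runtime are separate objects:

```python
def identity_report(x, y):
    """Return (x == y, x is y, id(x) == id(y)) for two live objects."""
    # == compares values; is compares identities. While both objects are
    # alive, "x is y" gives the same answer as comparing their id()s.
    return x == y, x is y, id(x) == id(y)

n = 10000
a = "x" * n  # computed at runtime, so each call builds a new string object
b = "x" * n
print(identity_report(a, b))  # on CPython: (True, False, False)
print(identity_report(a, a))  # (True, True, True)
```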
The third a is b is False because the compiler hasn't interned the strings a and b, for whatever reason (probably because it isn't smart enough to notice that the variable n is defined once and then never modified).
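You can test the "compiler isn't smart enough" explanation with the small sketch below, which uses the hypothetical helper name folded_to_constant. It compiles two snippets and checks whether a ready-made "xx" appears among the code object's constants. On CPython, the literal expression "x" * 2 is folded into the constant "xx" at compile time. By contrast, "x" * n is left as a multiplication that happens at runtime:

```python
def folded_to_constant(source, value):
    """Return True if value is among the constants compiled from source."""
    # compile() applies CPython's constant folding, so a literal expression
    # like "x" * 2 can end up stored as the finished constant "xx".
    code = compile(source, "<example>", "exec")
    return value in code.co_consts

# Both operands are literals, so the product can be computed at compile time.
print(folded_to_constant('a = "x" * 2', "xx"))         # True on CPython
# n is a variable, so the multiplication has to happen at runtime.
print(folded_to_constant('n = 2\na = "x" * n', "xx"))  # False on CPython
```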
You can in fact force Python to intern strings by, well, asking it to, with the built-in intern() function. This can give you a piddling performance increase and might occasionally help, but it's probably useless.
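Here is a minimal sketch of forcing interning by hand. In Python 2, intern() is a built-in; in Python 3 it moved to sys.intern. The import below therefore tries sys.intern first and otherwise falls back to the built-in:

```python
try:
    from sys import intern  # Python 3: intern lives in the sys module
except ImportError:
    pass                    # Python 2: intern() is already a built-in

n = 2
a = intern("x" * n)  # built at runtime, then swapped for the interned copy
b = intern("x" * n)
print(a is b)  # True: both names now refer to the single interned "xx"
```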
Moral: don't use is with string literals. Or int literals. Or anywhere you don't mean it, really.
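Here is a small sketch of that moral in practice; the function name is_xx is hypothetical. It compares string values with == and keeps is for cases where identity is really what you mean, such as checking for the None singleton:

```python
def is_xx(s):
    """Return True if s equals "xx", however the string was built."""
    if s is None:       # identity is the right question here: None is a singleton
        return False
    return s == "xx"    # value comparison: works for interned and runtime strings

n = 2
print(is_xx("x" * n))  # True
print(is_xx(None))     # False
```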
To understand this, you need to understand a few different things.
- a is b returns True if a and b are the same object, not merely if they have the same value. Strings can have the same value but be different instances of that value.
- When you say a = "x", what you're actually doing is creating a string constant "x" and then assigning a name to it, a. String constants are strings that are written literally in the code, not calculated programmatically. String constants are interned (in CPython, at least those made only of letters, digits and underscores, such as "x" or "xx"), which means they're stored in a table for reuse. If you say a = "a"; b = "a", it's actually the same as saying a = "a"; b = a, because both use the same interned string "a". That's why the first a is b is True.
- When you say a = "x" * 2, the Python compiler is actually optimizing this. It calculates the string at compile time, generating code as if you had written a = "xx". Thus, the resulting string "xx" is interned. That's why the second a is b is True.
- When you say a = "x" * n, the Python compiler doesn't know what n is at compile time. Therefore, it's forced to actually output the string "x" and then perform the string multiplication at runtime. Because that happens at runtime, the resulting string "xx" is not interned, even though "x" is. As a result, each of these strings is a different instance of "xx", so the final a is b is False.
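Conceptually, the "table for reuse" maps each string value to one canonical object. The class below is a hypothetical, pure-Python sketch of that idea. CPython's real intern table is implemented in C, so this is not how CPython does it internally, but it shows why two equal strings come out as the same object:

```python
class InternTable(object):
    """Hypothetical illustration of an intern table (not CPython's real one)."""

    def __init__(self):
        self._canonical = {}  # maps each string value to its canonical object

    def intern(self, s):
        # If an equal string was stored earlier, return that stored object;
        # otherwise store s itself as the canonical copy and return it.
        return self._canonical.setdefault(s, s)


n = 2
table = InternTable()
first = "x" * n   # two equal strings built at runtime
second = "x" * n
print(table.intern(first) is table.intern(second))  # True
print(table.intern(second) is first)                # True: the first copy won
```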
You can see the difference yourself:
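import dis

def a1():
    a = "x"          # plain string constant
def a2():
    a = "x" * 2      # constant expression, folded at compile time
def a3():
    n = 2
    a = "x" * n      # depends on a variable, so computed at runtime

print("a1:")
dis.dis(a1)
print("a2:")
dis.dis(a2)
print("a3:")
dis.dis(a3)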
In CPython 2.6.4, this outputs:
a1:
  4           0 LOAD_CONST               1 ('x')
              3 STORE_FAST               0 (a)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
a2:
  6           0 LOAD_CONST               3 ('xx')
              3 STORE_FAST               0 (a)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
a3:
  8           0 LOAD_CONST               1 (2)
              3 STORE_FAST               0 (n)
  9           6 LOAD_CONST               2 ('x')
              9 LOAD_FAST                0 (n)
             12 BINARY_MULTIPLY
             13 STORE_FAST               1 (a)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE
Finally, note that you can say a = intern(a); b = intern(b) to create interned versions of the strings, which guarantees that a is b is True whenever a == b. If all you want is to check string equality, however, just use a == b.