Python's layout of low-value ints in memory
My question is: where do these patterns (below) originate?
I learned (somewhere) that Python has unique "copies", if that's the right word, for small integers. For example:
>>> x = y = 0
>>> id(0)
4297074752
>>> id(x)
4297074752
>>> id(y)
4297074752
>>> x += 1
>>> id(x)
4297074728
>>> y
0
When I look at the memory locations of ints, there is a simple pattern early:
>>> N = id(0)
>>> for i in range(5):
... print i, N - id(i)
...
0 0
1 24
2 48
3 72
4 96
>>> bin(24)
'0b11000'
It's not clear to me why this is chosen as the increment. Moreover, I can't explain this pattern at all above 256:
>>> prev = 0
>>> for i in range(270):
... t = (id(i-1), id(i))
... diff = t[0] - t[1]
... if diff != prev:
... print i-1, i, t, diff
... prev = diff
...
-1 0 (4297074776, 4297074752) 24
35 36 (4297073912, 4297075864) -1952
36 37 (4297075864, 4297075840) 24
76 77 (4297074904, 4297076856) -1952
77 78 (4297076856, 4297076832) 24
117 118 (4297075896, 4297077848) -1952
118 119 (4297077848, 4297077824) 24
158 159 (4297076888, 4297078840) -1952
159 160 (4297078840, 4297078816) 24
199 200 (4297077880, 4297079832) -1952
200 201 (4297079832, 4297079808) 24
240 241 (4297078872, 4297080824) -1952
241 242 (4297080824, 4297080800) 24
256 257 (4297080464, 4297155264) -74800
257 258 (4297155072, 4297155288) -216
259 260 (4297155072, 4297155336) -264
260 261 (42971开发者_如何学JAVA55048, 4297155432) -384
261 262 (4297155024, 4297155456) -432
262 263 (4297380280, 4297155384) 224896
263 264 (4297155000, 4297155240) -240
264 265 (4297155072, 4297155216) -144
266 267 (4297155072, 4297155168) -96
267 268 (4297155024, 4297155144) -120
Any thoughts, clues, places to look?
Edit: and what's special about 24?
Update: The standard library has sys.getsizeof()
which returns 24
when I call it with 1
as argument. That's a lot of bytes, but on a 64-bit machine, we have 8 bytes each for the type, the value and the ref count. Also, see here, and the C API reference here.
Spent some time with the "source" in the link from Peter Hansen in comments. Couldn't find the definition of an int (beyond a declaration of *int_int
), but I did find:
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
Low-value integers are preallocated, high value integers are allocated whenever they are computed. Integers that appear in source code are the same object. On my system,
>>> id(2) == id(1+1)
True
>>> id(1000) == id(1000+0)
False
>>> id(1000) == id(1000)
True
You'll also notice that the ids depend on the system. They're just memory addresses, assigned by the system allocator (or possibly the linker, for static objects?)
>>> id(0)
8402324
Edit: The reason id(1000) == id(1000)
is because the Python compiler notices that two of the integer constants in the code it's compiling are the same, so it only allocates one object for both. This would be an unacceptable performance hit at runtime, but at compile time it's not noticeable. (Yes, the interpreter is also a compiler. Most interpreters are also compilers, very few aren't.)
I seem to recall that Python internally caches copies of ints < 256 to save having to create new Python objects for commonly-used cases. So once you go over 256 you're getting newly created objects each time which may appear to be 'randomly' laid out in memory (obviously their locations would make sense to Python's allocator, but probably not to us).
The first 256 ints are preallocated
>>> for n in range(1000):
>>> if id(n) != id(n+0):
>>> print n
>>> break
257
Since others have fully answered why ids have a distinct pattern up to 256, I thought I would answer your addendum instead: 24 is the size in bytes of an integer object in python. When the first 256 integers are allocated they are done so contiguously in memory, leading to a difference of 24 between each memory address.
integers up to 256 are interned, that's why you see a "pattern" there. All other integers are created during execution and as such allocated random id
.
In CPython, IDs are the address of the objects.
精彩评论