Python memory management insights -- id()
Playing around with id(). I began by looking at the addresses of identical attributes in non-identical objects, but that doesn't matter now, I guess. Down to the code:
class T(object):
    pass

class N(object):
    pass
First test (in interactive console):
n = N()
t = T()
id(n)
# prints 4298619728
id(t)
# prints 4298619792
No surprise here, actually. n.__class__ is different from t.__class__, so it seems obvious they can't possibly be the same object. Is the __class__ the only difference between these objects at this time? I assume not, as:
>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False
Or does Python simply create separate objects even if they are exactly the same, content-wise, instead of initially assigning the names n1 and n2 to the same object (in memory) and re-assigning when either n1 or n2 is modified? Why so? I understand this may be a question of convention, optimization, mood, low-level issues (don't spare me) but still, I'm curious.
Now, same classes as before, T() & N(), executed one after another in the shell:
>>> id(N())
4298619728
>>> id(N())
4298619792
>>> id(N())
4298619728
>>> id(N())
4298619792
Why the juggling?
But here comes the weird part. Again, same classes, shell:
>>> id(N()), id(T())
(4298619728, 4298619728)
>>> id(N()), id(T())
(4298619728, 4298619728)
>>> id(N()), id(T())
(4298619728, 4298619728)
Not only does the juggling stop, but N() and T() appear to be the same object. Since they cannot be, I understand this as whatever N() returns being destroyed after the id() call, before the end of the whole statement.
I realize this may be a tough one to answer. But I'm hoping someone could tell me what I'm observing here, whether my understanding is correct, share some dark magic about the inner workings and memory management of the interpreter or perhaps point to some good resources on this subject?
Thanks for your time on this one.
You asked a lot of questions. I'll do my best to answer some of them, and hopefully you'll be able to figure out the rest (ask if you need help).
First question: explain behaviour of id
>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False
This shows that Python creates a new object each time you call an object constructor. This makes sense, because this is exactly what you asked for! If you wanted to allocate only one object, but give it two names, then you could have written this:
>>> n1 = N()
>>> n2 = n1
>>> id(n1) == id(n2)
True
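The same distinction can be checked with the is operator, which compares identity rather than contents. A minimal sketch, reusing the empty class N from the question (the name alias is only for illustration):

```python
class N(object):
    pass

n1 = N()      # first constructor call: one object
n2 = N()      # second constructor call: a different object
alias = n1    # no constructor call: a second name for the first object

print(n1 is n2)               # False: two live objects are never identical
print(alias is n1)            # True: both names refer to one object
print(id(alias) == id(n1))    # True: same object, same id
```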
Second question: why not copy-on-write?
You go on to ask why Python doesn't implement a copy-on-write strategy for object allocation. Well, the current strategy, of constructing an object every time you call a constructor, is:
- simple to implement;
- explicit (does exactly what you ask for);
- easy to document and understand.
Also, the use cases for copy-on-write are not compelling. It only saves storage if many identical objects get created and are never modified. But in that case, why create many identical objects? Why not use a single object?
Third question: explain allocation behaviour
In CPython, the id of an object is (secretly!) its address in memory. See the function builtin_id in bltinmodule.c, line 907.
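One way to convince yourself of this without reading the C source is to turn the integer back into an object with ctypes. This is a sketch that assumes CPython; on other implementations id() need not be an address, and the trick is not something to use in real code:

```python
import ctypes

class N(object):
    pass

n = N()
address = id(n)   # on CPython this integer is the object's memory address

# Assumption: CPython only. Reinterpret the integer as a PyObject pointer
# and fetch the object that lives there; it should be n itself.
recovered = ctypes.cast(address, ctypes.py_object).value
print(recovered is n)
```

This is only safe while n is still alive; doing the same with the id of a destroyed object reads freed memory.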
You can investigate Python's memory allocation behaviour by making a class with __init__ and __del__ methods:
class N:
    def __init__(self):
        print "Creating", id(self)
    def __del__(self):
        print "Destroying", id(self)
>>> id(N())
Creating 4300023352
Destroying 4300023352
4300023352
You can see that Python was able to destroy the object immediately, which allows it to reclaim the space for re-use by the next allocation. Python uses reference counting to keep track of how many references there are to each object, and when there are no more references to an object, it gets destroyed. Within the execution of the same statement, the same memory may get re-used several times. For example:
>>> id(N()), id(N()), id(N())
Creating 4300023352
Destroying 4300023352
Creating 4300023352
Destroying 4300023352
Creating 4300023352
Destroying 4300023352
(4300023352, 4300023352, 4300023352)
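If you are on Python 3, where print is a function, the same experiment can be written by recording events in a list. This is a sketch that assumes CPython's reference counting; the class name Tracked and the events list are made up for the illustration:

```python
events = []   # records (what, id) pairs in the order they happen

class Tracked(object):   # hypothetical name, stands in for class N above
    def __init__(self):
        events.append(("create", id(self)))

    def __del__(self):
        events.append(("destroy", id(self)))

# Three temporaries inside one statement; none of them is kept.
ids = (id(Tracked()), id(Tracked()), id(Tracked()))

for what, ident in events:
    print(what, ident)
print(ids)
```

On CPython each object is destroyed before the next one is created, so the log alternates between create and destroy. Whether the three ids coincide depends on the allocator and is not guaranteed.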
Fourth question: explain the "juggling"
I am afraid I cannot reproduce the "juggling" behaviour you exhibit (where alternately created objects get different addresses). Can you give more details, such as Python version and operating system? What results do you get if you use my class N?

OK, I can reproduce the juggling if I make my class N inherit from object.
I have a theory about why this happens, but I have not checked it in a debugger, so please take it with a pinch of salt.
First, you need to understand a bit about how Python's memory manager works. Go read through obmalloc.c and come back when you're done. I'll wait.
...
All understood? Good. So now you know that Python manages small objects by sorting them into pools by size: each 4 KiB pool contains objects in a small range of sizes, and there's a free list to help the allocator to quickly find a slot for the next object to be allocated.
Now, the Python interactive shell is also creating objects: the abstract syntax tree and the compiled byte code, for example. My theory is that when N is a new-style class, its size is such that it goes into the same pool as some other object that is allocated by the interactive shell. So the sequence of events looks something like this:
1. User enters id(N()).
2. Python allocates a slot in pool P for the object just created (call this slot A).
3. Python destroys the object and returns its slot to the free list for pool P.
4. The interactive shell allocates some object, call it O. This happens to be the right size to go into pool P, so it gets slot A that was just freed.
5. User enters id(N()) again.
6. Python allocates a slot in pool P for the object just created. Slot A is full (still contains object O), so it gets slot B instead.
7. The interactive shell forgets about object O, so it gets destroyed, and slot A is returned to the free list for pool P.
You can see that this explains the alternating behaviour. In the case where the user types id(N()),id(N()), the interactive shell doesn't get a chance to stick its oar in between the two allocations, so they can both go in the same slot in the pool.
This also explains why it doesn't happen for old-style objects. Presumably the old-style objects are a different size, so they go in a different pool, and don't share slots with whatever objects the interactive shell is creating.
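The theory can be played through with a toy model. This is only an illustration of the sequence of events described above, not CPython's real allocator: ToyPool, its slot names and the two helper functions are invented for the sketch, and it assumes the free list hands back the most recently freed slot first.

```python
class ToyPool(object):
    """Toy fixed-slot pool with a last-in, first-out free list.

    An illustration only; CPython's real allocator is far more involved.
    """
    def __init__(self, slots):
        # pop() takes from the end, so reverse to hand out the first slot first
        self.free = list(reversed(slots))

    def alloc(self):
        return self.free.pop()

    def release(self, slot):
        self.free.append(slot)


def separate_statements(pool, count):
    """Each id(N()) is its own shell statement; the shell allocates O in between."""
    seen = []
    shell_slot = None                  # slot held by the hypothetical shell object O
    for _ in range(count):
        slot = pool.alloc()            # N() is created
        seen.append(slot)              # id() reports its slot
        pool.release(slot)             # N() is destroyed straight away
        new_shell_slot = pool.alloc()  # the shell allocates a fresh O
        if shell_slot is not None:
            pool.release(shell_slot)   # the shell forgets the previous O
        shell_slot = new_shell_slot
    return seen


def single_statement(pool, count):
    """All id(N()) calls sit in one statement; nothing else allocates in between."""
    seen = []
    for _ in range(count):
        slot = pool.alloc()
        seen.append(slot)
        pool.release(slot)
    return seen


print(separate_statements(ToyPool(["A", "B", "C"]), 4))   # ['A', 'B', 'A', 'B']
print(single_statement(ToyPool(["A", "B", "C"]), 3))      # ['A', 'A', 'A']
```

The first call alternates between slots A and B, like the juggling in the question; the second keeps reusing slot A, like id(N()), id(T()) in a single statement.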
Fifth question: what objects might the interactive shell be allocating?
See pythonrun.c for the details, but basically the interactive shell:
1. Reads your input and allocates strings containing your code.
2. Calls the parser, which constructs an abstract syntax tree describing the code.
3. Calls the compiler, which constructs the compiled byte code.
4. Calls the evaluator, which allocates objects for stack frames, locals, globals etc.
I don't know exactly which of these objects is to blame for the "juggling". Not the input strings (strings have their own specialized allocator); not the abstract syntax tree (it gets thrown away after it's been compiled). Maybe it's the byte code object.
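The read, parse, compile and evaluate stages have rough Python-level counterparts, which makes it easy to see that each one produces objects of its own. This is an analogy written for Python 3, not what pythonrun.c literally calls; the source string and the "<stdin>" filename are placeholders:

```python
import ast

source = "1 + 2"                           # the text the user typed
tree = ast.parse(source, mode="eval")      # parser: builds the syntax tree
code = compile(tree, "<stdin>", "eval")    # compiler: builds a code object
result = eval(code)                        # evaluator: runs the byte code

print(type(tree).__name__)   # Expression
print(type(code).__name__)   # code
print(result)                # 3
```

Each intermediate value (the string, the tree, the code object) is an allocation made on behalf of the shell rather than by your own code.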
The documentation says it all:
id(object): Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Whenever you call a constructor, this creates a new object. The object has an id that's different from the id of any other object that's currently alive.
>>> n1 = N()
>>> n2 = N()
>>> id(n1) == id(n2)
False
The "contents" of the two objects do not matter. They are two distinct entities; it seems perfectly logical that they would get different ids.
In CPython, ids are simply memory addresses. They do get recycled: if an object gets garbage collected, another object created at some point in the future might get the same id. This is the behaviour you're seeing in your repeated id(N()), id(T()) tests: since you're not keeping references to the newly created objects, the interpreter is free to garbage collect them and reuse their ids.
The recycling of ids is clearly an implementation/platform artefact and should not be relied upon.
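The two halves of that guarantee can be seen side by side. In this small sketch the names kept and recycled and the count of 1000 are arbitrary; only the first result is guaranteed by the language, the second is whatever the implementation happens to do:

```python
class N(object):
    pass

# Overlapping lifetimes: all objects are kept alive by the list,
# so their ids are guaranteed to be pairwise distinct.
kept = [N() for _ in range(1000)]
print(len(set(id(obj) for obj in kept)))   # 1000

# Non-overlapping lifetimes: each object is dropped before the next is made,
# so ids may repeat. How many distinct values appear is an implementation detail.
recycled = [id(N()) for _ in range(1000)]
print(len(set(recycled)))
```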
I might be wrong, but I think you are seeing the Garbage Collector in action. A call to N() or T() creates an object that is not stored anywhere and then is picked up by the GC. Afterwards, the memory addresses can be reused.
If you really want the answer, look at the source.
In general terms, unless the language guarantees that value identity and object identity will be the same, or guarantees that interning will occur (as Java does with strings in certain cases), then don't be surprised if value and object identity differ.
I think that the juggling might occur because the Python command line interpreter stores the previously executed result as _. On the command line, I have the juggling but in a Python file it doesn't seem to occur. For example, the code:
print id(X())
print id(X())
print id(X())
print id(X())
print id(X())
Produces
>>> print id(X())
3078933196
>>> print id(X())
3078933004
>>> print id(X())
3078932716
>>> print id(X())
3078933196
....
but a Python file with the same text produces the output
3079140908
3079140908
3079140908
3079140908
3079140908