Python 2.6 GC appears to clean up objects, but memory is not released
I have a program written in Python 2.6 that creates a large number of short-lived instances (it is a classic producer-consumer problem). I noticed that the memory usage, as reported by top and pmap, seems to increase when these instances are created and never goes back down. I was concerned that some Python module I was using might be leaking memory, so I carefully isolated the problem in my code. I then proceeded to reproduce it in as short an example as possible. I came up with this:
class LeaksMemory(list):
    timesDelCalled = 0

    def __del__(self):
        LeaksMemory.timesDelCalled += 1


def leakSomeMemory():
    l = []
    for i in range(0, 500000):
        ml = LeaksMemory()
        ml.append(float(i))
        ml.append(float(i*2))
        ml.append(float(i*3))
        l.append(ml)
import gc
import os
leakSomeMemory()
print("__del__ was called " + str(LeaksMemory.timesDelCalled) + " times")
print(str(gc.collect()) +" objects collected")
print("__del__ was called " + str(LeaksMemory.timesDelCalled) + " times")
print(str(os.getpid()) + " : check memory usage with pmap or top")
If you run this with something like 'python2.6 -i memoryleak.py' it will halt, and you can use pmap -x PID to check the memory usage. I added the __del__ method so I could verify that GC was occurring. It is not in my actual program and does not appear to make any functional difference. Each call to leakSomeMemory() increases the amount of memory consumed by this program. I fear I am making some simple error and that references are being kept by accident, but I cannot identify it.
Python will release the objects, but it will not release the memory back to the operating system immediately. Instead, it will re-use the same segments for future allocations within the same interpreter.
Here's a blog post about the issue: http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
UPDATE: I tested this myself with Python 2.6.4 and didn't notice persistent increases in memory usage. Some invocations of leakSomeMemory() caused the memory footprint of the Python process to increase, and some made it decrease again. So it all depends on how the allocator is re-using the memory.
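If you want to watch the footprint from inside the process rather than with pmap, here is a minimal sketch of one way to do it. It assumes Linux (it reads VmRSS from /proc/self/status), reuses leakSomeMemory() from the question, and the rss_kb() helper is just a name made up for this example:

import gc

def rss_kb():
    # Read the resident set size (in kB) that the kernel reports for this process.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

print("RSS before: " + str(rss_kb()) + " kB")
leakSomeMemory()
print(str(gc.collect()) + " objects collected")
# The objects are gone, but RSS may stay high: CPython keeps the freed
# memory in its own pools and re-uses it for later allocations.
print("RSS after:  " + str(rss_kb()) + " kB")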
According to Alex Martelli:
"The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates."
So, in your situation it sounds like it would make sense to use the multiprocessing module to run the short-lived functions in separate processes to ensure the return of resources when the process finishes.
import multiprocessing as mp

def NOT_leakSomeMemory(i):
    # do stuff
    return result

if __name__ == '__main__':
    pool = mp.Pool()
    results = pool.map(NOT_leakSomeMemory, range(500000))
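If the memory-hungry work happens in one big batch rather than per item, a plain multiprocessing.Process does the same job: the large temporaries live only in the child, and when the child exits its memory goes back to the operating system. This is just a sketch under that assumption; do_memory_hungry_work() is a made-up stand-in for the real work:

import multiprocessing as mp

def do_memory_hungry_work(queue):
    # Build the large temporary structures here; they die with the process.
    result = sum(float(i) for i in range(500000))
    queue.put(result)

if __name__ == '__main__':
    queue = mp.Queue()
    worker = mp.Process(target=do_memory_hungry_work, args=(queue,))
    worker.start()
    result = queue.get()   # fetch the result before joining
    worker.join()          # the child's memory is released to the OS when it exits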
For more ideas on how to set things up using multiprocessing, see Doug Hellmann's tutorial: