Python String memory usage on FreeBSD
I'm observing a strange memory usage pattern with Python strings on FreeBSD. Consider the following session. The idea is to create a list of strings whose total length is about 100MB.
# 100,000 strings of about 1,000 characters each (Python 2)
l = []
for i in xrange(100000):
    l.append(str(i) * (1000/len(str(i))))
This uses around 100MB of memory, as expected, and 'del l' frees it.
# 20,000 strings of about 5,000 characters each (Python 2)
l = []
for i in xrange(20000):
    l.append(str(i) * (5000/len(str(i))))
This uses 165MB of memory. I really don't understand where the additional memory usage is coming from. [Both lists hold the same total number of characters.]
This is Python 2.6.4 on FreeBSD 7.2. On Linux and Windows, both snippets use only around 100MB of memory.
Update: I'm measuring memory using 'ps aux', which can be run with os.system after the code snippets above. Also, the two snippets were executed separately.
Update2: It looks like FreeBSD's malloc rounds allocations up to a power of 2, so allocating 5KB actually allocates 8KB. I'm not sure though.
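A rough way to check the rounding hypothesis is to compute what it would predict for both snippets. This is only a sketch: the power-of-two rule is the guess from the update above, not confirmed FreeBSD behaviour, and the helper names are made up for illustration. It also ignores the per-string object header and any allocator bookkeeping.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power


def estimated_total(count, size):
    """Bytes used if every `size`-byte request is rounded up (assumed rule)."""
    return count * next_power_of_two(size)


# 1000 rounds up to 1024, 5000 rounds up to 8192.
print(estimated_total(100000, 1000))  # 102400000
print(estimated_total(20000, 5000))   # 163840000
```

Under that assumption the first snippet comes out near 100MB and the second near 165MB, which is at least consistent with the numbers observed.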
In my opinion, this is probably memory fragmentation. First of all, memory chunks bigger than 256 bytes are allocated with malloc in CPython. For reference, see
Improving Python's Memory Allocator
For performance reasons, most memory allocators, such as malloc, return an aligned address. For example, you will never get an address like
0x00003
It is not aligned to 4 bytes, and it would be very slow for the computer to access memory at such an address. Therefore, every address you get from malloc should look like
0x00000
0x00004
0x00008
and so on. 4-byte alignment is only the basic common rule; the actual alignment policy varies by OS.
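To make the rounding concrete, here is a minimal sketch of how an address or size is rounded up to an alignment boundary. The function name align_up is made up for illustration, and the 4-byte value is just the example alignment used above, not a claim about any particular allocator.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment.

    Assumes alignment is a power of two, so the bit mask trick works.
    """
    return (value + alignment - 1) & ~(alignment - 1)


# Show where a few unaligned addresses would land with 4-byte alignment.
for addr in (0x0, 0x3, 0x4, 0x5):
    print("%#x -> %#x" % (addr, align_up(addr, 4)))
```

The gap between the requested value and the rounded one is padding that the program cannot use.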
The memory usage you are talking about is probably RSS (not sure). On most OSes, the virtual memory page size is 4K, so storing a 5000-byte chunk takes 2 pages. Here is an example illustrating how memory gets wasted. We assume 256-byte alignment here.
0x00000 {
... chunk 1
0x01388 }
0x01389 {
... fragment 1
0x013FF }
0x01400 {
... chunk 2
0x02788 }
0x02789 {
... fragment 2
0x027FF }
0x02800 {
... chunk 3
0x03B88 }
0x03B89 {
... fragment 3
0x04000 }
As you can see, there are many fragments in memory. They can't be used, but they still occupy space in a page. I'm not sure what FreeBSD's alignment policy is, but I think the extra usage is caused by something like this. To use memory efficiently with Python, you can pre-allocate one big bytearray and pick a good chunk size to carve it into (you have to test to find the best number; it depends on the OS).
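A minimal sketch of the pre-allocated bytearray idea might look like this. SlotPool and its put/get methods are hypothetical names invented for this illustration, and the slot size of 5000 is only taken from the question; it is not a tuned value.

```python
class SlotPool(object):
    """Hypothetical fixed-size slot pool backed by one pre-allocated bytearray."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self.slot_count = slot_count
        # One contiguous allocation instead of slot_count separate mallocs.
        self.buf = bytearray(slot_size * slot_count)
        # Number of bytes actually stored in each slot.
        self.lengths = [0] * slot_count

    def put(self, index, data):
        """Copy a byte string into slot `index`."""
        if not 0 <= index < self.slot_count:
            raise IndexError("slot index out of range")
        if len(data) > self.slot_size:
            raise ValueError("data does not fit in one slot")
        start = index * self.slot_size
        # The slice has the same length as data, so the buffer never resizes.
        self.buf[start:start + len(data)] = data
        self.lengths[index] = len(data)

    def get(self, index):
        """Return a copy of the bytes stored in slot `index`."""
        start = index * self.slot_size
        return bytes(self.buf[start:start + self.lengths[index]])


pool = SlotPool(5000, 4)
pool.put(1, b"12345" * 1000)
print(len(pool.get(1)))  # 5000
```

The trade-off is that the caller now manages slots by hand, and short strings still waste the rest of their slot.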
The answer may be in this saga. I think that you're witnessing some unavoidable memory manager overhead.
As @Hossein says, try executing both code snippets in one run, and then swap them.
I think that all memory addresses in FreeBSD have to be aligned to a power of two, so Python's memory pools end up fragmented in memory rather than contiguous.
Try using some other tool to see whether anything interesting shows up.
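One alternative to 'ps aux' is to ask the process itself through the standard resource module (Unix only). This is just a sketch, and the helper names are made up. Note that ru_maxrss is the peak resident set size, so it never goes down, and its units depend on the platform (commonly kilobytes), so check the getrusage man page on your system.

```python
import resource


def peak_rss():
    """Peak resident set size of this process, in platform-dependent units."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure(count, size):
    """Build `count` strings of about `size` characters; return peak RSS growth."""
    before = peak_rss()
    strings = [str(i) * (size // len(str(i))) for i in range(count)]
    after = peak_rss()
    del strings
    return after - before


if __name__ == "__main__":
    # Run each case in a fresh interpreter, since the peak value never shrinks.
    print("peak RSS grew by %d units" % measure(20000, 5000))
```

Running the two cases in separate processes, and then in both orders within one process, should show whether the difference depends on the allocation history.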