Stack assignment to a thread
I've been trying to piece together how stack memory is handed out to threads, but I haven't managed to work out the whole picture. I tried going to the code, but that left me more confused, so I'm asking for your help.
I asked this question a little while ago, so assume that particular program (and therefore that all threads are within the same process). If I printf, for each thread, the beginning of its stack and how much is allocated for it, I get something like the table at the end of this message. The first column is a time_t in usec, the second doesn't matter, the third is the tid of the thread, the fourth is the guard size; then come the begin of stack and end of stack (rows are sorted by beginning of stack); the next-to-last is the allocated stack size (8 MB by default); and the last column is the difference, in MB, between the end of one allocated stack and the beginning of the next.
This means (I think) that if the value is 0, the stacks are contiguous; if it is positive, then, since the stack grows down in memory, there is "free space" of that many MB between one tid's stack and the next (in memory); and if it is negative, memory is being reused, which may mean that that stack space was freed before this thread was created.
My problem is: what exactly is the algorithm that assigns stack space to threads (at a higher level than the code itself)? Why do I sometimes get contiguous stacks and sometimes not, and why do I sometimes get values like 7.94140625 and 0.0625 in the last column? (For what it's worth, 8392704 bytes / 1048576 = 8.00390625, so a gap of 8.00390625 MB is a hole exactly the size of one full stack allocation: 8 MiB of stack plus one 4 KiB guard page.)
This is all Linux 2.6, C and pthreads.
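For reference, here is a minimal sketch of how numbers like these can be collected from inside each thread. This isn't my exact instrumentation; it relies on pthread_getattr_np, which is a GNU extension, and SYS_gettid to get the kernel tid:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* One line per thread: tid, guard size, begin of stack, end of stack,
   allocated size -- roughly the columns of the table below. */
static void *report(void *arg)
{
    pthread_attr_t attr;
    void *begin;
    size_t size, guard;
    (void)arg;

    pthread_getattr_np(pthread_self(), &attr);      /* GNU extension */
    pthread_attr_getstack(&attr, &begin, &size);    /* begin = lowest address */
    pthread_attr_getguardsize(&attr, &guard);
    pthread_attr_destroy(&attr);

    printf("%ld %zu %lu %lu %zu\n",
           (long)syscall(SYS_gettid), guard,
           (unsigned long)(uintptr_t)begin,
           (unsigned long)((uintptr_t)begin + size), size);
    return NULL;
}

int main(void)
{
    pthread_t thr[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&thr[i], NULL, report, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(thr[i], NULL);
    return 0;
}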
This may be a question we will have to iterate on to get it right, and for this I apologize, but I'm telling you what I know right now. Feel free to ask for clarifications.
Thanks for this. The table follows.
usec  (n/a)  tid  guard  begin-of-stack  end-of-stack  alloc-size  gap(MB)
52815 14 14786 4096 92549120 100941824 8392704 0
52481 14 14784 4096 100941824 109334528 8392704 0
51700 14 14777 4096 109334528 117727232 8392704 0
70747 14 14806 4096 117727232 126119936 8392704 8.00390625
75813 14 14824 4096 117727232 126119936 8392704 0
51464 14 14776 4096 126119936 134512640 8392704 8.00390625
76679 14 14833 4096 126119936 134512640 8392704 -4.51953125
53799 14 14791 4096 139251712 147644416 8392704 -4.90234375
52708 14 14785 4096 152784896 161177600 8392704 0
50912 14 14773 4096 161177600 169570304 8392704 0
51617 14 14775 4096 169570304 177963008 8392704 0
70028 14 14793 4096 177963008 186355712 8392704 0
51048 14 14774 4096 186355712 194748416 8392704 0
50596 14 14771 4096 194748416 203141120 8392704 8.00390625
First, by stracing a simple test program that launches a single thread, we can see the syscalls it uses to create that thread. Here's the test program:
#include <pthread.h>
#include <stdio.h>
void *test(void *x) {
    (void)x;       /* unused */
    return NULL;
}
int main() {
    pthread_t thr;
    printf("start\n");
    pthread_create(&thr, NULL, test, NULL);
    pthread_join(thr, NULL);
    printf("end\n");
    return 0;
}
And the relevant portion of its strace output:
write(1, "start\n", 6start
) = 6
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xf6e32000
brk(0) = 0x8915000
brk(0x8936000) = 0x8936000
mprotect(0xf6e32000, 4096, PROT_NONE) = 0
clone(child_stack=0xf7632494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xf7632bd8, {entry_number:12, base_addr:0xf7632b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xf7632bd8) = 9181
futex(0xf7632bd8, FUTEX_WAIT, 9181, NULL) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "end\n", 4end
) = 4
exit_group(0) = ?
We can see that it obtains a stack from mmap with PROT_READ|PROT_WRITE protection and MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK flags. It then protects the first (i.e., lowest) page of the stack, to detect stack overflows. The rest of the calls aren't really relevant to the discussion at hand.
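To make that concrete, here's a minimal standalone sketch of the same two steps the trace shows. The sizes are copied from the trace; this is an illustration, not glibc's actual code:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* 8 MiB of stack plus one 4 KiB guard page: 8392704 bytes, as in the trace */
    size_t size = 8392704;

    void *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) { perror("mmap"); return 1; }

    /* Revoke all access on the lowest page; running into it now raises
       SIGSEGV instead of silently trampling whatever is mapped below. */
    if (mprotect(stack, 4096, PROT_NONE) != 0) { perror("mprotect"); return 1; }

    printf("stack %p .. %p, guard page at the bottom\n",
           stack, (void *)((char *)stack + size));
    return 0;
}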
So how does mmap allocate the stack, then? Well, let's start at mmap_pgoff in the Linux kernel, the entry point for the modern mmap2 syscall. It delegates to do_mmap_pgoff after taking some locks. This then calls get_unmapped_area to find an appropriate range of unmapped pages.
Unfortunately, this then calls a function pointer stored in the process's mm_struct - this is probably so that 32-bit and 64-bit processes can have different ideas of which addresses can be mapped. In the case of x86, this pointer is assigned in arch_pick_mmap_layout, which switches based on whether the process is using a 32-bit or 64-bit architecture.
So let's look at the implementation of arch_get_unmapped_area, then. It first gets some reasonable defaults for its search from find_start_end, then tests to see if the address hint passed in is valid (for thread stacks, no hint is passed). It then starts scanning through the virtual memory map, starting from a cached address, until it finds a hole. It saves the end of the hole for use in the next search, then returns the location of this hole. If it reaches the end of the address space, it starts again from the start, giving it one more chance to find an open area.
So as you can see, normally it will assign stacks in an increasing manner (for x86; x86-64 uses arch_get_unmapped_area_topdown and will likely assign them in decreasing order). However, it also keeps a cache of where to start a search, so it might leave gaps depending on when areas are freed. In particular, when an mmapped area is freed, it might update the free-address-search cache, so you might see out-of-order allocations there as well.
That said, this is all an implementation detail. Do not rely on any of this in your program. Just take whatever addresses mmap hands out and be happy :)
glibc handles this in nptl/allocatestack.c. The key line is:
mem = mmap (NULL, size, prot,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
So it just asks the kernel for some anonymous memory, not unlike malloc does for large blocks. Which block it actually gets is up to the kernel...
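If the placement actually matters to you, the one supported knob is to hand the thread a stack of your own instead of letting allocatestack.c pick one. A sketch, using the standard POSIX pthread_attr_setstack (note that glibc puts no guard page on a caller-supplied stack, so you would have to mprotect one yourself):

#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

static void *work(void *arg)
{
    int local;
    (void)arg;
    printf("a stack variable lives at %p\n", (void *)&local);
    return NULL;
}

int main(void)
{
    size_t size = 8 * 1024 * 1024;               /* must be >= PTHREAD_STACK_MIN */
    void *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) { perror("mmap"); return 1; }

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, size);   /* use our region, not glibc's */

    pthread_t thr;
    pthread_create(&thr, &attr, work, NULL);
    pthread_join(thr, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}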