开发者

Stack assignment to a thread

I've been trying to piece together how stack memory is handed out to threads. I haven't been able to piece the whole thing together. I tried to go to the code, but I'm more confused, so I'm asking for your help.

I asked this question a little while ago. So assume that particular program (therefore, all threads are within the same process). If I write printfs for each beginning of stack pointer, and then how much is allocated for them, then I get stuff like the table at the end of this message, where the first column is a time_t usec, the second doesn't matter, the third is the tid of the thread, the fourth is the guard size, then begin of stack, end of stack (sorted by beginning of stack), last but one is the allocated stack (8 Megs by default) and the last column is the difference between the end of the first allocated stack, and the beginning of the next stack.

This means that (I think), if 0, then the stacks are contiguous, if positive, since the stack grows down in 开发者_如何学Cmemory, then it means that there is "free space" of however many Mbs between a tid and the next (in memory). If negative, this means that memory is being reused. So this may mean that that stack space has been freed before this thread was created.

My problem is: what exactly is the algorithm that assigns stack space to threads (at a higher level than code) and why do I sometimes get contiguous stacks, and sometimes not, and sometimes get values like 7.94140625 and 0.0625 in the last column?

This is all Linux 2.6, C and pthreads.

This may be a question we will have to iterate on to get it right, and for this I apologize, but I'm telling you what I know right now. Feel free to ask for clarifications.

Thanks for this. The table follows.

52815   14  14786   4096    92549120    100941824   8392704 0
52481   14  14784   4096    100941824   109334528   8392704 0
51700   14  14777   4096    109334528   117727232   8392704 0
70747   14  14806   4096    117727232   126119936   8392704 8.00390625
75813   14  14824   4096    117727232   126119936   8392704 0
51464   14  14776   4096    126119936   134512640   8392704 8.00390625
76679   14  14833   4096    126119936   134512640   8392704 -4.51953125
53799   14  14791   4096    139251712   147644416   8392704 -4.90234375
52708   14  14785   4096    152784896   161177600   8392704 0
50912   14  14773   4096    161177600   169570304   8392704 0
51617   14  14775   4096    169570304   177963008   8392704 0
70028   14  14793   4096    177963008   186355712   8392704 0
51048   14  14774   4096    186355712   194748416   8392704 0
50596   14  14771   4096    194748416   203141120   8392704 8.00390625


First, by stracing a simple test program that launches a single thread, we can see the syscalls it used to create a new thread. Here's a simple test program:

#include <pthread.h>
#include <stdio.h>

void *test(void *x) { }

int main() {
        pthread_t thr;
        printf("start\n");
        pthread_create(&thr, NULL, test, NULL);
        pthread_join(thr, NULL);
        printf("end\n");
        return 0;
}

And the relevant portion of its strace output:

write(1, "start\n", 6start
)                  = 6
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xf6e32000
brk(0)                                  = 0x8915000
brk(0x8936000)                          = 0x8936000
mprotect(0xf6e32000, 4096, PROT_NONE)   = 0
clone(child_stack=0xf7632494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xf7632bd8, {entry_number:12, base_addr:0xf7632b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xf7632bd8) = 9181
futex(0xf7632bd8, FUTEX_WAIT, 9181, NULL) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "end\n", 4end
)                    = 4
exit_group(0)                           = ?

We can see that it obtains a stack from mmap with PROT_READ|PROT_WRITE protection and MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK flags. It then protects the first (ie, lowest) page of the stack, to detect stack overflows. The rest of the calls aren't really relevant to the discussion at hand.

So, then, how does mmap allocate the stack, then? Well, let's start at mmap_pgoff in the Linux kernel; the entry point for the modern mmap2 syscall. It delegates to do_mmap_pgoff after taking some locks. This then calls get_unmapped_area to find an appropriate range of unmapped pages.

Unfortunately, this then calls a function pointer defined in the vma - this is probably so that 32-bit and 64-bit processes can have different ideas of which addresses can be mapped. In the case of x86, this is defined in arch_pick_mmap_layout, which switches based on whether it's using a 32-bit or 64-bit architecture for this process.

So let's look at the implementation of arch_get_unmapped_area then. It first gets some reasonable defaults for its search from find_start_end, then tests to see if the address hint passed in is valid (for thread stacks, no hint is passed). It then starts scanning through the virtual memory map, starting from a cached address, until it finds a hole. It saves the end of the hole for use in the next search, then returns the location of this hole. If it reaches the end of the address space, it starts again from the start, giving it one more chance to find an open area.

So as you can see, normally, it will assign stacks in an increasing manner (for x86; x86-64 uses arch_get_unmapped_area_topdown and will likely assign them decreasing). However, it also keeps a cache of where to start a search, so it might leave gaps depending on when areas are freed. In particular, when a mmaped area is freed, it might update the free-address-search-cache, so you might see out of order allocations there as well.

That said, this is all an implementation detail. Do not rely on any of this in your program. Just take what addresses mmap hands out and be happy :)


glibc handles this in nptl/allocatestack.c.

Key line is:

mem = mmap (NULL, size, prot,
            MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

So it just asks the kernel for some anonymous memory, not unlike malloc does for large blocks. Which block it actually gets is up to the kernel...

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜