Segmentation fault for pthreads in a recursive call
Given the code below, I get a segmentation fault if I run it with n>16.
I think it has something to do with the stack, but I can't figure it out. Could anyone give me a hand? The code is not mine, and really not important. I would just like someone to give me a hand with what is happening. This SO question is very similar, but there's not enough information (the person who posts the answer briefly talks about the problem, but then goes on to talk about a different language). Besides, notice that with two gigs and no recursion, I can (if I'm doing it right) successfully create more than 16000 threads (though the OS only creates about 500 and runs about 300). Anyway, where am I getting the seg fault here and why? Thanks.
#include <pthread.h>
#include <stdio.h>
static void* fibonacci_thread( void* arg ) {
int n = (int)arg, fib;
pthread_t th1, th2;
void* pvalue; /*Holds the value*/
switch (n) {
case 0: return (void*)0;
case 1: /* Fallthru, Fib(1)=Fib(2)=1 */
case 2: return (void*)1;
default: break;
}
pthread_create(&th1, NULL, fibonacci_thread, (void*)(n-1));
pthread_create( &开发者_开发问答th2, NULL, fibonacci_thread, (void*)(n-2));
pthread_join(th1, &pvalue);
fib = (int)pvalue;
pthread_join(th2, &pvalue);
fib += (int)pvalue;
return (void*)fib;
}
int main(int argc, char *argv[])
{
int n=15;
printf ("%d\n",(int)fibonacci_thread((void*)n));
return 0;
}
This is not a good way to do a Fibonacci sequence :-)
Your first thread starts two others, each of those starts two others and so forth. So when n > 16
, you may end up with a very large number of threads (in the thousands) (a).
Unless your CPU has way more cores than mine, you'll be wasting your time running thousands of threads for a CPU-bound task like this. For purely CPU bound tasks, you're better off having as many threads as there are physical execution engines (cores or CPUs) available to you. Obviously that changes where you're not purely CPU bound.
If you want an efficient recursive (non-threaded) Fibonacci calculator, you should use something like (pseudo-code):
def fib(n, n ranges from 1 to infinity):
if n is 1 or 2:
return 1
return fib(n-1) + fib(n-2)
Fibonacci isn't even really that good for non-threaded recursion since the problem doesn't reduce very fast. By that, I mean calculating fib(1000)
will use 1000 stack frames. Compare this with a recursive binary tree search where only ten stack frames are needed. That's because the former only removes 1/1000 of the search space for each stack frame while the latter removes one half of the remaining search space.
The best way to do Fibonacci is with iteration:
def fib(n, n ranges from 1 to infinity):
if n is 1 or 2:
return 1
last2 = 1, last1 = 1
for i ranges from 3 to n:
last0 = last2 + last1
last2 = last1
last1 = last0
return last0
Of course, if you want a blindingly fast Fibonacci generator, you write a program to generate all the ones you can store (in, for example, a long value) and write out a C structure to contain them. Then incorporate that output into your C program and your runtime "calculations" will blow any other method out of the water. This is your standard "trade off space for time" optimisation method:
long fib (size_t n) {
static long fibs[] = {0, 1, 1, 2, 3, 5, 8, 13, ...};
if (n > sizeof(fibs) / sizeof(*fibs))
return -1;
return fibs[n];
}
These guidelines apply to most situations where the search space doesn't reduce that fast (not just Fibonacci).
(a) Originally, I thought this would be 216 but, as the following program shows (and thanks to Nemo for setting me straight), it's not quite that bad - I didn't take into account the reducing nature of the spawned threads as you approached fib(0)
:
#include <stdio.h>
static count = 0;
static void fib(int n) {
if (n <= 2) return;
count++; fib(n-1);
count++; fib(n-2);
}
int main (int argc, char *argv[]) {
fib (atoi (argv[1]));
printf ("%d\n", count);
return 0;
}
This is equivalent to the code you have, but it simply increments a counter for each spawned thread rather than actually spawning them. The number of threads for various input values are:
N Threads
--- ---------
1 0
2 0
3 2
4 4
5 8
6 14
:
14
15 1,218
16 1,972
:
20 13,528
:
26 242,784
:
32 4,356,616
Now note that, while I said it wasn't as bad, I didn't say it was good :-) Even two thousand threads is a fair load on a system with each having their own kernel structures and stacks. And you can see that, while the increases start small, they quickly accelerate to the point where they're unmanageable. And it's not like the 32nd number is large - it's only a smidgeon over two million.
So the bottom line still stands: use recursion where it makes sense (where you can reduce the search space relatively quickly so as to not run out of stack space), and use threds where it makes sense (where you don't end up running so many that you overload the resources of the operating system).
Heck with it, might as well make this an answer.
First, check the return values of pthread_create
and pthread_join
. (Always, always, always check for errors. Just assert
they are returning zero if you are feeling lazy, but never ignore them.)
Second, I could have sworn Linux glibc allocates something like 2 megabytes of stack per thread by default (configurable via pthread_attr_setstacksize). Sure, that is only virtual memory, but on a 32-bit system that still limits you to ~2000 threads total.
Finally, I believe the correct estimate for the number of threads this will spawn is basically fib(n)
itself (how nicely recursive). Or roughly phi^n
, where phi
is (1+sqrt(5))/2
. So the number of threads here is closer to 2000 than to 65000, which is consistent with my estimate for where a 32-bit system will run out of VM.
[edit]
To determine the default stack size for new threads on your system, run this program:
int main(int argc, char *argv[])
{
size_t stacksize;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_getstacksize(&attr, &stacksize);
phthread_attr_destroy(&attr);
printf("Default stack size = %zd\n", stacksize);
return 0;
}
[edit 2]
To repeat: This is nowhere near 2^16 threads.
Let f(n) be the number of threads spawned when computing fib(n).
When n=16, one thread spawns two new threads: One to compute fib(15) and another to compute fib(14). So f(16) = f(15) + f(14) + 1.
And in general f(n) = f(n-1) + f(n-2) + 1.
As it turns out, the solution to this recurrence is that f(n) is just the sum of the first n Fibonacci numbers:
1 + 1 + 2 + 3 + 5 + 8 // f(6)
+ 1 + 1 + 2 + 3 + 5 // + f(5)
+ 1 // + 1
= 1 + 1 + 2 + 3 + 5 + 8 + 13 // = f(7)
This is (very) roughly phi^(n+1)
, not 2^n
. Total for f(16) is still measured in the low thousands, not tens of thousands.
[edit 3]
Ah, I see, the crux of your question is this (hoisted from the comments):
Thanks Nemo for a detailed answer. I did a little test and pthread_created ~10,000 threads with just a while(1) loop inside so they don't terminate... and it did! True that the OS was smart enouth to only create about 1000 and run an even smaller number, but it didn't run out of stack. Why do I not get a segfault when I generate lots more than THREAD_MAX, but I do when I do it recursively?
Here is my guess.
You only have a few cores. At any time, the kernel has to decide which threads are going to run. If you have (say) 2 cores and 500 threads, then any particular thread is only going to run 1/250 of the time. So your main loop spawning new threads is not going to run very often. I am not even sure whether the kernel's scheduler is "fair" with respect to threads within a single process, so it is at least conceivable that with 1000 threads the main thread never gets to run at all.
At the very least, each thread doing while (1);
is going to run for 1/HZ
on its core before giving up its time slice. This is probably 1ms, but it could be as high as 10ms depending on how your kernel was configured. So even if the scheduler is fair, your main thread will only get to run around once a second when you have thousands of threads.
Since only the main thread is creating new threads, the rate of thread creation slows to a crawl and possibly even stops.
Try this. Instead of while (1);
for the child threads in your experiment, try while (1) pause();
. (pause
is from unistd.h.) This will keep the child threads blocked and should allow the main thread to keep grinding away creating new threads, leading to your crash.
And again, please check what pthread_create
returns.
first thing i would do is run a statement like printf("%i", PTHREAD_THREADS_MAX);
and see what the value is; i don't think that the max threads of the OS is necessarily the same as the max number of pthreads, although i do see that you say you can successfully achieve 16000 threads with no recursion so i'm just mentioning it as something i would check in general.
should PTHREAD_THREADS_MAX
be significantly > the number threads you are achieving, i would start checking the return values of the pthread_create() calls to see if you are getting EAGAIN
. my suspicion is that the answer to your question is that you're getting a segfault from attempting to use an uninitialized thread in a join...
also, as paxdiablo mentioned, you're talking on the order of 2^16
threads at n=16
here (a little less assuming some of them finishing before the last are created); i would probably try to keep a log to see in what order each was created. probably the easiest thing would be just to use the (n-1)
(n-2)
values as your log items otherwise you would have to use a semaphore or similar to protect a counter...
printf might bog down, in fact, i wouldn't be surprised if that actually affected things by allowing more threads to finish before new ones were started, but so i'd probably just log using file write()
; can just be a simple file you should be able to get a feel for what's going on by looking at the patterns of numbers there. (wait, that assumes file ops are thread safe; i think they are; it's been a while.)
also, once checking for EAGAIN
you could try sleeping an bit and retrying; perhaps it will ramp up over time and the system is just being overwhelmed by the sheer number of thread requests and failing for some other reason than being out of resources; this would verify whether just waiting and restarting could get you where you want to be.
finally; i might try to re-write the function as a fork()
(i know fork() is evil or whatever ;)) and see whether you have better luck there.
:)
精彩评论