parallelizing a task, how many threads?
Say we have an embarrassingly parallel task, ants providing food for their queen for instance.
X = Amount of food nee开发者_运维技巧ded
Y = Time it takes an ant to get a unit of food Z = To get an ant to start foraging for food a single primary thread takes Z seconds.The primary thread can spawn threads until the problem is fully solved but there is a threshold when creating new threads, diminishing returns take effect. To what percentage of progress into a task is it most efficient to stop creating new threads.
For a fixed X and Y value how many threads do I tell the primary thread to spawn?
Edit: operating environment is a nvidia gts250 gpu using CUDA over java wrapper
Do
- Set up some experiments.
- Run experiments, record results.
- Analyse results.
- Draw conclusions.
Until you have answered your question(s)
You write that your problem is embarrassingly parallel. Such problems usually have an obvious 'dimension' along which they are EP. For instance, rendering images in a movie is EP by frame (so send each frame to a separate process(or)) or by rendering sub-task (so set up a pipeline). Rarely would you parallelise image rendering by chopping each image into sub-image tasks. I guess that your problem is EP when decomposed along the dimension of 'ant'. So, probably, you should create one task for each ant. Then you should, probably, have no more ant tasks running at one time than you have available processors. You might want fewer, less likely you may want more ants than processors. But the best ratio of tasks to processors is something you have to figure out yourself.
Arguments about how threads should perform, what the o/s should do and how the hardware ought to cope are, occasionally, illuminating, but they are never a substitute for testing.
Incidentally, for foraging ants I suspect that fixing Y is a poor idea.
And if you want more pointed advice, you might want to show us a bit more of your technologies -- advice about threads appropriate for a Java program may well not apply if you are writing Fortran+OpenMP.
Just as a guess if we assume that each worker thread is CPU bound then it is not practical to create more threads than you have cores.
精彩评论