Complex loop in a C++ program portable to OpenMP and MPI?
I have a C++ number crunching program. The structure is:
a) data input, data preparation
b) "big" loop, uses global and local data (lots of different variables in both cases)
c) postprocess results and write data
The most intensive part is "b", which is basically a loop. I need to speed up the program on a cluster of 25 blades with 4 cores each. I wonder whether I could use OpenMP and MPI here, or whether you can point me to tutorials that cover complex, "big" for loops rather than general cases.
Thanks
Actually, you should use both.
Use MPI to distribute tasks between blades and OpenMP to fully utilize each blade. Take some time to understand how memory and data sharing work in each case.
You cannot divide your task between blades using OpenMP alone, because OpenMP threads share the memory of a single node. Try to divide your loop into several parts and distribute them across the nodes. For example, if you want the composition of 2 vectors of size N, the first N/2 elements are processed on one node and the rest on another.
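The split described above can be sketched as a plain C++ helper. This is only an illustration under assumptions: `Range`, `block_range` and `partial_dot` are hypothetical names, and "composition" is taken to mean a dot product. In a real MPI program, `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the partial results would be combined with something like `MPI_Reduce`; those calls are left out so the sketch compiles without an MPI installation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open iteration range [begin, end) owned by one process (hypothetical helper type).
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split n loop iterations into `size` contiguous blocks.
// The first (n % size) ranks receive one extra iteration, so block sizes differ by at most 1.
Range block_range(std::size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t p = static_cast<std::size_t>(size);
    const std::size_t base = n / p;
    const std::size_t extra = n % p;
    const std::size_t begin = r * base + (r < extra ? r : extra);
    const std::size_t end = begin + base + (r < extra ? 1 : 0);
    return {begin, end};
}

// Dot product restricted to one process's block of the two vectors.
double partial_dot(const std::vector<double>& a,
                   const std::vector<double>& b,
                   Range range) {
    assert(a.size() == b.size());
    assert(range.begin <= range.end && range.end <= a.size());
    double sum = 0.0;
    for (std::size_t i = range.begin; i < range.end; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With 2 processes and N elements, rank 0 would call `partial_dot(a, b, block_range(N, 0, 2))` and rank 1 would use `block_range(N, 1, 2)`; adding the two partial sums gives the full result.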
But transmission costs between blades are significant. So if your task is not actually that large, it may be better to just distribute it across the 4 cores of a single blade.
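For the single-blade case, a minimal OpenMP sketch might look like the following. The function name `dot_openmp` is hypothetical, and the loop body stands in for the real work of the "big" loop. It assumes the iterations are independent apart from the accumulated sum, which the `reduction` clause handles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product with the loop iterations shared among the threads of one node.
// Each thread accumulates a private copy of `sum`; OpenMP adds the copies together at the end.
double dot_openmp(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    // A signed loop index keeps the loop valid for older OpenMP implementations.
    const long long n = static_cast<long long>(a.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

With GCC or Clang this needs `-fopenmp` to actually run in parallel; without the flag the pragma is ignored and the loop runs serially with the same result. The number of threads can be set with the `OMP_NUM_THREADS` environment variable, for example 4 on a 4-core blade.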