How to profile OpenMP bottlenecks
I have a loop that has been parallelized with OpenMP, but due to the nature of the task, it contains four critical sections.
What would be the best way to profile the speedup and find out which of the critical sections (or maybe non-critical ones!) take up the most time inside the loop?
I use Ubuntu 10.04 with g++ 4.4.3.
Scalasca is a nice tool for profiling OpenMP (and MPI) codes and analyzing the results. TAU is also very nice but much harder to use. The Intel tools, such as VTune, are also good but very expensive.
Arm MAP profiles OpenMP and pthreads code and works without needing to instrument or modify your source code. It shows synchronization issues and where threads are spending their time, down to the source line. The OpenMP profiling blog entry is worth reading.
MAP is widely used in high performance computing because it also profiles multi-process applications such as MPI programs.
OpenMP provides the functions omp_get_wtime(), which returns elapsed wall-clock time in seconds, and omp_get_wtick(), which returns the resolution of that timer (docs here). I would recommend using these.
Otherwise, try a profiler. I prefer the Google CPU profiler (part of gperftools), which can be found here.
There is also the manual way described in this answer.
There is also the ompP tool, which I have used a number of times over the last ten years. I have found it really useful for identifying and quantifying load imbalance and parallel/serial regions. Its web page seems to be down now, but I also found it on the Web Archive earlier this year.
edit: updated home directory