Performance of OpenMP Parallel Programming in C
I wrote a C program that computes Pi using OpenMP, with help from a book. I believe the performance of this program will depend on the processors used.
In my case, I used the OMP_NUM_THREADS environment variable to check how performance scales as I increase the number of processors or threads (I am not sure which term is correct; please correct me).
I have a quad-core processor, so I used the following (where no_of_threads is changed from 1 to 10):
$ export OMP_NUM_THREADS=no_of_threads
The run times of the program were (number of threads --- elapsed time):
1 --- 0m11.036s
2 --- 0m5.554s
3 --- 0m3.800s
4 --- 0m3.166s
5 --- 0m3.376s
8 --- 0m3.042s
10 --- 0m2.960s
15 --- 0m2.957s
I can understand the performance increase up to 4 threads, as there are 4 processors on the system. But I am unable to understand why performance keeps improving once there are more than 4 threads. I am aware that each additional thread has an overhead, so why is the performance still increasing?
Can someone please explain this to me in detail?
You probably have a processor that supports hardware threads (Intel calls this hyper-threading).
What this basically means is that each of your cores keeps two sets of architectural state (registers and program counter) and can thus execute two interleaved threads more efficiently than usual. This is especially noticeable if the threads often have to wait for memory: usually, a core just stalls while waiting for memory [1]. A core that supports hyper-threading can instead execute instructions from the other thread during that wait.
[1] Not taking into account instruction reordering and prefetching.