Intel Parallel Studio timing inconsistencies

Posted: 2023-02-23 15:26

I have some code that uses Intel TBB and I'm running on a 32 core machine. In the code, I use

parallel_for(blocked_range (2,left_image_width-2, left_image_width /32) ...

to spawn 32 threads that do concurrent work. There are no race conditions, and each thread is hopefully given the same amount of work. I'm using clock_t to measure how long my program takes. For a certain image, it takes roughly 19 seconds to complete.

Then I ran my code through Intel Parallel Studio and it ran the code in 2 seconds. This is the result I was expecting, but I can't figure out why there's such a large difference between the two. Does clock_t sum the clock cycles on all the cores? Even then it doesn't make sense. Below is the snippet in question.

clock_t begin=clock();

create_threads_and_do_work();

clock_t end=clock();
double diffticks=end-begin;
double diffms=(diffticks*1000)/CLOCKS_PER_SEC;
cout<<"And the time is "<<diffms<<" ms"<<endl;

Any advice would be appreciated.

It isn't quite clear whether the difference in run time is a result of two different inputs (images) or simply two different run-time measuring methods (clock_t difference vs. Intel software measurement). Furthermore, you aren't showing us what goes on in create_threads_and_do_work(), and you didn't mention which tool within Intel Parallel Studio you are using. Is it VTune?

What your clock_t difference reports depends on the platform. The C standard defines clock() as the processor time used by the program, and on POSIX systems that covers every thread in the process, so 32 busy threads can make it read many times higher than the elapsed time. Microsoft's C runtime, on the other hand, returns elapsed wall-clock time from clock(). Either way, the measurement only covers the work of the threads spawned within create_threads_and_do_work() if that function waits for all of them to complete before it returns. If it simply spawns the threads and exits immediately (before they complete processing), you stop the clock too early. If all you do in the function is that parallel_for(), which returns only once the whole range has been processed, then the measurement does span all the work, and on a runtime where clock() tracks elapsed time it should be no different from other run-time measurements.

Within Intel Parallel Studio there is a profiling tool called VTune. It is a powerful tool, and when you run your program through it you can view (in a graphically pleasing way) the processing time (as well as the number of calls) of each function in your code. After doing this you'll probably figure it out.

One last idea: did the program run to completion when using the Intel software? I'm asking because sometimes VTune will collect data for some time and then stop without allowing the program to complete.

Tags: parallel-processing tbb visual-studio-2010
