OpenMP and SSE: my program doesn't speed up
Here is the part of my code that is supposed to run in parallel:
timer.Start();
for(int i = 0; i < params.epochs; ++i)
{
    #pragma omp for
    for(int j = 0; j < min_net; ++j)
    {
        std::pair<CVectorSSE,CVectorSSE>& sample = data_set[j];
        nets[j]->Approximate(sample.first,net_outputs[j]);
        out_gradients[j].SetDifference(net_outputs[j],sample.second);
        nets[j]->BackPropagateGradient(out_gradients[j],net_gradients[j]);
    }
}
timer.Stop();
epochs = 100
I have an AMD Athlon X2 5000+. When I run this code without the omp directive, the time is the same. And when I look at Task Manager / Performance while running both programs (with and without omp), both cores are used in both cases. So it seems that VS (VS 2008) somehow optimizes the code the way omp would? The code inside the parallel loop uses SSE instructions. I wondered whether multicore processors might have only one SSE unit, but that would be stupid. So maybe someone can tell me what I am doing wrong? I know it depends on the code inside the loop, but if that code can run in parallel, then it MUST speed up.
Okay, I am definitely doing something wrong. Look at this code:
time_t start;
time_t stop;
start = time(NULL);
#pragma omp for
for(int i = 0; i < 10; ++i)
{
    Sleep(1000);
}
stop = time(NULL);
cout<<difftime(stop,start)<<endl;
Without omp it should sleep for 10 seconds (10 * 1000 ms). With omp it should sleep for less than 10 seconds, because 2 threads can sleep at the same time, right? BUT it sleeps for 10 seconds again. How is that possible?
I tried the second example on Linux with gcc, and my program runs for 3 seconds on a Core i3. I guess the problem you are having is that you have not enabled OpenMP correctly. GCC needs the -fopenmp option to enable OpenMP. Similar configuration may be necessary for VS (for Visual C++, the corresponding compiler switch is /openmp).