OpenMP and reduction()
I've got simply 3 functions, one is control function aan the next 2 function are done in a bit different way using OpenMP. But function thread1 gives another score than thread2 and control and I have no idea why?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
float function(float x){
return pow(x,pow(x,sin(x)));
}
float integrate(float begin, float end, int count){
float score = 0 , width = (end-begin)/(1.0*count), i=begin, y1, y2;
for(i = 0; i<count; i++){
score += (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
return score;
}
float thread1(float begin, float end, int count){
float score = 0 , width = (end-begin)/(1.0*count), y1, y2;
int i;
#pragma omp parallel for reduction(+:score) private(y1,i) shared(count)
for(i = 0; i<count; i++){
y1 = ((function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0);
score = score + y1;
}
return score;
}
float thread2(float begin, float end, int count){
float score = 0 , width = (end-begin)/(1.0*count), y1, y2;
int i;
float * tab = (float*)malloc(count * sizeof(float));
#pragma omp parallel for
for(i = 0; i<count; i++){
tab[i] = (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
for(i=0; i<count; i++)
score += tab[i];
return score;
}
unsigned long long int rdtsc(void){
unsigned long long int x;
unsigned a, d;
__asm__ volatile("rdtsc" : "=a" (a), "=d" (d));
return ((unsigned long long)a) | (((unsigned long long)d) << 32);
}
int main(int argc, char** argv){
unsigned long long counter = 0;
//test
counter = rdtsc();
printf("control: %f \n ",integrate (atof(argv[1]), atof(argv[2]), atoi(argv[3])));
printf("control count: %lld \n",r开发者_JAVA百科dtsc()-counter);
counter = rdtsc();
printf("thread1: %f \n ",thread1(atof(argv[1]), atof(argv[2]), atoi(argv[3])));
printf("thread1 count: %lld \n",rdtsc()-counter);
counter = rdtsc();
printf("thread2: %f \n ",thread2(atof(argv[1]), atof(argv[2]), atoi(argv[3])));
printf("thread2 count: %lld \n",rdtsc()-counter);
return 0;
}
Here are simple answears :
gcc -fopenmp zad2.c -o zad -pg -lm
env OMP_NUM_THREADS=2 ./zad 3 13 100000
control: 5407308.500000
control count: 138308058
thread1: 5407494.000000
thread1 count: 96525618
thread2: 5407308.500000
thread2 count: 104770859
Update:
Ok, I tried to do this more quickly, and not count values for periods twice.
double thread3(double begin, double end, int count){
double score = 0 , width = (end-begin)/(1.0*count), yp, yk;
int i,j, k;
#pragma omp parallel private (yp,yk)
{
int thread_num = omp_get_num_threads();
k = count / thread_num;
#pragma omp for private(i) reduction(+:score)
for(i=0; i<thread_num; i++){
yp = function(begin + i*k*width);
yk = function(begin + (i*k+1)*width);
score += (yp + yk) * width / 2.0;
for(j=i*k +1; j<(i+1)*k; j++){
yp = yk;
yk = function(begin + (j+1)*width);
score += (yp + yk) * width / 2.0;
}
}
#pragma omp for private(i) reduction(+:score)
for(i = k*thread_num; i<count; i++)
score += (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
return score;
}
But after few tests I found that the scores are near the right value, but not equal. Sometimes one of the threads doesn't start. When I'm not using OpenMp, the value is correct.
You're integrating a very strongly peaked function - x(xsin(x)) - which covers over 7 orders of magnitude in the range you're integrating it. That's about the limit for a 32-bit floating point number, so there are going to be issues depending on the order you sum the numbers. This isn't an OpenMP thing -- its just a numerical sensitivity thing.
So for instance, consider this completely serial code doing the same integral:
#include <stdio.h>
#include <math.h>
float function(float x){
return pow(x,pow(x,sin(x)));
}
int main(int argc, char **argv) {
const float begin=3., end=13.;
const int count = 100000;
const float width=(end-begin)/(1.*count);
float integral1=0., integral2=0., integral3=0.;
/* left to right */
for (int i=0; i<count; i++) {
integral1 += (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
/* right to left */
for (int i=count-1; i>=0; i--) {
integral2 += (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
/* centre outwards, first right-to-left, then left-to-right */
for (int i=count/2; i<count; i++) {
integral3 += (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
for (int i=count/2-1; i>=0; i--) {
integral3 += (function(begin+(i*width)) + function(begin+(i+1)*width)) * width/2.0;
}
printf("Left to right: %lf\n", integral1);
printf("Right to left: %lf\n", integral2);
printf("Centre outwards: %lf\n", integral3);
return 0;
}
Running this, we get:
$ ./reduce
Left to right: 5407308.500000
Right to left: 5407430.000000
Centre outwards: 5407335.500000
-- the same sort of differences you see. Doing the summation with two threads necessarily changes the order of the summation, and so your answer changes.
There's a few options here. If this was just a test proble, and this function doesn't actually represent what you'll be integrating, you might be fine already. Otherwise, using a different numerical method may help.
But also here, there is a simple solution - the range of the numbers exceeds the range of a float
, making the answer very sensitive to summation order, but fits comfortably within the range of a double
, making the problem much less severe. Note that changing to double
s is not a magic solution to everything; some cases it just postpones the problem or allows you to paper over a flaw in your numerical method. But here it actually addresses the underlying problem fairly well. Changing all the float
s above to double
s gives:
$ ./reduce
Left to right: 5407589.272885
Right to left: 5407589.272885
Centre outwards: 5407589.272885
On the other hand, even doubles wouldn't save you if you needed to integrate this function in the range (18,23).
精彩评论