Trouble introducing SIMD instructions into my code
I have a basic calculation function that I apply to each item in an array. This function does more than just sum two vectors.
I want to process multiple items from my array in parallel using SIMD instructions.
I found examples like these too simple for my case (they don't include function calls): http://www.doc.ic.ac.uk/~nloriant/files/scfpsc-pc.pdf
So I tried using array notation as described here: http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/mac/optaps/common/optaps_elem_functions.htm
But this did not speed up my code. I don't understand what I am doing wrong. And if I need to switch to the more assembly-like style of SIMD, how do I introduce function calls there?
If anyone can help me or point me to a good source for my needs, I'd be very thankful.
Thank you!!!!
code example:
This is the basic function applied on each item in the array:
float VarFlow::gauss_seidel_step(IplImage* u, int i, float h, float J11, float J12, float J13, float vi){
    int x = i%u->width;
    int y = i/u->width;
    int start_y, end_y, start_x, end_x;
    int N_num = 0;
    start_y = y - 1;
    end_y = y + 1;
    start_x = x - 1;
    end_x = x + 1;
    float temp_u = 0;
    // Sum top neighbor
    if(start_y > -1){
        temp_u += *((float*)(u->imageData + start_y*u->widthStep) + x);
        N_num++;
    }
    // Sum bottom neighbor
    if(end_y < u->height){
        temp_u += *((float*)(u->imageData + end_y*u->widthStep) + x);
        N_num++;
    }
    // Sum left neighbor
    if(start_x > -1){
        temp_u += *((float*)(u->imageData + y*u->widthStep) + start_x);
        N_num++;
    }
    // Sum right neighbor
    if(end_x < u->width){
        temp_u += *((float*)(u->imageData + y*u->widthStep) + end_x);
        N_num++;
    }
    temp_u = temp_u - (h*h/alpha)*(J12*vi + J13);
    temp_u = temp_u / (N_num + (h*h/alpha)*J11);
    return temp_u;
}
I'd like to declare it with __declspec (vector) and call it like so:
u_ptr[0:max_i:1] = gauss_seidel_step(imgU, vect[0:max_i:1], h, fxfx_ptr[0:max_i:1], fxfy_ptr[0:max_i:1], fxft_ptr[0:max_i:1], v_ptr[0:max_i:1]);
v_ptr[0:max_i:1] = gauss_seidel_step(imgV, vect[0:max_i:1], h, fyfy_ptr[0:max_i:1], fxfy_ptr[0:max_i:1], fyft_ptr[0:max_i:1], u_ptr[0:max_i:1]);
Instead of a for loop.
I'd be happy to get some direction on this (maybe a link to a similar example), but not a full solution.
Thanks!
SIMD and conditional branching do not mix well.
Turn your conditional statements into boolean masks and multiplications. That will send you down the right path for vectorizing the operations.
e.g.
if(end_x < u->width){
temp_u += value;
N_num++;
}
becomes
ltmask = (end_x < u->width); // see _mm_cmplt_ps
temp_u += ltmask*value; // see _mm_add_ps, _mm_and_ps
N_num += ltmask; // use _mm_and_ps with a vector of 1.0f
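As a rough sketch of the same idea written out with SSE intrinsics (x86 only): the helper name masked_accumulate and its parameters are made up for illustration and are not part of the code in the question. It assumes the coordinates and the neighbor count are kept as floats, four pixels per register. Note that the comparison yields all-ones or all-zero bits per lane rather than 1.0f or 0.0f, so the mask is applied with a bitwise AND instead of a multiplication.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Hypothetical helper, not from the original code: for each of the 4 lanes,
// adds `value` to `sum` and 1.0f to `count` only where end_x < width,
// without any branch.
inline void masked_accumulate(__m128 end_x, __m128 width, __m128 value,
                              __m128& sum, __m128& count) {
    // Lane is all-ones bits where end_x < width, all-zero bits otherwise.
    const __m128 mask = _mm_cmplt_ps(end_x, width);
    // AND keeps the operand in passing lanes and produces 0.0f in the others.
    sum   = _mm_add_ps(sum,   _mm_and_ps(mask, value));
    count = _mm_add_ps(count, _mm_and_ps(mask, _mm_set1_ps(1.0f)));
}
```

The mask only discards a value after it has been loaded, so the load of each neighbor still has to touch valid memory, for example by padding the image border or clamping the index.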