Optimizing vector element swaps using CUDA
I am new to CUDA and would appreciate your help. I have a long vector. For each group of 24 elements, I need to do the following: in the first 12 elements, the even-numbered elements are multiplied by -1; in the second 12 elements, the odd-numbered elements are multiplied by -1. Then the following swap takes place:
Graph: because I don't yet have enough points, I couldn't post the image, so here is a link to it:
http://www.freeimagehosting.net/image.php?e4b88fb666.png
I have written this piece of code and wonder if you could help me optimize it further to reduce divergence or bank conflicts.
// subVector is a multiple of 24; Mds and Nds are in shared memory
__shared__ double Mds[subVector];
__shared__ double Nds[subVector];
int tx = threadIdx.x;
int tx_mod = tx ^ 0x0001;
int basex = __umul24(blockDim.x, blockIdx.x);
Mds[tx] = M.elements[basex + tx];
__syncthreads();
// flip the signs
if (tx < (tx/24)*24 + 12)
{
    // first 12 elements of the group: flip if even
    if ((tx & 0x0001) == 0)
        Mds[tx] = -Mds[tx];
}
else
if (tx < (tx/24)*24 + 24)
{
    // elements 12 to 23 of the group: flip if odd
    if ((tx & 0x0001) == 1)
        Mds[tx] = -Mds[tx];
}
__syncthreads();
if (tx < (tx/24)*24 + 6)
{
    // for the first 6 elements: swap with the last six in the 24-element group (see graph)
    Nds[tx] = Mds[tx_mod + 18];
    Mds[tx_mod + 18] = Mds[tx];
    Mds[tx] = Nds[tx];
}
else
if (tx < (tx/24)*24 + 12)
{
    // for the second 6 elements: swap with the next adjacent group of six (see graph)
    Nds[tx] = Mds[tx_mod + 6];
    Mds[tx_mod + 6] = Mds[tx];
    Mds[tx] = Nds[tx];
}
__syncthreads();
Thanks in advance.
Paul gave you pretty good starting points in your previous questions.
A couple of things to watch out for: you are dividing by 24, which is not a power of two, and that is expensive. Instead, try to use the multidimensional nature of the thread block. For example, make the x-dimension of size 24, which eliminates the need for division.
In general, try to fit the thread block dimensions to your data dimensions.
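A rough sketch of that layout, assuming the block is launched as dim3 block(24, groupsPerBlock). The names GROUP, flatIndex, loadGroups and groupsTotal are made up for illustration and do not come from the question. The helper is marked __host__ __device__ only so it can be checked on the CPU as well.

```cpp
// Let the file build both with nvcc and with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

constexpr int GROUP = 24;  // elements per group, from the question

// Hypothetical helper: flat element index for a (lane, group) pair.
__host__ __device__ inline int flatIndex(int lane, int group)
{
    return group * GROUP + lane;
}

#ifdef __CUDACC__
// Assumed launch: dim3 block(GROUP, groupsPerBlock).
// threadIdx.x is the position inside the group (0..23) and
// threadIdx.y selects the group inside the block, so no tx/24 is needed.
__global__ void loadGroups(const double *in, double *out, int groupsTotal)
{
    int lane  = threadIdx.x;
    int group = blockIdx.x * blockDim.y + threadIdx.y;
    if (group < groupsTotal) {
        int i = flatIndex(lane, group);
        out[i] = in[i];  // placeholder for the real per-element work
    }
}
#endif
```

With this layout, tests such as "first 12 elements of the group" become lane < 12, with no division or modulo. Whether it is actually faster for your sizes is something to measure.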
Simplify the sign flipping: if you do not want to flip the sign, you can still multiply by the identity, 1. Figure out how to map even/odd numbers to 1 and -1 using just arithmetic: for example, sign = (even*2+1) - 2, where even is either 1 or 0.
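One possible way to apply this to the rule in the question (first 12 elements: flip the even ones; second 12: flip the odd ones), as a sketch only. signFor and flipSigns are hypothetical names, and lane is assumed to be the 0-based position within the 24-element group.

```cpp
// Let the file build both with nvcc and with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: sign multiplier for position `lane` (0..23) in a group.
__host__ __device__ inline double signFor(int lane)
{
    int parity = lane & 1;       // 0 for even lanes, 1 for odd lanes
    int half   = (lane >= 12);   // 0 for the first 12 lanes, 1 for the second 12
    int keep   = parity ^ half;  // 1 where the sign is kept, 0 where it is flipped
    // Same mapping as (keep*2+1) - 2: 1 -> +1, 0 -> -1
    return (double)(2 * keep - 1);
}

#ifdef __CUDACC__
// Assumes blockDim.x == 24, so threadIdx.x is the lane within the group.
__global__ void flipSigns(double *data, int groupsTotal)
{
    int lane  = threadIdx.x;
    int group = blockIdx.x * blockDim.y + threadIdx.y;
    if (group < groupsTotal)
        data[group * 24 + lane] *= signFor(lane);  // every thread runs the same multiply
}
#endif
```

Every thread executes the same instructions here, so the nested if/else from the question is no longer needed for the sign step.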