How do Bitshifting and UInt64 work?
While reading a post on StackOverflow (http://stackoverflow.com/questions/1502081/im-trying-to-optimize-this-c-code-using-4-way-loop-unrolling), which is now marked as closed, I came across an answer (comment, in fact) that said the following: "The two inner loops could possibly get a speed boost by using UInt64 and bit shifting"
Here is the code that was int he post:
char rotate8_descr[] = "rotate8: rotate with 8x8 blocking";
void rotate8(int dim, pixel *src, pixel *dst)
{
int i, j, ii, jj;
for(ii = 0; ii < dim; ii += 8)
for(jj = 0; jj < dim; jj += 8)
for (i = ii; i < ii + 8; i++)
for (j = jj; j < jj + 8; j++)
dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}
Could anyone please explain 开发者_开发技巧how would that be applied here? I am interested in knowing how to apply bitshifting on this code, or a similar code, and why that would help in performance. Also, how would this code be optimized for cache usage? Any suggestions?
Assume this code was Double Tiled/Blocked (big tile=32, and inside it tiles of 16), and also Loop Invariant Code Motion was applied.. would it still benefit from bitshifting and UInt64?
If not, then what other suggestions would work?
Thanks!
If the pixels were smaller, you could use 8 Uint64 registers (they are big and there are plenty of them) to cumulate there the result for rotated matrix.
Example for sizeof(pixel) == 1
and little endian machine:
for (int y = 0; y < 8; ++y){
// for every line, we get 8 pixels from row y into src0.
// they should go in the last colomn of the result
// so after 8 iterations they'll get exactly in the 8ht byte
Uint64 src0 = *(Uint64*)(src + dim * y);
dst0 = (dst0 << 8) | ( src0 & 0xff); // this was pixel src[y][0]
dst1 = (dst1 << 8) | ((src0 >> 8) & 0xff); // and pixel src[y][1]
etc...
};
// now the 8 dst0..dst7 registers contain rows 0..7 of the result.
// putting them there
*(Uint64*)(dst) = dst0;
*(Uint64*)(dst + dim) = dst1;
etc..
The good part is that it's easier to unroll and reorder, and there are fewer memory accesses.
精彩评论