开发者

optimized byte array shifter

I'm sure this has been asked before, but I need to implement a shift operator on a byte array of variable length size. I've looked around a bit but I have not found any standard way of doing it. I came up with an implementation which works, but I'm not sure how efficient it is. Does anyone know of a standard way to shift an array, or at least have any recommendation on how to boost the performance of my implementation;

char* baLeftShift(const char* array, size_t size, signed int displacement,char* result)
{
    memcpy(result,array,size);
    short shiftBuffer = 0;
    char carryFlag = 0;
    char* byte;
    if(displacement > 0)
    {
        for(;displacement--;)
        {
            for(byte=&(result[size - 1]);((unsigned int)(byte))>=((unsigned int)(result));byte--)
            {
                shiftBuffer = *byte;
                shiftBuffer <<= 1;
                *byte = ((carryFlag) | ((char)(shiftBuffer)));
                carryFlag = ((char*)(&shiftBuffer))[1];
            }
     开发者_StackOverflow   }
    }
    else
    {
        unsigned int offset = ((unsigned int)(result)) + size;
        displacement = -displacement;
        for(;displacement--;)
        {
            for(byte=(char*)result;((unsigned int)(byte)) < offset;byte++)
            {
                shiftBuffer = *byte;
                shiftBuffer <<= 7;
                *byte = ((carryFlag) | ((char*)(&shiftBuffer))[1]);
                carryFlag = ((char)(shiftBuffer));
            }
        }
    }
    return result;
}


If I can just add to what @dwelch is saying, you could try this.

  1. Just move the bytes to their final locations. Then you are left with a shift count such as 3, for example, if each byte still needs to be left-shifted 3 bits into the next higher byte. (This assumes in your mind's eye the bytes are laid out in ascending order from right to left.)

  2. Then rotate each byte to the left by 3. A lookup table might be faster than individually doing an actual rotate. Then, in each byte, the 3 bits to be shifted are now in the right-hand end of the byte.

  3. Now make a mask M, which is (1<<3)-1, which is simply the low order 3 bits turned on.

  4. Now, in order, from high order byte to low order byte, do this:

    c[i] ^= M & (c[i] ^ c[i-1])

That will copy bits to c[i] from c[i-1] under the mask M.

For the last byte, just use a 0 in place of c[i-1].

For right shifts, same idea.


My first suggestion would be to eliminate the for loops around the displacement. You should be able to do the necessary shifts without the for(;displacement--;) loops. For displacements of magnitude greater than 7, things get a little trickier because your inner loop bounds will change and your source offset is no longer 1. i.e. your input buffer offset becomes magnitude / 8 and your shift becomes magnitude % 8.


It does look inefficient and perhaps this is what Nathan was referring to.

assuming a char is 8 bits where this code is running there are two things to do first move the whole bytes, for example if your input array is 0x00,0x00,0x12,0x34 and you shift left 8 bits then you get 0x00 0x12 0x34 0x00, there is no reason to do that in a loop 8 times one bit at a time. so start by shifting the whole chars in the array by (displacement>>3) locations and pad the holes created with zeros some sort of for(ra=(displacement>>3);ra>3)] = array[ra]; for(ra-=(displacement>>3);ra>(7-(displacement&7))). a good compiler will precompute (displacement>>3), displacement&7, 7-(displacement&7) and a good processor will have enough registers to keep all of those values. you might help the compiler by making separate variables for each of those items, but depending on the compiler and how you are using it it could make it worse too.

The bottom line though is time the code. perform a thousand 1 bit shifts then a thousand 2 bit shifts, etc time the whole thing, then try a different algorithm and time it the same way and see if the optimizations make a difference, make it better or worse. If you know ahead of time this code will only ever be used for single or less than 8 bit shifts adjust the timing test accordingly.

your use of the carry flag implies that you are aware that many processors have instructions specifically for chaining infinitely long shifts using the standard register length (for single bit at a time) rotate through carry basically. Which the C language does not support directly. for chaining single bit shifts you could consider assembler and likely outperform the C code. at least the single bit shifts are faster than C code can do. A hybrid of moving the bytes then if the number of bits to shift (displacement&7) is maybe less than 4 use the assembler else use a C loop. again the timing tests will tell you where the optimizations are.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜