Bit manipulations with SSE on subbytes?

Is it possible to use SSE for bit manipulations on data that is not byte-aligned? For example, I would like to implement the following using SSE:

char buf[8];  /* holds the input bytes */
assert(n <= 8);
uint64_t rv = 0;  /* needs at least 48 bits; long may be only 32 bits */
for (int i = 0; i < n; i++)
    rv = (rv << 6) | (buf[i] & 0x3f);

Instead, I would like to load buf into an XMM register and use SSE instructions to avoid the loop. Unfortunately, the shift instructions (such as PSLLW) shift every packed integer by the same amount, so I cannot use them here. Using multiplication (PMULLW) to emulate the shifts does not seem right either...

Looking at the SSE documentation, it appears that bit manipulations are not particularly well supported in general. Is this true? Or are there nice bit-manipulation examples using SSE?


I'm not sure SSE instructions help reduce the number of operations required to implement what your code performs here; if anyone knows, I'd be curious as well. Let's decompose the code a bit.

The code is a repeated shift/OR sequence: you take the lowest 6 bits of the first byte, shift the result left by six, OR in the lowest 6 bits of the next byte, shift again, and so on.

So you're converting an array of eight-bit values into a packed array of six-bit values, shrinking 64 bits down to 48 bits, like this:

|76543210|76543210|76543210|76543210|76543210|76543210|76543210|76543210|
|-----------------|54321054|32105432|10543210|54321054|32105432|10543210|
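To make the diagram concrete, here is a minimal C99 sketch of the question's loop with a uint64_t accumulator, plus a hypothetical helper field6 (not from the original) that reads back the 6-bit field byte i lands in when n == 8. Byte 0 ends up in the most significant field (bits 42..47) and byte 7 in the least significant one (bits 0..5).

```c
#include <assert.h>
#include <stdint.h>

/* The loop from the question, with an unsigned 64-bit accumulator so the
 * 48-bit result fits regardless of the platform's long size. */
static uint64_t pack6_loop(const unsigned char *buf, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t rv = 0;
    for (int i = 0; i < n; i++)
        rv = (rv << 6) | (buf[i] & 0x3f);   /* append the low 6 bits of byte i */
    return rv;
}

/* Hypothetical helper: for n == 8, byte i occupies bits 6*(7-i) .. 6*(7-i)+5,
 * i.e. byte 0 is the most significant 6-bit field and byte 7 the least. */
static unsigned field6(uint64_t packed, int i)
{
    return (unsigned)((packed >> (6 * (7 - i))) & 0x3f);
}
```

For the bytes {0, 1, 2, 3, 4, 5, 6, 7} the packed value is 0x1083105187, and field6 returns each index again.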

You can therefore unroll the loop (for the n == 8 case) and write it as follows:

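/*
 * ((uint64_t)buf[x] << 58)
 *   moves the lowest six bits of buf[x] into the highest bits of a
 *   64-bit value and clears all other bits
 *
 * >> (6 * x + 16)
 *   shifts those six bits into their final position
 *
 * The type is unsigned: with a signed long, shifting into the sign bit is
 * undefined and the right shift could sign-extend bit 5 of buf[x].
 */
#define L(x) (((uint64_t)buf[x] << 58) >> (6 * (x) + 16))

uint64_t rv = L(0) | L(1) | L(2) | L(3) | L(4) | L(5) | L(6) | L(7);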
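The unrolled form can be packaged and checked as a small C99 sketch. It uses uint64_t rather than long so the right shift cannot sign-extend, and the names PLACE6, pack6_unrolled8 and pack6_unrolled are hypothetical. The n < 8 variant is an assumption layered on top: it relies on all 8 bytes of buf being readable, and shifts away the fields of bytes n..7, which sit in the low 6*(8-n) bits.

```c
#include <assert.h>
#include <stdint.h>

/* Place the low 6 bits of buf[x] at bits 6*(7-x) .. 6*(7-x)+5.
 * Unsigned, so the right shift brings in zeros, not copies of a sign bit. */
#define PLACE6(buf, x) ((((uint64_t)(buf)[x]) << 58) >> (6 * (x) + 16))

/* Unrolled packing of exactly 8 bytes (same layout as the loop with n == 8). */
static uint64_t pack6_unrolled8(const unsigned char buf[8])
{
    return PLACE6(buf, 0) | PLACE6(buf, 1) | PLACE6(buf, 2) | PLACE6(buf, 3) |
           PLACE6(buf, 4) | PLACE6(buf, 5) | PLACE6(buf, 6) | PLACE6(buf, 7);
}

/* Assumed extension for n < 8: bytes n..7 only occupy the low 6*(8-n) bits,
 * so shifting them out leaves the loop's result for the first n bytes. */
static uint64_t pack6_unrolled(const unsigned char buf[8], int n)
{
    assert(n >= 0 && n <= 8);
    return pack6_unrolled8(buf) >> (6 * (8 - n));
}
```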

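As mentioned, I'm not aware of an SSE instruction that would help with this kind of packing: the SSE pack instructions only narrow dwords to words (PACKSSDW) and words to bytes (PACKSSWB, PACKUSWB), with saturation, so none of them can squeeze 2 bits out of every byte.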

You can perform the operations inside SSE registers, but, as far as I can see, that does not reduce the number of instructions required to get the final result.
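As a sketch of doing the work inside SSE registers, assuming x86 with SSE2 and that all 8 bytes of buf are readable, neighbouring fields can be merged in widening steps: 8-bit fields into 16-bit lanes, then 32-bit, then 64-bit lanes. The function names are hypothetical. Each step still needs a mask, a shift pair and an OR, which is in line with the point above that the instruction count does not really drop.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

static uint64_t pack6_sse2(const unsigned char buf[8])
{
    /* Load 8 bytes into the low half of the register (upper half zeroed). */
    __m128i v = _mm_loadl_epi64((const __m128i *)buf);
    v = _mm_and_si128(v, _mm_set1_epi8(0x3f));           /* keep low 6 bits */

    /* 16-bit lanes: byte 2k is the low byte, byte 2k+1 the high byte.
     * w = (b[2k] << 6) | b[2k+1], 12 bits per lane. */
    __m128i lo = _mm_and_si128(v, _mm_set1_epi16(0x00ff));
    __m128i w = _mm_or_si128(_mm_slli_epi16(lo, 6), _mm_srli_epi16(v, 8));

    /* 32-bit lanes: d = (low word << 12) | high word, 24 bits per lane. */
    lo = _mm_and_si128(w, _mm_set1_epi32(0x0000ffff));
    __m128i d = _mm_or_si128(_mm_slli_epi32(lo, 12), _mm_srli_epi32(w, 16));

    /* 64-bit lanes: q = (low dword << 24) | high dword, 48 bits per lane. */
    lo = _mm_and_si128(d, _mm_set_epi32(0, -1, 0, -1));
    __m128i q = _mm_or_si128(_mm_slli_epi64(lo, 24), _mm_srli_epi64(d, 32));

    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, q);   /* the low 64 bits hold the result */
    return out;
}

/* Same result as the question's loop for the first n bytes: the fields of
 * bytes n..7 sit in the low 6*(8-n) bits and are shifted out. */
static uint64_t pack6_sse2_n(const unsigned char buf[8], int n)
{
    assert(n >= 0 && n <= 8);
    return pack6_sse2(buf) >> (6 * (8 - n));
}
```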


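There are quite a few bitwise operations you can perform in SSE. You can use _mm_and_si128, _mm_or_si128, _mm_andnot_si128 and _mm_xor_si128, and there is a large set of shift operations: _mm_slli_epi16, _mm_slli_epi32 and _mm_slli_epi64 (with matching _mm_srli_* right shifts) shift every lane by the same number of bits, while _mm_slli_si128 and _mm_srli_si128 shift the whole 128-bit register by whole bytes. Searching for _mm_slli_si128 will lead you to the complete list. These instructions were added in SSE2, so they're widely available.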
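The difference between the byte-wise and bit-wise shifts is easy to miss, so here is a small sketch assuming x86 with SSE2. from64, to64, shift_bytes1, shift_bits6 and select_bits are hypothetical helpers that only look at the low 64-bit lane.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Move a uint64_t into / out of the low 64-bit lane (upper lane zeroed on load). */
static __m128i from64(uint64_t a) { return _mm_loadl_epi64((const __m128i *)&a); }
static uint64_t to64(__m128i x) { uint64_t r; _mm_storel_epi64((__m128i *)&r, x); return r; }

/* _mm_slli_si128: shifts the entire 128-bit register left by whole BYTES. */
static uint64_t shift_bytes1(uint64_t a) { return to64(_mm_slli_si128(from64(a), 1)); }

/* _mm_slli_epi64: shifts each 64-bit lane left by the same number of BITS;
 * bits leaving a lane are discarded, they do not carry into the next lane. */
static uint64_t shift_bits6(uint64_t a) { return to64(_mm_slli_epi64(from64(a), 6)); }

/* (a & mask) | (b & ~mask); _mm_andnot_si128(m, x) computes ~m & x. */
static uint64_t select_bits(uint64_t a, uint64_t b, uint64_t mask)
{
    __m128i m = from64(mask);
    return to64(_mm_or_si128(_mm_and_si128(from64(a), m),
                             _mm_andnot_si128(m, from64(b))));
}
```

For example, shift_bytes1(0x0102030405060708) gives 0x0203040506070800 in the low lane, while shift_bits6(1) gives 64.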
