Bit manipulations with SSE on subbytes?
Is it possible to use SSE for bit manipulations on data that is not byte-aligned? For example, I would like to implement this using SSE:
const char buf[8];
assert(n <= 8);
long rv = 0;
for (int i = 0; i < n; i++)
    rv = (rv << 6) | (buf[i] & 0x3f);
Instead, I would like to load buf into an XMM register and use SSE instructions to avoid the loop. Unfortunately, the shift operations (such as PSLLW) shift each packed integer by the same amount, so I cannot use them here. Using multiplication (PMULLW) to emulate shifts does not seem right either...
Looking at the SSE documentation, it appears that bit manipulations are not particularly well supported in general. Is this true? Or are there nice bit-manipulation examples using SSE?
I'm not sure SSE instructions help reduce the number of operations required to implement what your code performs here; if anyone knows, I'd be curious as well. Let's decompose the code a bit.
The code is a repeated shift/or sequence: you take the lowest 6 bits of a byte, shift the accumulated result left by six, or the next 6 bits in, shift again, and so on.
So you're converting an array of eight-bit values to a packed array of six-bit values, which shrinks the data from 64 bits to 48 bits. Like:
|76543210|76543210|76543210|76543210|76543210|76543210|76543210|76543210|
|-----------------|54321054|32105432|10543210|54321054|32105432|10543210|
For the full case n == 8, you can therefore unroll the loop and write it as follows:
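/*
 * Assumes a 64-bit long.
 *
 * ((unsigned long)buf[x] << 58)
 *   moves the lowest six bits of buf[x] into the highest bits, discards the others
 *
 * >> (6 * x + 16)
 *   shifts the bits into the expected final position; the operand is unsigned
 *   so that the right shift fills with zeros instead of copies of the top bit
 */
#define L(x) (((unsigned long)buf[x] << 58) >> (6 * (x) + 16))
long rv = (long)(L(0) | L(1) | L(2) | L(3) | L(4) | L(5) | L(6) | L(7));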
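To check that the unrolled expression matches the loop from the question, the two can be put side by side. This is only a sketch: the function names pack6_loop and pack6_unrolled are made up for illustration, and uint64_t replaces long so that the width is fixed and the right shift is a logical one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the loop from the question, with an unsigned 64-bit result. */
uint64_t pack6_loop(const unsigned char *buf, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t rv = 0;
    for (int i = 0; i < n; i++)
        rv = (rv << 6) | (buf[i] & 0x3f);
    return rv;
}

/* Hypothetical name: the unrolled form, valid only for exactly eight bytes. */
uint64_t pack6_unrolled(const unsigned char buf[8])
{
    /* << 58 keeps only the low six bits; the right shift then places them. */
#define L(x) (((uint64_t)(buf)[x] << 58) >> (6 * (x) + 16))
    uint64_t rv = L(0) | L(1) | L(2) | L(3) | L(4) | L(5) | L(6) | L(7);
#undef L
    return rv;
}
```

For any eight input bytes, pack6_unrolled(buf) should equal pack6_loop(buf, 8). For n < 8 the unrolled form does not apply as written, because the shift amounts depend on n.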
As mentioned, I'm not aware of an SSE instruction that would help with this kind of packing (the SSE pack instructions narrow dwords to words and words to bytes, with saturation).
You can perform the operations inside SSE registers, but not, as far as I can see, reduce the number of instructions required to get at the end result.
There are quite a few bitwise operations you can perform in SSE. You can just use _mm_and_si128 and _mm_or_si128, and there is a large set of shift operations. Google _mm_slli_si128 to find the complete list. These instructions were added in SSE2, so they're widely available.
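As a rough sketch of how these SSE2 intrinsics could be combined for the 6-bit packing in the question (n == 8 only): the per-lane shifts are emulated with _mm_madd_epi16 (PMADDWD), a multiply-add that is not mentioned above, followed by ordinary 64-bit lane shifts and an or. The function names pack6_sse2 and pack6_scalar are hypothetical, and nothing here shows that this is faster than the scalar loop; when the compiler does not define __SSE2__, the sketch simply falls back to the scalar version.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical name: scalar reference, same loop as in the question. */
uint64_t pack6_scalar(const unsigned char *buf, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t rv = 0;
    for (int i = 0; i < n; i++)
        rv = (rv << 6) | (buf[i] & 0x3f);
    return rv;
}

/* Hypothetical name: packs the low 6 bits of eight bytes into 48 bits. */
uint64_t pack6_sse2(const unsigned char buf[8])
{
#if defined(__SSE2__)
    /* Load 8 bytes into the low half of the register and keep 6 bits each. */
    __m128i v = _mm_loadl_epi64((const __m128i *)buf);
    v = _mm_and_si128(v, _mm_set1_epi8(0x3f));

    /* Widen the eight bytes to eight 16-bit lanes. */
    v = _mm_unpacklo_epi8(v, _mm_setzero_si128());

    /* Each 32-bit lane becomes b[2i] * 64 + b[2i+1], a 12-bit value. */
    v = _mm_madd_epi16(v, _mm_set1_epi32(0x00010040));

    /* Narrow back to 16-bit lanes; values are below 4096, so no saturation. */
    v = _mm_packs_epi32(v, _mm_setzero_si128());

    /* Each 32-bit lane becomes d[2i] * 4096 + d[2i+1], a 24-bit value. */
    v = _mm_madd_epi16(v, _mm_set1_epi32(0x00011000));

    /* Low 64-bit lane now holds e0 | (e1 << 32); we want (e0 << 24) | e1. */
    __m128i hi = _mm_srli_epi64(_mm_slli_epi64(v, 32), 8);
    __m128i lo = _mm_srli_epi64(v, 32);
    v = _mm_or_si128(hi, lo);

    uint64_t rv;
    _mm_storel_epi64((__m128i *)&rv, v);
    return rv;
#else
    /* No SSE2 available at compile time: use the scalar loop instead. */
    return pack6_scalar(buf, 8);
#endif
}
```

The multipliers 64 and 4096 play the role of left shifts by 6 and 12 on every other lane, which is how a multiply-add can stand in for the per-lane variable shift that PSLLW does not offer.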