How To Store Values In Non-Contiguous Memory Locations With SSE Intrinsics?

2023-01-20 09:55 Q&A author:

I'm very new to SSE and have optimized a section of code using intrinsics. I'm pleased with the operation itself, but I'm looking for a better way to write the result. The results end up in three __m128i variables.

What I'm trying to do is store specific bytes from the result values to non-contiguous memory locations. I'm currently doing this:

__m128i values0,values1,values2;

/*Do stuff and store the results in values0, values1, and values2*/

y[0]        = (BYTE)_mm_extract_epi16(values0,0);
cb[2]=cb[3] = (BYTE)_mm_extract_epi16(values0,2);
y[3]        = (BYTE)_mm_extract_epi16(values0,4);
cr[4]=cr[5] = (BYTE)_mm_extract_epi16(values0,6);

cb[0]=cb[1] = (BYTE)_mm_extract_epi16(values1,0);
y[1]        = (BYTE)_mm_extract_epi16(values1,2);
cr[2]=cr[3] = (BYTE)_mm_extract_epi16(values1,4);
y[4]        = (BYTE)_mm_extract_epi16(values1,6);

cr[0]=cr[1] = (BYTE)_mm_extract_epi16(values2,0);
y[2]        = (BYTE)_mm_extract_epi16(values2,2);
cb[4]=cb[5] = (BYTE)_mm_extract_epi16(values2,4);
y[5]        = (BYTE)_mm_extract_epi16(values2,6);

Where y, cb, and cr are byte (unsigned char) arrays. This seems wrong to me for reasons I can't define. Does anyone have any suggestions for a better way?

Thanks!

You basically can't -- SSE doesn't have a scatter store, and it's sort of all designed around the idea of doing vectorized work on contiguous data streams. Really, most of the work involved in making something SIMD is rearranging your data so that it is contiguous and vectorizable. So the best thing to do is rearrange your data structures so that you can write to them 16 bytes at a time. Don't forget that you can reorder the components inside your SIMD vector before you commit them to memory.
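As a rough sketch of reordering inside the register before writing: in the question, every wanted byte is the low byte of an even-numbered 16-bit word, so those four bytes can be packed next to each other with plain SSE2 and moved out in a single transfer. The helper name pack_even_word_bytes is hypothetical, and whether this beats four extracts depends on the surrounding code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: gather the low byte of 16-bit words 0, 2, 4 and 6
   of v into four contiguous bytes. */
static void pack_even_word_bytes(__m128i v, uint8_t out[4])
{
    /* Words 0, 2, 4, 6 are the low halves of the four 32-bit lanes;
       keep only the low byte of each lane. */
    __m128i lanes = _mm_and_si128(v, _mm_set1_epi32(0xFF));
    /* Narrow 32 -> 16 -> 8 bits. Every value is in 0..255, so the
       saturating packs do not change anything. */
    __m128i words = _mm_packs_epi32(lanes, lanes);
    __m128i bytes = _mm_packus_epi16(words, words);
    /* The four wanted bytes now sit in the low 32 bits; move them out once. */
    int32_t packed = _mm_cvtsi128_si32(bytes);
    memcpy(out, &packed, sizeof packed);
}
```

The bytes still have to be written to y, cb and cr individually, but only one vector-to-integer transfer is needed per source register instead of four.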

Failing that, the PEXTRW instruction (the _mm_extract_epi16 intrinsic) is pretty much the only way in SSE2 to pull a 16-bit value out of an SSE register into an integer register. The other approach available to you is to use the unpack and shuffle ops (_mm_shuffle_epi32, _mm_srli_si128 and so on for integer data; _mm_shuffle_ps for floats) to move the data you want into the low 32-bit element of the register, and then store that element to memory one at a time with MOVD/_mm_cvtsi128_si32() (or MOVSS/_mm_store_ss() for floats).
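The two routes can be compared side by side. This is a minimal sketch for integer data, assuming SSE2 only; both function names are made up for illustration. The first uses PEXTRW directly, the second shifts the wanted word down into the low element and reads that element out.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Route 1: PEXTRW. The word index must be a compile-time constant. */
static uint8_t low_byte_of_word2_extract(__m128i v)
{
    return (uint8_t)_mm_extract_epi16(v, 2);
}

/* Route 2: shift the whole register right by 4 bytes so that word 2
   lands in the low 32-bit element, then read that element (MOVD). */
static uint8_t low_byte_of_word2_shift(__m128i v)
{
    __m128i shifted = _mm_srli_si128(v, 4);
    return (uint8_t)_mm_cvtsi128_si32(shifted);
}
```

Both return the same byte; which one is cheaper depends on the target CPU and on what else the loop is doing.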

You will probably find that going through a union, or otherwise bouncing data between the SSE and general-purpose registers via memory, performs poorly because of a CPU implementation detail called a load-hit-store stall. With a union, the processor first writes the SSE register to memory and then reads pieces of it back into a GPR, and on many CPUs the load has to wait until the store has completed before it, or anything that depends on it, can proceed. On some architectures this memory round trip is the only path between the two register files; on x86, MOVD and PEXTRW do move data directly from an SSE register to a GPR, although each such transfer is not free.

I don't know about SSE specifically, but generally the whole point of vectorised units is that they can operate very fast provided the data obeys particular alignment and formatting. So it's up to you to provide and extract the data in the correct format and alignment.

SSE does not have the scatter/gather functionality that you need, although this is probably coming in future SIMD architectures.

As has already been suggested, you can use a union, e.g.:

typedef union
{
    __m128i v;
    uint8_t a8[16];
    uint16_t a16[8];
    uint32_t a32[4];
} U128;

Ideally this kind of manipulation only happens outside any critical loops, as it's very inefficient compared to straightforward SIMD operations on contiguous data elements.
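For illustration, here is how the union above might be applied to the first group of stores from the question. This is a sketch in C, where reading a union member other than the one last written is permitted (C++ is stricter); the function name scatter_values0 is hypothetical. On little-endian x86 the low byte of 16-bit word k is byte 2*k of the register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

typedef union
{
    __m128i v;
    uint8_t a8[16];
    uint16_t a16[8];
    uint32_t a32[4];
} U128;

/* Hypothetical helper mirroring the values0 stores in the question. */
static void scatter_values0(__m128i values0, uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    U128 u;
    u.v = values0;              /* one 16-byte store to the union */
    y[0]          = u.a8[0];    /* low byte of word 0 */
    cb[2] = cb[3] = u.a8[4];    /* low byte of word 2 */
    y[3]          = u.a8[8];    /* low byte of word 4 */
    cr[4] = cr[5] = u.a8[12];   /* low byte of word 6 */
}
```

The compiler is free to turn this into a store followed by byte loads, which is exactly the memory round trip the other answers warn about, so it is worth measuring against the extract-based version.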

You could try using a union to extract the bytes.

union
{
    __m128i value;          /* the 16-byte SSE result */
    unsigned char ch[16];   /* the same 16 bytes, addressable individually */
} u;

and then assign the bytes as needed.
Play around with the union idea; maybe replace the unsigned char ch[16] with an anonymous struct?
Maybe you can get some more ideas from here

Tags: c intrinsics sse sse2
