How can I exchange the low 128 bits and high 128 bits in a 256 bit AVX (YMM) register

2023-03-30 21:33 问答作者：

I am porting SSE SIMD code to use the 256 bit AVX extensions and cannot seem to find any instruction that will blend/shuffle/move the high 128 bits and the low 128 bits.

The backing story:

What I really want is VHADDPS/_mm256_hadd_ps to act like HADDPS/_mm_hadd_ps, only with 256 bit words.开发者_开发知识库 Unfortunately, it acts like two calls to HADDPS acting independently on the low and high words.

Using VPERM2F128, one can swap the low 128 and high 128 bits ( as well as other permutations). The instrinsic function usage looks like

x = _mm256_permute2f128_ps( x , x , 1)

The third argument is a control word which gives the user a lot of flexibility. See the Intel Instrinsic Guide for details.

x = _mm256_permute4x64_epi64(x, 0b01'00'11'10);

Read about it here. And Try it online!

Note: This instruction needs AVX2 (not just AVX1).

As commented by @PeterCordes speed-wise on Zen2 / Zen3 CPUs _mm256_permute2x128_si256(x, x, i) is the best option, even though it has 3 arguments compared to function _mm256_permute4x64_epi64(x, i) suggested by me having 2 arguments.

On Zen1 and KNL/KNM (and Bulldozer-family Excavator), _mm256_permute4x64_epi64(x, i) suggested by me is more efficient. On other CPUs (including mainstream Intel), both choices are equal.

As already said both _mm256_permute2x128_si256(x, y, i) and _mm256_permute4x64_epi64(x, i) need AVX2, while _mm256_permute2f128_si256(x, i) needs just AVX1.

The only way that I know of doing this is with _mm256_extractf128_si256 and _mm256_set_m128i. E.g. to swap the two halves of a 256 bit vector:

__m128i v0h = _mm256_extractf128_si256(v0, 0);
__m128i v0l = _mm256_extractf128_si256(v0, 1);
__m256i v1 = _mm256_set_m128i(v0h, v0l);

继续阅读：avx simd x86

How can I exchange the low 128 bits and high 128 bits in a 256 bit AVX (YMM) register

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？