How can I use SSE (and SSE2, SSE3, etc.) extensions when building with Visual C++?

2023-03-28 11:58

I'm working on a small optimisation of a basic dot product function, using SSE instructions in Visual Studio.

Here is my code (the calling convention is cdecl):

float SSEDP4(const vect & vec1, const vect & vec2)
{
    __asm
    {
        // get addresses
        mov ecx, dword ptr[vec1]
        mov edx, dword ptr[vec2]
        // get the first vector
        movups xmm1, xmmword ptr[ecx]
        // get the second vector (must use movups, because data is not assured to be aligned to 16 bytes => TODO align data)
        movups xmm1, xmmword ptr[edx]
        // OP by OP multiply with second vector (by address)
        mulps xmm1, xmm2
        // add everything with horizontal add func (SSE3)
        haddps xmm1, xmm1
        // is one addition enough ?
        // try to extract, we'll see
        pextrd eax, xmm1, 03h
    }
}

vect is a simple struct that contains 4 single-precision floats and is not aligned to 16 bytes (that is why I use movups and not movaps).

vec1 is initialized with (1.0, 1.2, 1.4, 1.0) and vec2 with (2.0, 1.8, 1.6, 1.0).

Everything compiles fine, but at run time I get 0 in both XMM registers, and 0 as the result. While debugging, Visual Studio shows me two registers (MMX1 and MMX2, or sometimes MMX2 and MMX3), which are 64-bit registers, but no XMM registers, and everything is 0.

Does anyone have an idea of what's happening?

Thank you in advance :)

There are a couple of ways to get at SSE instructions on MSVC++:

Compiler Intrinsics -> http://msdn.microsoft.com/en-us/library/t467de55.aspx
External MASM file.

Inline assembly (as in your example code) is no longer a reasonable option, because MSVC only supports it when targeting 32-bit x86. Building the same code as a 64-bit binary, for example, will fail.

Moreover, assembly blocks inhibit most optimizations. This is bad for you because even simple things like inlining won't happen for your function. Intrinsics work in a manner that does not defeat optimizers.
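As a rough illustration of the intrinsics route, here is a minimal sketch of the same dot product. It assumes vect is laid out as four packed floats (the struct definition below is a guess, since the question does not show it), and the names DotSSE, DotScalar and SelfCheck are made up for this example. It sticks to SSE1 intrinsics from <xmmintrin.h>, so the horizontal sum is done with shuffles rather than haddps:

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE1 intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ...

// Assumed layout: the question only says "4 single precision floats".
struct vect { float v[4]; };

// Hypothetical intrinsics version of SSEDP4.
float DotSSE(const vect& a, const vect& b)
{
    __m128 va = _mm_loadu_ps(a.v);           // unaligned load, like movups
    __m128 vb = _mm_loadu_ps(b.v);
    __m128 prod = _mm_mul_ps(va, vb);        // (p0, p1, p2, p3)

    // Horizontal sum using only SSE1 shuffles.
    __m128 hi   = _mm_movehl_ps(prod, prod); // (p2, p3, p2, p3)
    __m128 sum2 = _mm_add_ps(prod, hi);      // lane 0 = p0+p2, lane 1 = p1+p3
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 all  = _mm_add_ss(sum2, odd);     // lane 0 = p0+p1+p2+p3
    return _mm_cvtss_f32(all);               // return lane 0 as a float
}

// Plain scalar reference, used only to check the SSE version.
float DotScalar(const vect& a, const vect& b)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a.v[i] * b.v[i];
    return sum;
}

// Checks both versions against each other on the vectors from the question.
void SelfCheck()
{
    vect a = {{1.0f, 1.2f, 1.4f, 1.0f}};
    vect b = {{2.0f, 1.8f, 1.6f, 1.0f}};
    assert(std::fabs(DotSSE(a, b) - DotScalar(a, b)) < 1e-4f);
}
```

Because this is ordinary C++, the compiler handles the return value itself and is free to inline the function, and the same source should build for both 32-bit and 64-bit x86 targets.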

Your code compiled and ran, so you are at least able to use SSE.

In order to view SSE registers in the Registers window, right click on the Registers window and select SSE. That should let you see the XMM registers.

You can also use @xmm<register><component> (e.g., @xmm00 to view xmm0[0]) in the watch window to look at individual components of the XMM registers.

Now, as for your actual problem, you are overwriting xmm1 with [edx] instead of stuffing that into xmm2.

Also, scalar floating point values are returned on the x87 stack in st(0). Instead of trying to remember how to do that, I simply store the result in a stack variable and let the compiler do it for me:

float SSEDP4(const vect & vec1, const vect & vec2)
{
    float result;
    __asm
    {
        // get addresses
        mov ecx, dword ptr[vec1]
        mov edx, dword ptr[vec2]
        // get the first vector
        movups xmm1, xmmword ptr[ecx]
        // get the second vector (must use movups, because data is not assured to be aligned to 16 bytes => TODO align data)
        movups xmm2, xmmword ptr[edx] // xmm2, not xmm1
        // OP by OP multiply with second vector (by address)
        mulps xmm1, xmm2
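        // add everything with horizontal add func (SSE3)
        haddps xmm1, xmm1
        // one horizontal add only yields pairwise sums, so add once more
        haddps xmm1, xmm1
        // every lane now holds the full sum; store the low lane (plain SSE)
        movss dword ptr [result], xmm1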
    }

    return result;
}
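The "TODO align data" comment in the listing could be handled along these lines. This is only a sketch: AlignedVect and DotAligned are hypothetical names, and it assumes a compiler with C++11 alignas (older MSVC versions would need __declspec(align(16)) instead). With 16-byte alignment guaranteed by the type, the aligned load _mm_load_ps (the intrinsic counterpart of movaps) becomes safe to use:

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE1 intrinsics

// Hypothetical 16-byte-aligned variant of the question's struct.
struct alignas(16) AlignedVect { float v[4]; };

float DotAligned(const AlignedVect& a, const AlignedVect& b)
{
    // Aligned loads fault on misaligned addresses, so check in debug builds.
    assert(reinterpret_cast<std::uintptr_t>(a.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.v) % 16 == 0);

    __m128 prod = _mm_mul_ps(_mm_load_ps(a.v), _mm_load_ps(b.v));

    // Spill the four products to an aligned buffer and sum them in scalar code.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, prod);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

Summing through a temporary array keeps the example short; a shuffle-based or haddps-based reduction would avoid the round trip through memory.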
