How can I use SSE (and SSE2, SSE3, etc.) extensions when building with Visual C++?

I'm working on a small optimisation of a basic dot product function, using SSE instructions in Visual Studio.

Here is my code (the calling convention is cdecl):

float SSEDP4(const vect & vec1, const vect & vec2)
{
    __asm
    {
        // get addresses
        mov ecx, dword ptr[vec1]
        mov edx, dword ptr[vec2]
        // get the first vector
        movups xmm1, xmmword ptr[ecx]
        // get the second vector (must use movups, because data is not assured to be aligned to 16 bytes => TODO align data)
        movups xmm1, xmmword ptr[edx]
        // OP by OP multiply with second vector (by address)
        mulps xmm1, xmm2
        // add everything with horizontal add func (SSE3)
        haddps xmm1, xmm1
        // is one addition enough ?
        // try to extract, we'll see
        pextrd eax, xmm1, 03h
    }
}

vect is a simple struct that contains 4 single-precision floats, not aligned to 16 bytes (that is why I use movups and not movaps).

vec1 is initialized with (1.0, 1.2, 1.4, 1.0) and vec2 with (2.0, 1.8, 1.6, 1.0)

Everything compiles fine, but at run time I get 0 in both XMM registers, and therefore 0 as the result. While debugging, Visual Studio shows me two registers (MMX1 and MMX2, or sometimes MMX2 and MMX3), which are 64-bit registers, but no XMM registers, and everything is 0.

Does anyone have an idea of what's happening?

Thank you in advance :)


There are a couple of ways to get at SSE instructions on MSVC++:

  1. Compiler Intrinsics -> http://msdn.microsoft.com/en-us/library/t467de55.aspx
  2. External MASM file.

Inline assembly (as in your example code) is no longer a reasonable option, because MSVC only accepts it when targeting 32-bit x86. Building a 64-bit binary, for example, will fail.

Moreover, assembly blocks inhibit most optimizations. This is bad for you because even simple things like inlining won't happen for your function. Intrinsics work in a manner that does not defeat optimizers.
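As a rough illustration of the intrinsics route, here is a minimal sketch of the same 4-element dot product. The layout of vect (a plain array of four floats) and the name SSEDP4_intrin are assumptions made for the example, not taken from the question. It reduces with shuffles instead of haddps so that it needs only SSE1; two calls to _mm_hadd_ps from <pmmintrin.h> would be the SSE3 equivalent.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ...

// Assumed layout: four packed floats with no particular alignment.
struct vect { float v[4]; };

// Hypothetical intrinsics version of the dot product.
float SSEDP4_intrin(const vect& vec1, const vect& vec2)
{
    // Unaligned loads, the intrinsic counterpart of movups.
    __m128 a = _mm_loadu_ps(vec1.v);
    __m128 b = _mm_loadu_ps(vec2.v);

    // Element-wise products: (p0, p1, p2, p3).
    __m128 prod = _mm_mul_ps(a, b);

    // Swap neighbouring lanes: (p1, p0, p3, p2).
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));

    // Pairwise sums: (p0+p1, p0+p1, p2+p3, p2+p3).
    __m128 sums = _mm_add_ps(prod, shuf);

    // Bring the upper pair sum (p2+p3) down into lane 0.
    shuf = _mm_movehl_ps(shuf, sums);

    // Lane 0 now holds p0+p1+p2+p3.
    sums = _mm_add_ss(sums, shuf);

    // Return lane 0 as a float; the compiler handles the return convention.
    return _mm_cvtss_f32(sums);
}
```

Because this is ordinary C++ as far as the compiler is concerned, it can be inlined, and it builds for 64-bit targets as well.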


Your code compiled and ran, so you are at least able to use SSE.

To view the SSE registers in the Registers window, right-click on the Registers window and select SSE. That should let you see the XMM registers.

You can also use @xmm<register><component> (e.g., @xmm00 to view xmm0[0]) in the watch window to look at individual components of the XMM registers.

Now, as for your actual problem, you are overwriting xmm1 with [edx] instead of stuffing that into xmm2.

Also, with the 32-bit cdecl convention, scalar floating point values are returned on the x87 stack in st(0). Instead of trying to remember how to do that, I simply store the result in a stack variable and let the compiler do it for me:

float SSEDP4(const vect & vec1, const vect & vec2)
{
    float result;
    __asm
    {
        // get addresses
        mov ecx, dword ptr[vec1]
        mov edx, dword ptr[vec2]
        // get the first vector
        movups xmm1, xmmword ptr[ecx]
        // get the second vector (must use movups, because data is not assured to be aligned to 16 bytes => TODO align data)
        movups xmm2, xmmword ptr[edx] // xmm2, not xmm1
        // OP by OP multiply with second vector
        mulps xmm1, xmm2
        // horizontal add (SSE3): xmm1 = (p0+p1, p2+p3, p0+p1, p2+p3)
        haddps xmm1, xmm1
        // one addition is not enough; a second pass leaves the full sum in every lane
        haddps xmm1, xmm1
        // store the low lane as a float (movss needs only SSE1, pextrd needs SSE4.1)
        movss dword ptr [result], xmm1
    }

    return result;
}
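
The question's comment asks whether one horizontal add is enough. A scalar model of haddps suggests it is not: after one pass each lane holds only a pairwise sum, and a second pass is needed before any lane holds the full dot product. The sketch below is an illustration in plain C++, not SSE code; Vec4 and hadd_model are names invented for the example.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Scalar model of `haddps dst, src`:
// result = (dst0+dst1, dst2+dst3, src0+src1, src2+src3)
Vec4 hadd_model(const Vec4& dst, const Vec4& src)
{
    return Vec4{{ dst[0] + dst[1], dst[2] + dst[3],
                  src[0] + src[1], src[2] + src[3] }};
}

// Applies the model twice to the element-wise products, mirroring two
// `haddps xmm1, xmm1` instructions; every lane then holds the full sum.
Vec4 reduce_twice(const Vec4& products)
{
    Vec4 once = hadd_model(products, products);  // pairwise sums only
    return hadd_model(once, once);               // full sum in each lane
}
```

With the vectors from the question, the products are (2.0, 2.16, 2.24, 1.0). One pass gives roughly (4.16, 3.24, 4.16, 3.24), so reading lane 3 at that point would yield only the sum of the last two products.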
