Using C intrinsics and memory alignment difficulties with classes

OK, so I am just starting to use C intrinsics in my code, and I have created a class which, simplified, looks like this:

class _Vector3D
{
public:
    _Vector3D()
    {
        aVals[0] = _mm_setzero_ps();
        aVals[1] = _mm_setzero_ps();
        aVals[2] = _mm_setzero_ps();
    }
    ~_Vector3D() {}
private:
    __m128 aVals[3];
};

So far so good. But when I create a second class with _Vector3D members, I get problems:

class RayPacket
{
public:
    RayPacket() {orig = _Vector3D(); dir = _Vector3D(); power = _mm_setzero_ps();}
    ~RayPacket() {}

    RayPacket(_Vector3D origins, _Vector3D directions, float pow)
    {
        orig = origins;
        dir = directions;
        power = _mm_set_ps1(pow);
    }

    _Vector3D orig;
    _Vector3D dir;
    __m128 power;
};

I get the following error:

error C2719: 'origins': formal parameter with __declspec(align('16')) won't be aligned

pointing to the constructor overload:

RayPacket(_Vector3D origins, _Vector3D directions, float pow)

So am I going about this the wrong way? Should I be using structs instead or can I make it work with classes?


This answer is based on documentation and guesswork, not actual knowledge. Beware!

The documentation for __m128 says:

Variables of type _m128 [sic] are automatically aligned on 16-byte boundaries.

So using a __m128 member in your class forces the compiler to align instances of your class on 16-byte boundaries. Implicitly, __declspec(align(16)) is added to your class, but this is not allowed on function parameters passed by value: where a parameter lives on the stack is dictated by the calling convention, so it is hard (impossible?) for the compiler to enforce the alignment there.
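To check that implicit alignment for yourself, here is a minimal sketch of my own, not something taken from the documentation. Vector3D is a renamed stand-in for the question's _Vector3D, and is_aligned16 and local_is_aligned are hypothetical helpers; it assumes an x86 compiler that ships <xmmintrin.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_setzero_ps

// Stand-in for the question's _Vector3D. Renamed because identifiers that
// start with an underscore followed by a capital letter are reserved in C++.
class Vector3D {
public:
    Vector3D() {
        for (__m128& lane : vals) lane = _mm_setzero_ps();
    }
private:
    __m128 vals[3];
};

// A class is at least as strictly aligned as its most-aligned member,
// so the 16-byte requirement of __m128 carries over to Vector3D.
static_assert(alignof(__m128) == 16, "__m128 is 16-byte aligned");
static_assert(alignof(Vector3D) == alignof(__m128), "alignment propagates");
static_assert(sizeof(Vector3D) == 3 * sizeof(__m128), "three packed lanes");

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: an ordinary local variable gets the alignment,
// because the compiler is free to choose where locals go in the frame.
inline bool local_is_aligned() {
    Vector3D v;
    assert(is_aligned16(&v));
    return is_aligned16(&v);
}
```

The static_asserts fire at compile time if the assumption about the class's alignment is wrong, so nothing has to run to find out.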

As a workaround, try passing the constructor arguments by reference:

RayPacket(_Vector3D const &origins, _Vector3D const &directions, float pow)


I think the problem is that the compiler can't guarantee that the stack pointer will be properly aligned when it goes to create a _Vector3D object on the stack to pass to the constructor.

On 32-bit systems, the stack pointer is usually only guaranteed to be 4-byte aligned (sometimes 8-byte aligned), so the compiler doesn't know when it goes to call the constructor that the stack will be aligned properly. The 64-bit platforms are better off: according to the MSVC documentation quoted below, the top of each stack frame there is 16-byte aligned. You might need to pass a pointer or a reference.

Note that the alignment guaranteed for a block returned by malloc() and friends sometimes isn't enough for special types like this. In that case the platform will usually have a special allocation function for those objects.

See the following for details on MSVC (http://msdn.microsoft.com/en-us/library/aa290049.aspx):

Stack Alignment

On both of the 64-bit platforms, the top of each stackframe is 16-byte aligned. Although this uses more space than is needed, it guarantees that the compiler can place all data on the stack in a way that all elements are aligned.

The x86 compiler uses a different method for aligning the stack. By default, the stack is 4-byte aligned. Although this is space efficient, you can see that there are some data types that need to be 8-byte aligned, and that, in order to get good performance, 16-byte alignment is sometimes needed. The compiler can determine, on some occasions, that dynamic 8-byte stack alignment would be beneficial—notably when there are double values on the stack.

The compiler does this in two ways. First, the compiler can use link-time code generation (LTCG), when specified by the user at compile and link time, to generate the call-tree for the complete program. With this, it can determine regions of the call-tree where 8-byte stack alignment would be beneficial, and it determines call-sites where the dynamic stack alignment gets the best payoff. The second way is used when the function has doubles on the stack, but, for whatever reason, has not yet been 8-byte aligned. The compiler applies a heuristic (which improves with each iteration of the compiler) to determine whether the function should be dynamically 8-byte aligned.

Note: A downside to dynamic 8-byte stack alignment, with respect to performance, is that frame pointer omission (/Oy) effectively gets turned off. Register EBP must be used to reference the stack with dynamic 8-byte stack, and therefore it cannot be used as a general register in the function.

The above linked article also has some information on special heap functions that provide alignment guarantees above those of the standard malloc() if you need that.
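As a rough illustration of aligned heap allocation, here is a sketch that uses _mm_malloc and _mm_free, the allocation pair that comes with the SSE headers, rather than the MSVC-specific functions the article describes. The helper names (alloc_aligned_floats, is_aligned16, roundtrip_demo) are made up for this example, and I'm assuming an x86 compiler whose <xmmintrin.h> pulls in those two functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // __m128, _mm_load_ps, _mm_store_ps, _mm_malloc, _mm_free

// Hypothetical helper: allocate `count` floats on a 16-byte boundary.
// Returns a null pointer if the allocation fails.
inline float* alloc_aligned_floats(std::size_t count) {
    return static_cast<float*>(_mm_malloc(count * sizeof(float), 16));
}

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical demo: store four floats with an aligned store, read them
// back with an aligned load, and return lane 0.
inline float roundtrip_demo() {
    float* buf = alloc_aligned_floats(4);
    assert(buf != nullptr && is_aligned16(buf));

    _mm_store_ps(buf, _mm_set_ps1(1.5f));  // requires a 16-byte-aligned address
    __m128 v = _mm_load_ps(buf);           // same requirement on the load
    float first = _mm_cvtss_f32(v);

    _mm_free(buf);  // memory from _mm_malloc must go back through _mm_free
    return first;
}
```

The aligned load and store are the point of the exercise: _mm_load_ps and _mm_store_ps require a 16-byte-aligned address, which plain malloc() does not promise on every platform.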


Try passing _Vector3D by const reference, as in:

RayPacket( const _Vector3D& origins, const _Vector3D& directions, float pow );

That'll put pointers instead of values on the call stack.
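For completeness, here is roughly how the two classes might look with that change applied. This is only a sketch: Vector3D is a renamed stand-in for the question's _Vector3D, first_power_lane and self_check are hypothetical helpers added so the result can be inspected, and the member initializer lists are my own tidy-up rather than part of the fix.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_setzero_ps, _mm_set_ps1, _mm_cvtss_f32

// Stand-in for the question's _Vector3D (leading underscore plus capital
// letter is a reserved identifier form in C++, hence the rename).
class Vector3D {
public:
    Vector3D() {
        for (__m128& lane : vals) lane = _mm_setzero_ps();
    }
private:
    __m128 vals[3];
};

class RayPacket {
public:
    RayPacket() : orig(), dir(), power(_mm_setzero_ps()) {}

    // Const references: only addresses are passed, so no 16-byte-aligned
    // object has to be placed in the argument area of the stack.
    RayPacket(const Vector3D& origins, const Vector3D& directions, float pow)
        : orig(origins), dir(directions), power(_mm_set_ps1(pow)) {}

    Vector3D orig;
    Vector3D dir;
    __m128 power;
};

// Hypothetical helper: read lane 0 of the broadcast power value.
inline float first_power_lane(const RayPacket& p) {
    return _mm_cvtss_f32(p.power);
}

// Hypothetical usage: build a packet from two vectors and check the power.
inline void self_check() {
    const Vector3D origins, directions;
    RayPacket packet(origins, directions, 2.5f);
    assert(first_power_lane(packet) == 2.5f);
}
```

The copies into orig and dir still happen, but inside the constructor, where the destination members are already correctly aligned as part of the RayPacket object.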
