Caching performance of serial vs padded data

I have a number of objects, each holding a few values, for example:

(1)

struct massPoint {
    double pos;
    double vel;
    double acc;
} objects[LOTS];

or the same in arrays:

(2)

double pos[LOTS];
double vel[LOTS];
double acc[LOTS];

First question: is it correct to call (1) padded data and (2) serial data?

Second question: if I perform operations that only touch vel and acc but not pos, and I have LOTS of objects, would (2) be preferable for caching performance, because pos[] doesn't have to be cached that way, whereas in (1) it does? Or have I misunderstood the concept entirely?


No idea about your first question.

For your second question there is no general answer: it depends on your architecture and on your usage pattern.

  • If you really have random (= unpredictable) access, each double occupies its own cache line, and your data is correctly aligned, both would be equivalent in terms of caching.
  • Your second method is clearly superior on modern architectures if you have streaming access to the data, that is, access that the compiler / runtime / hardware can easily predict, on a machine with enough hardware registers for all the pointers and the data.
  • Your first method could be superior if you have only a few registers, since for the second the compiler might need to keep track of your current position in three different arrays.

So in summary, it depends on a lot of factors, but the tendency is that the second method is preferable under many circumstances.
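To make the streaming case concrete, here is a minimal sketch of the vel/acc update in both layouts, together with a rough count of the cache lines one pass has to pull in. The 64-byte line size, the function names and the assumption that the struct is three packed 8-byte doubles are mine, not the question's; check them against your own target.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64u  /* assumed cache line size; check your CPU */

struct massPoint {
    double pos;
    double vel;
    double acc;
};

/* Layout (1): one struct per object. pos shares cache lines with vel and acc,
   so it is loaded into the cache even though it is never read here. */
void step_vel_aos(struct massPoint *objects, size_t n, double dt)
{
    for (size_t i = 0; i < n; ++i)
        objects[i].vel += objects[i].acc * dt;
}

/* Layout (2): one array per field. pos[] is never touched. */
void step_vel_soa(double *vel, const double *acc, size_t n, double dt)
{
    for (size_t i = 0; i < n; ++i)
        vel[i] += acc[i] * dt;
}

/* Cache lines covered by `bytes` contiguous bytes starting on a line boundary. */
size_t lines_spanned(size_t bytes)
{
    return (bytes + LINE_BYTES - 1) / LINE_BYTES;
}

/* Lines streamed by one pass of step_vel_aos: the whole struct array. */
size_t lines_touched_aos(size_t n)
{
    /* Assumption: no padding, i.e. 24 bytes per object with 8-byte doubles. */
    assert(sizeof(struct massPoint) == 3 * sizeof(double));
    return lines_spanned(n * sizeof(struct massPoint));
}

/* Lines streamed by one pass of step_vel_soa: vel[] and acc[] only. */
size_t lines_touched_soa(size_t n)
{
    return 2 * lines_spanned(n * sizeof(double));
}
```

Under these assumptions, a pass over 1000 objects streams 375 lines in layout (1) and 250 in layout (2), so roughly a third of the memory traffic in (1) is spent on pos values that are never used.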


If you are doing operations on just positions, then just velocities, or just accelerations, then (2) is better.

In other cases, where most calculations use more than one of the fields of an object together, (1) will be better.

This is assuming that:

  • the total size of each set is too big to fit in local cache (probable).
  • you're not doing complicated calculations that require other external data anyway.
  • the operations you're performing aren't convertible to vector operations.
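The case for (1) is easiest to see when a single object is fetched at an unpredictable index and all three fields are used. The following sketch counts the cache lines such an access touches in each layout. It assumes 64-byte lines, line-aligned arrays and 24-byte objects; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64u  /* assumed cache line size */
#define OBJ_BYTES  24u  /* assumed: three packed 8-byte doubles */

/* Cache lines covered by `size` bytes starting `offset` bytes past a
   line-aligned base address. */
size_t lines_for_span(size_t offset, size_t size)
{
    assert(size > 0);
    return (offset + size - 1) / LINE_BYTES - offset / LINE_BYTES + 1;
}

/* Layout (1): pos, vel and acc of object i sit in one 24-byte span,
   which occasionally straddles a line boundary. */
size_t lines_for_object_aos(size_t i)
{
    return lines_for_span(i * OBJ_BYTES, OBJ_BYTES);
}

/* Layout (2): one 8-byte element in each of three separate arrays. */
size_t lines_for_object_soa(size_t i)
{
    return 3 * lines_for_span(i * 8u, 8u);
}
```

With these numbers, layout (1) touches one line for most objects and two for those that straddle a boundary (1.25 on average), while layout (2) always touches three. For a sequential pass that uses all three fields, both layouts stream the same total number of bytes, so the difference there is likely to be much smaller.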

Though, to be honest, this sounds like premature optimisation, and the best thing to do would be to profile with something like Valgrind (its Cachegrind tool reports simulated cache misses), which will give you a far better answer for your platform than guessing.
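As a starting point for measuring, here is a small timing sketch, assuming the vel += acc * dt update is representative of your workload. The function names and the fixed dt of 0.5 are made up for illustration, and clock() only reports CPU time, so treat the numbers as a rough indication rather than a cache analysis.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

struct massPoint { double pos, vel, acc; };

/* Time `passes` velocity updates over layout (1).
   Returns CPU seconds, or -1.0 if allocation fails.
   *checksum receives the sum of all vel values so the work is not optimised away. */
double time_aos(size_t n, int passes, double *checksum)
{
    assert(checksum != NULL);
    struct massPoint *o = malloc(n * sizeof *o);
    if (!o) return -1.0;
    for (size_t i = 0; i < n; ++i) { o[i].pos = 0.0; o[i].vel = 0.0; o[i].acc = 1.0; }

    clock_t t0 = clock();
    for (int p = 0; p < passes; ++p)
        for (size_t i = 0; i < n; ++i)
            o[i].vel += o[i].acc * 0.5;
    clock_t t1 = clock();

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += o[i].vel;
    *checksum = sum;
    free(o);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Same measurement for layout (2); no pos array is allocated because it is never used. */
double time_soa(size_t n, int passes, double *checksum)
{
    assert(checksum != NULL);
    double *vel = malloc(n * sizeof *vel);
    double *acc = malloc(n * sizeof *acc);
    if (!vel || !acc) { free(vel); free(acc); return -1.0; }
    for (size_t i = 0; i < n; ++i) { vel[i] = 0.0; acc[i] = 1.0; }

    clock_t t0 = clock();
    for (int p = 0; p < passes; ++p)
        for (size_t i = 0; i < n; ++i)
            vel[i] += acc[i] * 0.5;
    clock_t t1 = clock();

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += vel[i];
    *checksum = sum;
    free(vel);
    free(acc);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Call both with an n large enough that the data no longer fits in cache and compare the returned times. For actual miss counts, running the program under `valgrind --tool=cachegrind` gives simulated figures for each cache level.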
