Which kind of data organization using C arrays makes fastest code and why?

2023-02-16 22:12 问答作者：

Given following data, what is the best way to organize an array of elements so开发者_如何学C that the fastest random access will be possible?

Each element has some int number, a name of 3 characters with '\0' at the end, and a floating point value.

I see two possible methods to organize and access such array:

First:

typedef struct { int num; char name[4]; float val; } t_Element;
t_Element array[900000000];
//random access:
num = array[i].num;
name = array[i].name;
val = array[i].val;
//sequential access:
some_cycle:
  num = array[i].num
  i++;

Second:

#define NUMS 0
#define NAMES 1
#define VALS 2
#define SIZE (VALS+1)
int array[SIZE][900000000];
//random access:
num = array[NUMS][i];
name = (char*) array[NAMES][i];
val = (float) array[VALS][i];
//sequential access:
p_array_nums = &array[NUMS][i];
some_cycle:
  num = *p_array_nums;
  p_array_nums++;

My question is, what method is faster and why? My first thought was the second method makes fastest code and allows fastest block copy, but I doubt whether it saves any sensitive number of CPU instructions in comparison to the first method?

It depends on the common access patterns. If you plan to iterate over the data, accessing every element as you go, the struct approach is better. If you plan to iterate independently over each component, then parallel arrays are better.

This is not a subtle distinction, either. With main memory typically being around two orders of magnitude slower than L1 cache, using the data structure that is appropriate for the usage pattern can possibly triple performance.

I must say, though, that your approach to implementing parallel arrays leaves much to be desired. You should simply declare three arrays instead of getting "clever" with two-dimensional arrays and casting:

int nums[900000000];
char names[900000000][4];
float vals[900000000];

Impossible to say. As with any performance related test, the answer my vary by any one or more of your OS, your CPU, your memory, your compiler etc.

So you need to test for yourself. Set your performance targets, measure, optimise, repeat.

The first one is probably faster, since memory access latency will be the dominant factor in performance. Ideally you should access memory sequentially and contiguously, to make best use of loaded cache lines and reduce cache misses.

Of course the access pattern is critical in any such discussion, which is why sometimes it's better to use SoA (structure of arrays) and other times AoS (array of structures), at least when performance is critical.

Most of the time of course you shouldn't worry about such things (premature optimisation, and all that).

继续阅读：arrays c data-structures pointers

Which kind of data organization using C arrays makes fastest code and why?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？