Which kind of data organization using C arrays makes fastest code and why?
Given the following data, what is the best way to organize an array of elements so that the fastest random access will be possible?
Each element has some int number, a name of 3 characters with '\0' at the end, and a floating point value.
I see two possible methods to organize and access such an array:
First:
typedef struct { int num; char name[4]; float val; } t_Element;
t_Element array[900000000];
//random access:
num = array[i].num;
name = array[i].name;
val = array[i].val;
//sequential access:
some_cycle:
num = array[i].num;
i++;
Second:
#define NUMS 0
#define NAMES 1
#define VALS 2
#define SIZE (VALS+1)
int array[SIZE][900000000];
//random access:
num = array[NUMS][i];
name = (char*) &array[NAMES][i];
val = (float) array[VALS][i];
//sequential access:
p_array_nums = &array[NUMS][i];
some_cycle:
num = *p_array_nums;
p_array_nums++;
My question is: which method is faster, and why? My first thought was that the second method produces the fastest code and allows the fastest block copy, but I doubt whether it saves any significant number of CPU instructions compared with the first method.
It depends on the common access patterns. If you plan to iterate over the data, accessing every field of each element as you go, the struct approach is better. If you plan to iterate independently over each component, then parallel arrays are better.
This is not a subtle distinction, either. With main memory typically being around two orders of magnitude slower than L1 cache, using the data structure that is appropriate for the usage pattern can possibly triple performance.
I must say, though, that your approach to implementing parallel arrays leaves much to be desired. You should simply declare three arrays instead of getting "clever" with two-dimensional arrays and casting:
int nums[900000000];
char names[900000000][4];
float vals[900000000];
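To make the two access patterns concrete, here is a minimal sketch. The function names are made up for illustration, and the comment about 12-byte elements assumes a typical platform with 4-byte int and float and no extra padding.

```c
#include <stddef.h>

/* Array of structures (AoS), as in the question's first method. */
typedef struct { int num; char name[4]; float val; } t_Element;

/* Uses every field of every element. With AoS the three fields of
   element i sit next to each other, so they usually share a cache line. */
double sum_all_fields_aos(const t_Element *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i].num + a[i].name[0] + a[i].val;
    return total;
}

/* Uses only one component. With parallel arrays the loop reads a dense
   run of ints and never pulls names or vals into the cache. */
long long sum_nums_parallel(const int *nums, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += nums[i];
    return total;
}

/* The same reduction over AoS. Assuming a 12-byte struct, each int is
   followed by 8 bytes the loop does not need, so only about a third of
   every loaded cache line is useful. */
long long sum_nums_aos(const t_Element *a, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i].num;
    return total;
}
```

Both sum_nums variants return the same result; the difference is only in how much memory traffic they generate, which is what you would want to measure on your own data sizes.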
Impossible to say. As with any performance-related test, the answer may vary with any one or more of your OS, your CPU, your memory, your compiler, etc.
So you need to test for yourself. Set your performance targets, measure, optimise, repeat.
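A rough sketch of what "measure" could look like using only the standard library. best_time_seconds, t_SumJob and sum_job are hypothetical names invented here. clock() reports CPU time at fairly coarse resolution, so the workload has to be large enough to register.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: runs work(arg) `repeats` times and returns the
   fastest run in seconds of CPU time, or -1.0 if repeats <= 0. */
double best_time_seconds(void (*work)(void *), void *arg, int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        clock_t start = clock();
        work(arg);
        clock_t stop = clock();
        double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Example workload (also hypothetical): sums an int array. The result is
   stored in the job so the compiler cannot discard the loop as unused. */
typedef struct { const int *data; size_t n; long long result; } t_SumJob;

void sum_job(void *arg)
{
    t_SumJob *job = arg;
    long long total = 0;
    for (size_t i = 0; i < job->n; i++)
        total += job->data[i];
    job->result = total;
}
```

You would write one workload per layout (struct array versus parallel arrays), fill them with the same data, and compare the returned times at realistic array sizes and optimisation levels.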
The first one is probably faster, since memory access latency will be the dominant factor in performance. Ideally you should access memory sequentially and contiguously, to make best use of loaded cache lines and reduce cache misses.
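As a back-of-the-envelope check for the random-access case, you can count how many cache lines one element lookup touches in each layout. The sketch below assumes 64-byte cache lines, arrays that start on a cache-line boundary, and a 12-byte struct; none of that is guaranteed on a given machine, and the helper names are invented for illustration.

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size in bytes; check your CPU */

/* Number of cache lines overlapped by the byte range [offset, offset + size). */
size_t lines_spanned(size_t offset, size_t size)
{
    if (size == 0)
        return 0;
    return (offset + size - 1) / CACHE_LINE - offset / CACHE_LINE + 1;
}

/* AoS: element i is one contiguous block of elem_size bytes. */
size_t lines_for_aos_element(size_t i, size_t elem_size)
{
    return lines_spanned(i * elem_size, elem_size);
}

/* Parallel arrays: element i has one field in each array. The arrays are
   assumed to be far apart, so their cache lines never coincide. */
size_t lines_for_soa_element(size_t i, const size_t field_sizes[], size_t nfields)
{
    size_t total = 0;
    for (size_t f = 0; f < nfields; f++)
        total += lines_spanned(i * field_sizes[f], field_sizes[f]);
    return total;
}
```

Under these assumptions, reading all three fields of one random element touches one or two lines with the struct layout (two only when the 12-byte element straddles a boundary, as element 5 does at bytes 60 to 71) and three lines with parallel arrays.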
Of course the access pattern is critical in any such discussion, which is why sometimes it's better to use SoA (structure of arrays) and other times AoS (array of structures), at least when performance is critical.
Most of the time of course you shouldn't worry about such things (premature optimisation, and all that).