Is it preferable to iterate over the first dimension of a two-dimensional array in the outer loop rather than the second?
Here is the code:
int array[X][Y] = {0};

// one way to access the data
for (int x = 0; x < X; x++)
    for (int y = 0; y < Y; y++)
        array[x][y] = compute();

// the other way to access the data
for (int y = 0; y < Y; y++)
    for (int x = 0; x < X; x++)
        array[x][y] = compute();
Is it true that the first way is more efficient than the second because of CPU cache (L1, L2?) behavior? In other words, is a sequential access pattern preferred even for RAM?
You'll understand this better if you draw a picture of your array in memory:
Y ->
X xxxxx ...
| xxxxx
v xxxxx
.
.
The address you access grows linearly in the Y direction (345, 345+1, 345+2, ...), but jumps wildly in the X direction if Y is large (345, 345+Y, 345+Y*2, ...). Since the cache loads blocks of memory, walking in the X direction jumps out of a loaded block almost immediately when Y is big enough, while walking in the Y direction stays inside cached data until the next block has to be fetched.
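To make the layout concrete, here is a minimal sketch (my addition, not part of the original answer) that prints element offsets for a small array; it relies on the row-major rule that &array[x][y] sits x*Y + y ints from the start:

#include <stdio.h>

#define X 3
#define Y 5

int array[X][Y];

int main() {
    int* base = &array[0][0];
    // stepping in the y direction moves one int at a time...
    printf("offset of [1][1]: %td\n", &array[1][1] - base);  // 1*Y + 1 = 6
    printf("offset of [1][2]: %td\n", &array[1][2] - base);  // 1*Y + 2 = 7
    // ...while stepping in the x direction jumps a whole row of Y ints
    printf("offset of [2][1]: %td\n", &array[2][1] - base);  // 2*Y + 1 = 11
}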
Also note that this effect can be even more extreme with dynamic allocation. Compiling the program below with full optimizations gives me the following output (times in seconds):
0.615000
9.878000
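One reason the dynamically allocated version suffers more is that each new int[Y] row is a separate heap block, so consecutive rows need not even be adjacent in memory. A sketch of an alternative that keeps the whole matrix in one contiguous block (my suggestion, not part of the measured program):

#include <stdio.h>
#include <time.h>

#define X 10000
#define Y 10000

int main() {
    // one contiguous block instead of X separately allocated rows;
    // flat[x * Y + y] plays the role of array[x][y]
    int* flat = new int[X * Y];

    clock_t c = clock();
    for (int x = 0; x < X; x++)        // row-major order: sequential addresses
        for (int y = 0; y < Y; y++)
            flat[x * Y + y] = x;
    printf("%f\n", (double)(clock() - c) / CLOCKS_PER_SEC);

    c = clock();
    for (int y = 0; y < Y; y++)        // column order: stride of Y ints per step
        for (int x = 0; x < X; x++)
            flat[x * Y + y] = x;
    printf("%f\n", (double)(clock() - c) / CLOCKS_PER_SEC);

    delete[] flat;
}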
EDIT: Other interesting measurements:
Replacing the dynamic allocation with a local int array[X][Y];
uses stack memory, which is limited, so you can't test much bigger X/Y values, but it is also very fast:
0.000000
0.000000
Declaring int array[X][Y];
as a global variable uses a block of static storage instead and is slow again. So even without dynamic allocation, the first access pattern is much better:
0.929000
8.944000
Using X=1500, Y=1500 shows that the effect is measurable even with smaller arrays:
0.008000
0.059000
EDIT2: Also note that there are other possible optimizations, as jalf said in a comment to your question. The one below indeed almost doubles the speed (0.453 seconds for X=Y=10000):
// an even faster way to access the array: hoist the row pointer
// out of the inner loop instead of re-indexing array[x] every iteration
for (int x = 0; x < X; x++) {
    int* arrayptr = array[x];
    for (int y = 0; y < Y; y++, arrayptr++)
        *arrayptr = x;
}
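Since every element of a row gets the same value x here, one could go a step further and hand the inner loop to std::fill over the contiguous row (my addition, not part of the measured code):

#include <algorithm>  // std::fill

// each row array[x] is contiguous, so filling it with the constant x
// can be left to std::fill, which compilers often vectorize
for (int x = 0; x < X; x++)
    std::fill(array[x], array[x] + Y, x);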
Code (note that you can also adapt this to measure your static-array case, where the difference shouldn't be as extreme except for big X and Y; like others already said, measure it and you'll be enlightened):
#include <stdio.h>
#include <time.h>

#define X 10000
#define Y 10000

int main() {
    // allocate X separate rows of Y ints each
    int** array = new int*[X];
    for (int x = 0; x < X; x++) {
        array[x] = new int[Y];
    }

    clock_t c = clock();
    // one way to access the data: row by row, sequential addresses
    for (int x = 0; x < X; x++)
        for (int y = 0; y < Y; y++)
            array[x][y] = x;
    printf("%f\n", (double)(clock() - c) / CLOCKS_PER_SEC);

    c = clock();
    // the other way to access the data: column by column, strided addresses
    for (int y = 0; y < Y; y++)
        for (int x = 0; x < X; x++)
            array[x][y] = x;
    printf("%f\n", (double)(clock() - c) / CLOCKS_PER_SEC);

    // memory from new[] must be released with delete[]
    for (int x = 0; x < X; x++) {
        delete[] array[x];
    }
    delete[] array;
}
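If you want to reproduce the static-array measurements from the edits above, a minimal sketch (my reconstruction, not the author's exact code) is to move the declaration out of main and keep the same timing loops; X and Y are set to 1500 here, as in the last measurement:

#include <stdio.h>
#include <time.h>

#define X 1500
#define Y 1500

int array[X][Y];   // file scope: static storage (move into main() for the
                   // stack variant, but watch the stack size limit)

int main() {
    clock_t c = clock();
    for (int x = 0; x < X; x++)        // row-major order: sequential addresses
        for (int y = 0; y < Y; y++)
            array[x][y] = x;
    printf("%f\n", (double)(clock() - c) / CLOCKS_PER_SEC);

    c = clock();
    for (int y = 0; y < Y; y++)        // column order: stride of Y ints per step
        for (int x = 0; x < X; x++)
            array[x][y] = x;
    printf("%f\n", (double)(clock() - c) / CLOCKS_PER_SEC);
}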
Yes, especially if a row fits within a cache line. If you used the second method and the rows in your array were sufficiently large, there would be no cache locality and the cache lines would constantly be thrashed.
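As a rough sketch of the arithmetic behind that, assuming a 64-byte cache line (typical, but hardware-dependent):

#include <stdio.h>

int main() {
    const int line_bytes = 64;  // assumption: typical cache line size
    const int ints_per_line = line_bytes / (int)sizeof(int);  // 16 when int is 4 bytes
    // row-major walk: every int of a loaded line is written before moving on;
    // column walk with a large row: one int per line, and the line may be
    // evicted before the loop comes back around to it
    printf("consecutive ints per cache line: %d\n", ints_per_line);
}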
Yep, the first one is faster. In memory the matrix is stored one row after another (row-major), so neighboring elements are likely to sit in the same cache line and on the same virtual-memory page, which keeps access times small.
The other approach will generate many more cache misses for larger matrices.
Measure it.
Sequential access is preferred. The effect should depend largely on the values of X and Y; for certain choices I would expect the difference to be considerable.
You should consider using a container like vector, valarray or boost::matrix. C-style arrays can lead to avoidable and annoying bugs.
A famous expression: "It probably wouldn't make a noticeable difference because of the speed of modern computers."