fastest way to iterate matrix of known dimensions

2023-02-12 12:29 问答作者：

I was wondering what the fastest way to iterate through a matrix is in c/c++.

The best method i've come up with so far is to map the matrix to a single dimension.

Then use pointer arithmetic, any other method that might be faster?

Dimensions are known at run time but not compile time, matrix is fully populated.

#include <iostream>
#include <time.h>
#define XMAX 500
#define YMAX 400
#define ZMAX 300

int main()
{
    srand(0);   
    register double sum = 0;
    register int i;
    register int j;
    register int k;

    double *arr_ptr;
    arr_ptr = new double[XMAX*YMAX*ZMAX];

    for (i=0; i<XMAX*YMAX*ZMAX; ++i)
    {
        *(arr_ptr+i) = rand()/double(RAND_MAX);
    }

    clock_t start, finish;
    start = clock();

    for (i=0; i<XMAX; ++i)
    {
        for (j=0; j<YMAX; ++j)
        {
            for (k=0; k<ZMAX; ++k)
            {
                sum += *(arr_ptr+i*YMAX*ZMAX+j*ZMAX+k);
            }
        }
   开发者_Python百科 }

    finish = clock();
    std::cout << "sum: " << sum << "\telapsed: " << finish - start << std::endl;
    std::cin.get();

    delete[] arr_ptr;
}

Why bother with the three nested for-loops? You could just do

for (i=0; i<XMAX*YMAX*ZMAX; ++i)
{
    sum += *(arr_ptr+i);
}

This is 650ms faster than your code for XMAX 500, YMAX 400 and ZMAX 100, run 100 times, according to ideone.com compiler.

double *p_current, *p_end;

p_current = arr_ptr;
p_end = (arr_ptr + XMAX*YMAX*ZMAX);
while(p_current != p_end) {
    sum += *p_current++;
}

See: old version, new version

Actually it doesn't matter because the compiler will optimize it anyway. So arr[i][j][k] and *(arr_ptr+i*YMAX*ZMAX+j*ZMAX+k) will be equally fast.

In your example the bounds are even constants, so normal three dimensional arrays would work with this, be it C or C++.

Then, C and C++ are really different languages on what is concerned dynamically allocated arrays with variable bounds, don't mix them up. For C++ use vector classes and such things. They are made for this and they should be efficient.

In C, since C99 there are VLA, variable length arrays. Contrary to urban myth they may be quite efficient, if you don't allocate them on the stack. Use malloc as for any big chunk of memory in C.

double (*arr_ptr)[XMAX][YMAX][ZMAX]
  = malloc(sizeof(*arr_ptr));

for (register size_t i=0; i<XMAX; ++i)
  for (register size_t j=0; j<YMAX; ++j)
    for (register size_t k=0; k<ZMAX; ++k)
       (*arr_ptr)[i][j][k] = rand()/double(RAND_MAX);

.

free(arr_ptr);

Modern processors have quite complex addressing schemes, so it might not be necessary to effectively do the complete index calculation. Your compiler usually knows better than you.

Then, to be efficient it might be much more important how you declare and handle your loop variables. Use the correct types for indexing, size_t is the correct unsigned type for that. int may easily overflow when you compute 3-D flattened indices and having a signed type here makes not much sense.

Then declare those variables as local as possible, makes things clearer for you and the compiler.

register is just a contract with the compiler that you never will take the address of such an index. Usually this doesn't improve things a lot. But may prevent you from doing inefficient things when you modify your code later.

And last but not least, if you really worry about efficiency, check what your compiler produces. E.g gcc has the option -S to produce the intermediate assembler. Read it instead of speculating about efficiency.

OpenCV uses pointer arithmetic:

double *ptr = arr_ptr;
for (i=0; i<XMAX*YMAX*ZMAX; ++i)
{
    sum += *ptr++;
}

I guess it might be a little faster somehow. Try it and show us the timings!

double *ptr = arr_ptr;
for (int i=XMAX*YMAX*ZMAX; i>0; --i)
{
    sum += *ptr++;
}

Comparing the loop variable to zero, instead of to some constant may save one or two clock cycles for each iteration (for example, using JNZ instruction on intel CPUs)

The first thing that needs to be said is that stack allocated multidimensional arrays are stored in memory (in C and in C++) in row major order. Namely matrix[ 2 ][ 2 ] = { { 1, 2 }, { 3, 4 } } is going to be stored in memory just like you had actually declared an array[ 4 ] = { 1, 2, 3, 4 } and the matrix[][] syntax is just syntactic sugar for *( matrix + i * 2 + j).

So the fastest way to traverse the matrix depends on how you traverse it: in row major order or column major and how big the matrix is:

if the entire matrix can fit in the CPU cache than the traversal order doesn't matter;
if the matrix is bigger than the CPU cache than doing row major traversal cause less CPU cache misses.

The best way to know if you have a performance problem in matrix operations and what causes it is to profile your code.

For very large blocks of data, consider parallel operations. In this case, a sum can be computed with a gather operation -- the form of which will depend on the parallel framework you choose.

继续阅读：c

fastest way to iterate matrix of known dimensions

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？