
std::vector faster than plain array?

I have just tried to benchmark std::sort on both a std::vector<std::pair<float, unsigned int>> (filled with push_back) and a plain std::pair<float, unsigned int>* array (allocated using new and then filled one by one). The compare function compared only the float parts of the pairs.

Surprisingly, on 16M values, std::sort took only about 1940 ms on the std::vector but about 2190 ms on the array. Can anybody explain how the vector can be faster? Is it due to caching, or is the array version of std::sort just poorly implemented?

gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6)

Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz - cache size 8192 KB (the computer has two quad-core CPUs but I assume the sort is only single-threaded)

EDIT: Now you can call me a dumbass, but when I tried to reproduce the code I used for the measurements (I had already deleted the original), I could not reproduce the results - now the array version takes about 1915 +- 5 ms (measured over 32 runs). I can only swear that I ran the test on 10 measurements three times (manually) with similar results, but this is not rigorous proof.

There was probably some bug in the original code. A background process seems improbable, because I alternated measurements of the vector and array versions, the vector results held, and no other user was logged in.

Please, consider this question as closed. Thanks for your efforts.


std::vector<std::pair<float, unsigned int>> (filled with push_back operation)

This stores all the data contiguously, so memory locality is very good.

std::pair<float, unsigned int>* array (allocated using new and then filled one by one)

This scatters the data all over memory.

You've set up a very unfair comparison between vector and a simple array. The extra indirection involved in the array case is going to hurt, and the lack of locality is going to kill cache performance. I'm surprised you don't see a bigger win in favor of contiguous storage.


They will use the same version of sort. The difference is quite likely due to random CPU effects, like caching or thread context switches.


Did you use -O3 to compile your code?

If not, do it. All other benchmark results are meaningless, especially for template code.

Did you run the test many times?

This is done to prevent things like interrupts or caching from having too much of an impact on your result.

Don't use floating-point comparison or arithmetic for benchmarks. The results depend heavily on the compiler, platform, compiler options, etc.

How was your test data created?

The time required by most sorting algorithms varies depending on the ordering of the input data.

Which method for measuring time did you use? Clock cycles? A timer?

Anyway, writing benchmarks that provide reliable results is not as easy as it might seem at first. Don't use a benchmark to determine which code is right for your problem.

