Sorting 1000-2000 elements with many cache misses

2023-01-02 01:12 Question author:

I have an array of 1000-2000 elements that are pointers to objects. I want to keep my array sorted, and obviously I want to do this as quickly as possible. The objects are sorted by a member and are not allocated contiguously, so assume a cache miss whenever I access the sort-by member.

Currently I'm sorting on demand rather than on add, but because of the cache misses and [presumably] non-inlining of the member access, the inner loop of my quicksort is slow.

I'm running tests and trying things now (to see what the actual bottleneck is), but can anyone recommend a good alternative for speeding this up? Should I do an insertion sort instead of quicksorting on demand, or should I try to change my model to make the elements contiguous and reduce cache misses? Or is there a sort algorithm I've not come across that is good for data that is going to cache miss?

Edit: Maybe I worded this wrong :). I don't actually need my array sorted all the time (I'm not iterating through the elements sequentially for anything). I just need it sorted when I'm doing a binary chop to find a matching object, and doing the quicksort at that point (when I want to search) is currently my bottleneck because of the cache misses and jumps. (I'm using a < operator on my object, but I'm hoping that gets inlined in release builds.)

Simple approach: insertion sort on every insert. Since your elements are not contiguous in memory, I'm guessing you have a linked list. If so, you could turn it into a linked list with extra jump pointers to the 10th element, the 100th and so on. This is somewhat similar to the next suggestion.

Or you reorganize your container structure into a binary tree (or whatever tree you like: B, B*, red-black, ...) and insert elements the way you would insert them into a search tree.

~~Running a quicksort on each insertion is enormously inefficient.~~ Doing a binary search and insert operation would likely be orders of magnitude faster. Using a binary search tree instead of a linear array would reduce the insert cost.

Edit: I missed that you were doing sort on extraction, not insert. Regardless, keeping things sorted amortizes sorting time over each insert, which almost has to be a win, unless you have a lot of inserts for each extraction.
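A minimal sketch of the binary-search-and-insert idea, assuming the container is a std::vector<MyType*> kept sorted by m_SomeField. MyType and m_SomeField follow the naming used in another answer here; lessByField and insertSorted are hypothetical helper names, not anything from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the asker's object type.
struct MyType {
    int m_SomeField; // the sort-by member
};

// Orders two pointers by the member of the objects they point to.
inline bool lessByField(const MyType* a, const MyType* b) {
    return a->m_SomeField < b->m_SomeField;
}

// Inserts obj so that arr stays sorted by m_SomeField.
// Precondition: arr is already sorted with lessByField.
inline void insertSorted(std::vector<MyType*>& arr, MyType* obj) {
    // Debug-only sanity check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(arr.begin(), arr.end(), lessByField));

    // Binary search: O(log N) comparisons, each dereferencing one stored pointer.
    auto pos = std::upper_bound(arr.begin(), arr.end(), obj, lessByField);

    // Shifts the pointers after pos by one slot; only pointers move, not objects.
    arr.insert(pos, obj);
}
```

With this, the search itself never has to sort: the array is always ready for a binary chop. Whether the per-insert pointer shift is cheaper than one sort per search depends on the insert-to-search ratio, so it is worth profiling both.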

If you want to keep the sort on-extract methodology, then maybe switch to merge sort, or another sort that has good performance for mostly-sorted data.
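One way to exploit mostly-sorted data while still sorting on extract is sketched below. This is not a full merge sort: it sorts only the elements appended since the last search and then merges them into the already-sorted prefix. It assumes you track how many leading elements are already sorted (the sortedCount parameter is hypothetical, as are the names sortPending and lessByField).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's object type.
struct MyType {
    int m_SomeField; // the sort-by member
};

inline bool lessByField(const MyType* a, const MyType* b) {
    return a->m_SomeField < b->m_SomeField;
}

// arr[0, sortedCount) is already sorted; arr[sortedCount, size) holds the
// elements appended (unsorted) since the last search.
// After the call the whole vector is sorted.
inline void sortPending(std::vector<MyType*>& arr, std::size_t sortedCount) {
    assert(sortedCount <= arr.size());
    auto mid = arr.begin() + static_cast<std::ptrdiff_t>(sortedCount);

    // Sort only the new tail: cost depends on the number of new elements.
    std::sort(mid, arr.end(), lessByField);

    // Merge the sorted prefix with the sorted tail.
    std::inplace_merge(arr.begin(), mid, arr.end(), lessByField);
}
```

The caller would push new pointers onto the back of the vector, then call sortPending with the previous sorted size just before searching. The merge still dereferences every element it compares, so this reduces the number of comparisons but not the cost of each one.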

I think the best approach in your case would be to change your data structure to something with logarithmic operations and to rethink your architecture. The real bottleneck of your application is not the sort itself, but the question of why you have to sort everything on each insert and then try to compensate for that with an on-demand sort.

Another thing you could try (based on your current implementation) is an external mapping table or function that maps sort keys to pointers, and then sorting those secondary keys instead, but I actually doubt it would help in this case.

Instead of the array of the pointers you may consider an array of structs which consist of both a pointer to your object and the sort criteria. That is:

Instead of

struct MyType {
    // ...
    int m_SomeField; // this is the sort criteria
};

std::vector<MyType*> arr;

You may do this:

struct ArrayElement {
    MyType* m_pObj; // the actual object
    int m_SortCriteria; // should always be equal to m_pObj->m_SomeField
};

std::vector<ArrayElement> arr;

You may also remove the m_SomeField field from your struct, if you only access your object via this array.

This way, sorting the array does not need to dereference m_pObj on every comparison, so the sort makes much better use of the cache.

Of course you must keep the m_SortCriteria always synchronized with m_SomeField of the object (in case you're editing it).
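A rough sketch of how sorting and searching could look with this layout. The helper names buildIndex, findByKey and lessByKey are made up for illustration, and the code assumes an exact-match lookup on an int key.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct MyType {
    int m_SomeField; // this is the sort criteria
};

struct ArrayElement {
    MyType* m_pObj;     // the actual object
    int m_SortCriteria; // cached copy of m_pObj->m_SomeField
};

inline bool lessByKey(const ArrayElement& a, const ArrayElement& b) {
    return a.m_SortCriteria < b.m_SortCriteria;
}

// Builds the key/pointer array and sorts it by the cached key.
inline std::vector<ArrayElement> buildIndex(const std::vector<MyType*>& objs) {
    std::vector<ArrayElement> index;
    index.reserve(objs.size());
    for (MyType* p : objs)
        index.push_back(ArrayElement{p, p->m_SomeField}); // one dereference per object
    // The sort touches only the contiguous ArrayElement storage.
    std::sort(index.begin(), index.end(), lessByKey);
    return index;
}

// Binary search on the cached keys; returns nullptr if no element has that key.
inline MyType* findByKey(const std::vector<ArrayElement>& index, int key) {
    assert(std::is_sorted(index.begin(), index.end(), lessByKey)); // debug-only check
    auto it = std::lower_bound(index.begin(), index.end(), key,
                               [](const ArrayElement& e, int k) { return e.m_SortCriteria < k; });
    return (it != index.end() && it->m_SortCriteria == key) ? it->m_pObj : nullptr;
}
```

Neither the sort nor the search dereferences m_pObj; the only pointer chasing happens once per object while the index is built, and once more when the caller uses the returned object.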

As you mention, you're going to have to do some profiling to determine if this is a bottleneck and if other approaches provide any relief.

Alternatives to using an array are std::set or std::multiset, which are normally implemented as red-black trees and so have good performance for most applications. You're going to have to weigh using them against how often the sort-when-searched pattern you implemented actually runs.

In either case, I wouldn't recommend rolling-your-own sort or search unless you're interested in learning more about how it's done.

I would think that sorting on insertion would be better. We are talking about O(log N) comparisons here, so roughly ceil(log2 N) + 1 retrievals of the data to sort with.

For N = 2000, that amounts to about 12 (ceil(log2 2000) = 11 comparisons, plus one retrieval for the new element).

What's great about this is that you can buffer the key of the element to be inserted, so you only need about a dozen function calls to find the insertion point.
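To make the cost concrete, here is a small sketch that counts how many stored objects get dereferenced while locating the insertion point. It assumes a sorted std::vector<MyType*> and an int key; findInsertPos and derefCount are hypothetical names. The new element's key is read once up front and passed by value, so only the probed elements are touched. A binary search over 2000 elements needs about log2(2000) ≈ 11 comparisons; the standard only guarantees log2(N) + O(1), so the exact count depends on the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's object type.
struct MyType {
    int m_SomeField; // the sort-by member
};

// Returns the index at which an element with key newKey should be inserted
// to keep arr sorted, and reports in derefCount how many stored objects
// were dereferenced during the search.
inline std::size_t findInsertPos(const std::vector<MyType*>& arr, int newKey,
                                 std::size_t& derefCount) {
    // Debug-only precondition check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(arr.begin(), arr.end(),
                          [](const MyType* a, const MyType* b) {
                              return a->m_SomeField < b->m_SomeField;
                          }));
    derefCount = 0;
    auto pos = std::upper_bound(
        arr.begin(), arr.end(), newKey,
        [&derefCount](int key, const MyType* p) {
            ++derefCount;                  // one potential cache miss per probe
            return key < p->m_SomeField;   // newKey itself is already buffered
        });
    return static_cast<std::size_t>(pos - arr.begin());
}
```

The returned index can be passed to arr.insert(arr.begin() + index, ptr). With potential cache misses bounded by the number of probes, a sorted insert into 2000 elements stays cheap even when every dereference misses.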

You may wish to look at some inlining, but do profile before you're sure THIS is the tight spot.

Nowadays you could use a set: either std::set, if the values of your structure member are unique, or std::multiset, if the member can hold duplicate values.

One side note: storing raw pointers in a container is, in general, not advisable.

STL containers (if used correctly) nearly always give you well-optimized performance.

Anyway, please see some example code:

#include <iostream>
#include <array>
#include <algorithm>
#include <cstddef>
#include <set>
#include <iterator>

// Demo data structure, whatever
struct Data {
    int i{};
};

// -----------------------------------------------------------------------------------------
// Everything in the section below is evaluated at compile time, not at runtime
// (consteval requires C++20). It creates an array of a few thousand pointers.
constexpr std::size_t DemoSize = 4000u;
using DemoPtrData = std::array<const Data*, DemoSize>;
using DemoData = std::array<Data, DemoSize>;
consteval DemoData createDemoData() {
    DemoData dd{};
    int k{};
    for (Data& d : dd)
        d.i = k++*2;
    return dd;
}
constexpr DemoData demoData = createDemoData();

consteval DemoPtrData createDemoPtrData(const DemoData& dd) {
    DemoPtrData dpd{};
    for (std::size_t k{}; k < dpd.size(); ++k)
        dpd[k] = &dd[k];
    return dpd;
}
constexpr DemoPtrData dpd = createDemoPtrData(demoData);
// -----------------------------------------------------------------------------------------


struct Comp {bool operator () (const Data* d1, const Data* d2) const  { return d1->i < d2->i; }};
using MySet = std::multiset<const Data*, Comp>;

int main() {
    // Add some thousand pointers. Will be sorted according to struct member
    MySet mySet{ dpd.begin(), dpd.end() };

    // Extract a range of data. integer values between 42 and 52
    const Data* p42 = dpd[21];
    const Data* p52 = dpd[26];

    // Show result
    for (auto iptr = mySet.lower_bound(p42); iptr != mySet.upper_bound(p52); ++iptr)
        std::cout << (*iptr)->i << '\n';

    // Insert a new element
    Data d1{ 47 };
    mySet.insert(&d1);

    // Show again
    std::cout << "\n\n";
    for (auto iptr = mySet.lower_bound(p42); iptr != mySet.upper_bound(p52); ++iptr)
        std::cout << (*iptr)->i << '\n';
}
