What is the fastest algorithm for finding the k maximal elements of a sequence using STL containers?

2023-02-22 13:14 Q&A author:

I need the fastest algorithm for finding the k maximal elements of a sequence in C++, using any STL containers. My idea: put the elements in a list or vector, sort it, and take the first k elements. In this case the number of operations is n*log(n), where n is the number of elements. But I don't think this is the best approach.

The method using std::partial_sort could be the best answer.

Also take note of std::nth_element, which just gets the element at the nth position right (and partitions the sequence into 'smaller' before and 'bigger' after that nth element).

So if you are really interested in just the first k elements (in no particular internal ordering) then nth_element definitely takes the biscuit
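As a rough illustration of the nth_element approach, here is a minimal sketch. The helper name top_k_unordered and the choice of int elements are assumptions made for the example; passing std::greater makes the largest values collect at the front.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of the STL): returns the k largest values
// of `data` in unspecified order. Works on a copy of the input.
std::vector<int> top_k_unordered(std::vector<int> data, std::size_t k) {
    k = std::min(k, data.size());
    // With std::greater as the comparator, the range [begin, begin + k)
    // ends up holding the k largest elements, in no particular order.
    std::nth_element(data.begin(), data.begin() + k, data.end(),
                     std::greater<int>());
    data.resize(k);
    return data;
}
```

Calling top_k_unordered({5, 1, 9, 3, 7, 8}, 3) should give 7, 8 and 9 in some order; sort the result afterwards if the order matters.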

I think the best approach is using a vector to hold the result and building a heap in it as you traverse the input. Once the heap size reaches k you don't grow it any more (and just keep bubbling-up starting at position k-1).

When the input is finished the heap is already an answer (supposing you've not been asked to return them in order).

If however k > n/2 then it's probably better to store the ones that got bubbled out of a heap of size n - k (this assumes however that you know the number of elements n and not only k in advance).
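One way to sketch the bounded-heap idea is shown below. It is a variant rather than a literal transcription of the description above: it keeps a min-heap of at most k values, so the smallest kept value sits at the front and is replaced whenever a larger input arrives. The name top_k_streaming and the int element type are assumptions for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: scans [first, last) once and keeps the k largest
// values seen so far in a vector organised as a min-heap.
template <typename InputIt>
std::vector<int> top_k_streaming(InputIt first, InputIt last, std::size_t k) {
    std::vector<int> heap;
    if (k == 0) return heap;
    heap.reserve(k);
    const std::greater<int> cmp{};  // min-heap: heap.front() is the smallest kept value
    for (; first != last; ++first) {
        if (heap.size() < k) {
            // Still filling up: grow the heap by one element.
            heap.push_back(*first);
            std::push_heap(heap.begin(), heap.end(), cmp);
        } else if (*first > heap.front()) {
            // New value beats the smallest kept value: replace it.
            std::pop_heap(heap.begin(), heap.end(), cmp);  // smallest moves to the back
            heap.back() = *first;
            std::push_heap(heap.begin(), heap.end(), cmp);
        }
    }
    return heap;  // the k largest values, in heap order (not sorted)
}
```

Each input element costs at most O(log k), so the whole pass is O(n log k), and only k values are stored at any time, which helps when the input is a stream.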

Assuming random unsorted data, I think the fastest is creating a sorted linked list, looping over the original container and, for each element, if it's larger than the lowest value in the result list, hooking it in (at the correct sorted location). If the list now contains more than k elements, remove the lowest value.

Worst-case (sorted original container) means O(k*n), best case O(n).
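The sorted-list approach might look roughly like this. The function name top_k_sorted_list is hypothetical, and the list is kept in ascending order so that the lowest kept value is always at the front.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper: keeps the k largest values seen so far in an
// ascending std::list, so best.front() is the lowest kept value.
std::list<int> top_k_sorted_list(const std::vector<int>& data, std::size_t k) {
    std::list<int> best;
    if (k == 0) return best;
    for (int x : data) {
        // Cheap rejection: the list is full and x cannot displace anything.
        if (best.size() == k && x <= best.front()) continue;
        // Linear search for the sorted insertion point (up to k steps).
        auto pos = best.begin();
        while (pos != best.end() && *pos < x) ++pos;
        best.insert(pos, x);
        // Drop the lowest value if the list grew past k elements.
        if (best.size() > k) best.pop_front();
    }
    return best;
}
```

The early rejection is what gives the O(n) best case; every element that does get inserted pays for a walk of up to k nodes, which is where the O(k*n) worst case comes from.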

Using QuickSelect you can find them in O(n) worst-case using the "smart" pivot choice described in the wiki page (unsorted: they are the elements that precede the k-th element in the final order induced by the algorithm).

You can't beat O(n) (because you have to "touch" all elements to be sure your picked one is the k-th), so it's the best you can achieve.

EDIT: If you don't care about the order of the maximal items you can use nth_element to partition a vector as noted by @sehe. This is O(n).

Otherwise if you do care about the ordering:

Use std::partial_sort on a vector of your data to sort the first k items. This would run in O(n log k).

Alternatively, heapify your data and pull off k items. Building the heap is O(n) and the k removals cost O(k log n), so this is O(n + k log n), but I believe with higher constants.

If performance is a concern profile both approaches and use the faster for your data set.
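For the ordered case, a minimal std::partial_sort sketch could look like the following. The helper name top_k_sorted and the int element type are assumptions for the example; std::greater puts the largest value first.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the k largest values of `data`,
// sorted from largest to smallest. Works on a copy of the input.
std::vector<int> top_k_sorted(std::vector<int> data, std::size_t k) {
    k = std::min(k, data.size());
    // Sorts only the first k positions; the rest is left in unspecified order.
    std::partial_sort(data.begin(), data.begin() + k, data.end(),
                      std::greater<int>());
    data.resize(k);
    return data;
}
```

For example, top_k_sorted({5, 1, 9, 3, 7, 8}, 3) should return 9, 8, 7.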

Sadly, I can't find the source code I've written for this, but check this out:

http://en.wikipedia.org/wiki/Radix_sort

I'd use std::make_heap to build heap from your array or vector of values, which will incur O(n) time. Then you can repeatedly inspect the top element of the heap and pop it for k times (using std::pop_heap), which will incur O(k * log n) time.

Total runtime complexity will be O(n + k * log n), which is better than O(n * log k) when k is much smaller than n. As you also asked, all of these are already available in <algorithm>, so the implementation is very easy.
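A short sketch of the make_heap / pop_heap approach, under the assumption of int elements and a hypothetical helper name top_k_by_popping:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a max-heap over a copy of the data and
// pops the maximum k times. The result is ordered largest first.
std::vector<int> top_k_by_popping(std::vector<int> data, std::size_t k) {
    k = std::min(k, data.size());
    std::make_heap(data.begin(), data.end());  // max-heap by default
    std::vector<int> result;
    result.reserve(k);
    auto heap_end = data.end();
    for (std::size_t i = 0; i < k; ++i) {
        // pop_heap moves the current maximum to heap_end - 1 and
        // restores the heap property on the remaining prefix.
        std::pop_heap(data.begin(), heap_end);
        --heap_end;
        result.push_back(*heap_end);
    }
    return result;
}
```

Because each pop yields the current maximum, the values come out already sorted in descending order.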

One can do this in linear time by using a selection algorithm that takes O(n) in the worst-case, and then going through the vector once and taking precisely the elements that are at least as big as the (n-k)-th order statistic (and keeping count of how many elements you have taken, so that you take exactly k and not more). However, Cppreference says that std::nth_element takes linear time on average rather than worst case. I will explain how to do this in a slightly slower but probably simpler way, using heaps. This solution takes time O(max(n,k*log(k))) in the worst-case to extract the top k elements of a vector of size n.

You start by creating a max-heap with all the n elements, which takes O(n) time with std::make_heap.

We now want to extract the k top elements from that heap, but we have to be clever when we do this. If we extract the maximum element k times, this would cost us O(log(n)) each time, thus O(k*log(n)) in total, which does not achieve our goal.

Instead, we will not touch this n-sized heap, and create a separate maximum heap, which I call the 'waiting heap'. This waiting heap starts with only the maximum element of the original heap, and to get the top k elements you repeat the following procedure k times: extract the top element from the waiting heap and add its two children (taken from the original heap) to it. The size of the waiting heap increases by at most one at each step, therefore it is bounded by k. Since we are doing k extractions and at most 2k insertions (assuming you are using a binary heap), this will cost us no longer than 3*k*log(k).
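The waiting-heap procedure can be sketched as follows. This is an illustration under stated assumptions, not the answer author's code: the name top_k_waiting_heap is hypothetical, elements are int, and the waiting heap is a std::priority_queue of (value, index) pairs. It relies on std::make_heap producing the usual implicit binary tree, where the children of position i are at 2*i + 1 and 2*i + 2.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: extracts the k largest values (largest first)
// without modifying the big heap after it has been built.
std::vector<int> top_k_waiting_heap(std::vector<int> data, std::size_t k) {
    k = std::min(k, data.size());
    std::vector<int> result;
    if (k == 0) return result;
    result.reserve(k);

    std::make_heap(data.begin(), data.end());  // O(n), max at data[0]

    // Waiting heap of (value, index into data). Pairs compare by value
    // first, so top() is the largest candidate not yet reported.
    std::priority_queue<std::pair<int, std::size_t>> waiting;
    waiting.emplace(data[0], std::size_t{0});

    while (result.size() < k) {
        const std::pair<int, std::size_t> top = waiting.top();
        waiting.pop();
        result.push_back(top.first);
        // Add the two children of the extracted node, if they exist.
        const std::size_t left = 2 * top.second + 1;
        const std::size_t right = left + 1;
        if (left < data.size()) waiting.emplace(data[left], left);
        if (right < data.size()) waiting.emplace(data[right], right);
    }
    return result;
}
```

The waiting heap never holds more than about k entries, so each push or pop costs O(log k) rather than O(log n).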
