What is the fastest algorithm for finding the k maximal elements of a sequence using STL containers?
I need the fastest algorithm for finding the k maximal elements of a sequence in C++, using any STL containers. My idea: use a list or vector, sort it, and take the first k elements. In that case the number of operations is n*log(n), where n is the number of elements. But I don't think it's the best one.
The method using std::partial_sort could be the best answer.
Also take note of std::nth_element, which just puts the element at the nth position right (and partitions the sequence into 'smaller' elements before and 'bigger' elements after that nth element). So if you are really interested in just the first k elements (in no particular internal ordering), then nth_element definitely takes the biscuit.
I think the best approach is using a vector to hold the result and building a heap in it as you traverse the input. Once the heap size reaches k, you don't grow it any more (and just keep bubbling up starting at position k-1).

When the input is finished, the heap is already the answer (supposing you haven't been asked to return the elements in order).

If however k > n/2, then it's probably better to store the ones that got bubbled out of a heap of size n - k (this assumes, however, that you know the number of elements n in advance, and not only k).
Assuming random unsorted data, I think the fastest is creating a sorted linked list for the result: loop over the original container and, for each element, if it's larger than the lowest value in the result list, hook it in at the correct sorted location. If the list now contains more than k elements, remove the lowest value.

Worst case (sorted original container) means O(k*n), best case O(n).
Using QuickSelect you can find them in O(n) worst case using the "smart" pivot choice (median of medians) described on the wiki page (unsorted: they are the elements that precede the k-th element in the final order induced by the algorithm).

You can't beat O(n) (because you have to "touch" all elements to be sure your picked one is the k-th), so it's the best you can achieve.
EDIT: If you don't care about the order of the maximal items, you can use nth_element to partition a vector, as noted by @sehe. This is O(n).

Otherwise, if you do care about the ordering:

Use std::partial_sort on a vector of your data to sort the first k items. This runs in O(n log k).

Alternately, heapify your data and pull off k items. This is still O(n log k) (more precisely, O(n + k log n)), but I believe with higher constants.

If performance is a concern, profile both approaches and use the faster one for your data set.
Sadly, I can't find the source code I've written for this, but check this out:
http://en.wikipedia.org/wiki/Radix_sort
I'd use std::make_heap to build a heap from your array or vector of values, which incurs O(n) time. Then you can repeatedly inspect the top element of the heap and pop it, k times (using std::pop_heap), which incurs O(k * log n) time.

Total runtime complexity is O(n + k * log n), which is better than O(n * log k), because n is larger than k. As you asked, all of these are already available in <algorithm>, so the implementation is very easy.
One can do this in linear time by using a selection algorithm that takes O(n) in the worst case, and then going through the vector once, taking precisely the elements that are at least as big as the (n-k)-th order statistic (and keeping count of how many elements you have taken, so that you take exactly k and not more). However, cppreference says that std::nth_element takes linear time on average rather than in the worst case. I will explain how to do this in a slightly slower but probably simpler way, using heaps. This solution takes O(max(n, k*log(k))) time in the worst case to extract the top k elements of a vector of size n.

You start by creating a max-heap with all n elements, which takes O(n) time with std::make_heap.

We now want to extract the k top elements from that heap, but we have to be clever when we do this. If we extract the maximum element k times, each extraction costs O(log(n)), thus O(k*log(n)) in total, which does not achieve our goal.

Instead, we will not touch this n-sized heap, and create a separate max-heap, which I call the 'waiting heap'. This waiting heap starts with only the maximum element of the original heap, and to get the top k elements you repeat the following procedure k times: extract the top element from the waiting heap and add its two descendants (its children in the original heap) to it. The size of the waiting heap increases by one at each step, therefore it is bounded by k. Since we are doing k extractions and 2k insertions (assuming you are using a binary heap), this costs us no more than 3*k*log(k).