开发者

Find Kth largest number in single pass without storing the whole array

The algo I have in mind is

  • keep an MaxHeap of size K
  • insert each element
  • drop out smaller value if heap is full
  • In the end, Kth max is the smaller of MaxHeap

That will give me O(NlogK). Is there a better algorithm? I can't do quick selection, becau开发者_如何学运维se the array can't be stored in memory.


Depending on your memory restrictions, you can use a modified version of the median-of-medians algorithm to solve the problem in O(n) time and O(k) space.

The idea is as follows. Maintain an array of size 2k in memory. Fill this buffer with the first 2k elements from the array, then run the median-of-medians algorithm on it to put the largest k elements in the first half of the array and the smallest k elements in the second half. Then, discard the smallest k elements. Now, load the next k elements into the second half of the array, use the median-of-medians algorithm to again put the top k elements in the left side and the bottom k elements on the right. If you iterate this process across the array - replacing the second half of the buffer with the next k elements from the array, then using median-of-medians to move the top k of those to the left half - then at the end you will have the top k elements in the left half of the array. Finding the smallest of those (in O(k) time) will then give you the kth largest element.

Overall, you end up making O(n / k) calls to the median-of-medians algorithm with an array of size O(k), which is O(n / k) calls to an algorithm that takes O(k) time, for a net runtime of O(n). This, combined with the final step, runs in O(n + k) = O(n) time. Moreover, since the memory usage for the median-of-medians step is O(k), and since you have a buffer of size O(k) lying around, this uses only O(k) memory. In other words, it's asymptotically faster than the min-heap solution and asymptotically equivalent in memory.

Hope this helps!


This is known as the http://en.wikipedia.org/wiki/Selection_algorithm

One algorithm in particular is http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

It is O(N) time and O(1) space. I believe it can be done in-place, if your array isn't sorted. If your array is being stored externally (hard disk, network, etc.), you can leverage the ~K words you have to work with. If your array is dynamically generated by a function, you'll be in a similar situation.


Modify the algorithm slightly, because a maxheap does not support efficient "find smallest".

  • Insert the first K items into a min-heap
  • For the remaining items, if the value is larger than the heap head

    • pop the head, and insert the new item.
  • The head is the Kth largest item.

The worst case is still O(N lg K) for an input where each item is greater than the smallest of the last K. But for a randomly distributed input, you will do only have to do the heap operation on a smaller percentage of the items.


ran this code - its running fine without touching element position or sorting the same.

public class nth {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        calc.getit(2);
    }

}
class calc{
    static int arr[] = { 1, 23, 47, 81, 92, 87, 52, 48, 56, 66, 65, 76, 71, 85,
        49, 53, 56, 61, 65, 84 };

    static void getit(int k){
        int i,j=0;
        for(i=0;i<arr.length;i++){
            int c=0;
            for(j=0;j<arr.length;j++){
                if(arr[i]>arr[j]){
                    c++;
                }
            }
            if(c==k-1){
                System.out.println(arr[i]);
            }
        }
    }
}


I have an implementation with PriorityQueue. Try this:

import java.util.PriorityQueue;

public class FindKthLargestElementWithHeap {

    /**
     * We can use a min heap to solve this problem. 
     * The heap stores the top k elements.
     * Whenever the size is greater than k, delete the min.
     * Time complexity is O(nlog(k)).
     * Space complexity is O(k) for storing the top k numbers.
     */
    public static int findKthLargest(int[] nums, int k) {
        PriorityQueue<Integer> q = new PriorityQueue<>(k);
        for(int i: nums){
            q.offer(i);

            if(q.size()>k){
                q.poll();
            }
        }

        return q.peek();
    }

    public static void main(String args[]) {
        int[] nums = {5,8,6,97,12,3,5,6,4,2,3,};
        //Return the second largest number
        System.out.println(findKthLargest(nums,2));
    }

}

For more information, please visit: https://github.com/m-vahidalizadeh/foundations/blob/master/src/data_structures/FindKthLargestElementWithHeap.java.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜