Keeping largest 10 numbers

While iterating through a dataset, what's the best way to keep track of only the top 10 numbers so far, in sorted order?

Solution: I ended up implementing generic min and max heaps myself. (The standard library's java.util.PriorityQueue is also a binary heap: a min-heap by default, and it accepts a Comparator for other orderings.) No guarantees on the code.

import java.util.ArrayList;

public class MaxHeapGeneric<K extends Comparable<K>> {
    //ArrayList to hold the heap
    ArrayList<K> h = new ArrayList<K>();
    public MaxHeapGeneric()
    {

    }
    public int getSize()
    {
        return h.size();
    }

    private K get(int key){
        return h.get(key);
    }


    public void add(K key){
        h.add(null);
        int k = h.size() - 1;
        while (k > 0){
            int parent = (k-1)/2;
            K parentValue = h.get(parent);
            // Max-heap: stop once the new key is no larger than its parent.
            // (A min-heap reverses the test: key.compareTo(parentValue) > 0.)
            if(key.compareTo(parentValue) <= 0) break;
            h.set(k, parentValue);
            k = parent;
        }
        h.set(k, key);
    }
    public K getMax()
    {
        return h.get(0);
    }
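    // Sifts key down from index k until neither child is larger than it.
    public void percolateDown(int k, K key){
        if(h.isEmpty())
            return ;

        while(k < h.size() /2){ // only indices below size/2 have children
            int child = 2*k + 1; //left child
            // switch to the right child if it exists and is larger
            if(   child < h.size() -1 && (h.get(child).compareTo(h.get(child+1)) < 0)   )
            {
                child++;
            }

            if(key.compareTo(h.get(child)) >=0) break;

            h.set(k, h.get(child)); // move the larger child up one level
            k = child;
        }
        h.set(k, key);
    }
    // Removes and returns the largest element.
    public K remove()
    {
        K removeNode = h.get(0);
        K lastNode = h.remove(h.size() - 1);
        percolateDown(0, lastNode);
        return removeNode;
    }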
    public boolean isEmpty()
    {
        return h.isEmpty();
    }

    public static void main(String[] args)
    {
        MaxHeapGeneric<Integer> test = new MaxHeapGeneric<Integer>();

        test.add(5);
        test.add(9);
        test.add(445);
        test.add(1);
        test.add(534);
        test.add(23);

        while(!test.isEmpty())
        {
            System.out.println(test.remove());
        }

    }

}

And a min heap

import java.util.ArrayList;


public class MinHeapGeneric<K extends Comparable<K>> {
    //ArrayList to hold the heap
    ArrayList<K> h = new ArrayList<K>();
    public MinHeapGeneric()
    {

    }
    public int getSize()
    {
        return h.size();
    }

    private K get(int key){
        return h.get(key);
    }


    public void add(K key){
        h.add(null);
        int k = h.size() - 1;
        while (k > 0){
            int parent = (k-1)/2;
            K parentValue = h.get(parent);
            //for minheap - if(key > parentValue)
            if(key.compareTo(parentValue) > 0) break;
            h.set(k, parentValue);
            k = parent;
        }
        h.set(k, key);
    }
    public K getMin()
    {
        return h.get(0);
    }
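    // Sifts key down from index k until neither child is smaller than it.
    public void percolateDown(int k, K key){
        if(h.isEmpty())
            return ;

        while(k < h.size() /2){ // only indices below size/2 have children
            int child = 2*k + 1; //left child
            // switch to the right child if it exists and is not larger
            if(   child < h.size() -1 && (h.get(child).compareTo(h.get(child+1)) >= 0)   )
            {
                child++;
            }

            if(key.compareTo(h.get(child)) < 0) break;

            h.set(k, h.get(child)); // move the smaller child up one level
            k = child;
        }
        h.set(k, key);
    }
    // Removes and returns the smallest element.
    public K remove()
    {
        K removeNode = h.get(0);
        K lastNode = h.remove(h.size() - 1);
        percolateDown(0, lastNode);
        return removeNode;
    }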
    public boolean isEmpty()
    {
        return h.isEmpty();
    }

    public static void main(String[] args)
    {
        MinHeapGeneric<Integer> test = new MinHeapGeneric<Integer>();

        test.add(5);
        test.add(9);
        test.add(445);
        test.add(1);
        test.add(534);
        test.add(23);

        while(!test.isEmpty())
        {
            System.out.println(test.remove());
        }

    }

}
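
For comparison, java.util.PriorityQueue from the standard library can play both roles. The sketch below feeds it the same six test values used in the main methods above; the class name BuiltInHeaps and the helper drain are made up for illustration, and only PriorityQueue and Comparator are standard API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class; the name and the drain helper are illustrative only.
class BuiltInHeaps {
    // Removes every element from the queue, in the order the queue yields them.
    static List<Integer> drain(PriorityQueue<Integer> heap) {
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            out.add(heap.poll()); // poll() removes and returns the head
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 9, 445, 1, 534, 23);

        // Natural ordering: the head is the smallest element (min-heap).
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(values);

        // Reversed ordering: the head is the largest element (max-heap).
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.addAll(values);

        System.out.println(drain(minHeap)); // [1, 5, 9, 23, 445, 534]
        System.out.println(drain(maxHeap)); // [534, 445, 23, 9, 5, 1]
    }
}
```

Note that only the head of a PriorityQueue is guaranteed to be in order; iterating over it without polling does not visit the elements sorted.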


Use a min-heap (priority queue) to keep track of the top 10 items. With a binary heap, the time complexity is O(N log M), where N is the number of items and M is 10.

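Compared to storing the top items in a sorted array, this is faster for large M: the array-based approach is O(NM), because each insertion may have to shift up to M elements. The same bound applies to a sorted linked list, where finding the insertion point takes O(M).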

In pseudocode:

heap = empty min-heap
for each datum d:
    heap.push(d)   // add the new element onto the heap
    if heap.size > 10:
        heap.pop() // remove the smallest element
    endif
endfor

Now the heap contains the 10 largest items. To pop them, smallest first:

while heap is not empty:
    item = heap.pop()   // remove and return the smallest remaining element
    print item
endwhile
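
In Java, the pseudocode above maps fairly directly onto java.util.PriorityQueue, which orders elements as a min-heap by default. What follows is a minimal sketch rather than a definitive implementation: the class name TopK and the methods largest and drainAscending are hypothetical names chosen for illustration, and the input is assumed to be Integer values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; the names are illustrative only.
class TopK {
    // Returns a min-heap holding the k largest values seen in data.
    static PriorityQueue<Integer> largest(Iterable<Integer> data, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap by natural ordering
        for (int d : data) {
            heap.add(d);          // push the new element: O(log k)
            if (heap.size() > k) {
                heap.poll();      // drop the smallest, keeping at most k elements
            }
        }
        return heap;
    }

    // Empties the heap and returns its contents, smallest first.
    static List<Integer> drainAscending(PriorityQueue<Integer> heap) {
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            out.add(heap.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 9, 445, 1, 534, 23, 7, 88, 12, 300, 41, 2);
        System.out.println(drainAscending(largest(data, 10)));
        // prints [5, 7, 9, 12, 23, 41, 88, 300, 445, 534]
    }
}
```

The heap keeps only its smallest element at the head, so the 10 items come out sorted only when they are polled one by one; reverse the resulting list if largest-first order is wanted.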