Keeping largest 10 numbers
While iterating through a dataset, what's the best way to keep track of only the top 10 numbers so far, in sorted order?
Solution...Ended up implementing Generic Min and Max Heaps...as sadly they are not available in Java libraries or readily on the internet....No gauruntees on the code...
import java.util.ArrayList;
public class MaxHeapGeneric <K extends Comparable> {
//ArrayList to hold the heap
ArrayList<K> h = new ArrayList<K>();
public MaxHeapGeneric()
{
}
public int getSize()
{
return h.size();
}
private K get(int key){
return h.get(key);
}
public void add(K key){
h.add(null);
int k = h.size() - 1;
while (k > 0){
int parent = (k-1)/2;
K parentValue = h.get(parent);
//MaxHeap -
//for minheap - if(key > parentValue)
if(key.compareTo(parentValue) <= 0) break;
h.set(k, parentValue);
k = parent;
}
h.set(k, key);
}
public K getMax()
{
return h.get(0);
}
public void percolateUp(int k, K key){
if(h.isEmpty())
return ;
while(k < h.size() /2){
int child = 2*k + 1; //left child
if( child < h.size() -1 && (h.get(child).compareTo(h.get(child+1)) < 0) )
{
child++;
}
if(key.compareTo(h.get(child)) >=0) break;
h.set(k, h.get(child));
k = child;
}
h.set(k, key);
}
public K remove()
{
K removeNode = h.get(0);
K lastNode = h.remove(h.size() - 1);
percolateUp(0, lastNode);
return removeNode;
}
public boolean isEmpty()
{
return h.isEmpty();
}
public static void main(String[] args)
{
MaxHeapGeneric<Integer> test = new MaxHeapGeneric<Integer>();
test.add(5);
test.add(9);
test.add(445);
test.add(1);
test.add(534);
test.add(23);
while(!test.isEmpty())
{
System.out.println(test.remove());
}
}
}
And a min heap
import java.util.ArrayList;
public class MinHeapGeneric <K extends Comparable> {
//ArrayList to hold the heap
ArrayList<K> h = new ArrayList<K>();
public MinHeapGeneric()
{
}
public int getSize()
{
return h.size();
}
private K get(int key){
return h.get(key);
}
public void add(K key){
h.add(null);
int k = h.size() - 1;
while 开发者_JS百科(k > 0){
int parent = (k-1)/2;
K parentValue = h.get(parent);
//for minheap - if(key > parentValue)
if(key.compareTo(parentValue) > 0) break;
h.set(k, parentValue);
k = parent;
}
h.set(k, key);
}
public K getMax()
{
return h.get(0);
}
public void percolateUp(int k, K key){
if(h.isEmpty())
return ;
while(k < h.size() /2){
int child = 2*k + 1; //left child
if( child < h.size() -1 && (h.get(child).compareTo(h.get(child+1)) >= 0) )
{
child++;
}
if(key.compareTo(h.get(child)) < 0) break;
h.set(k, h.get(child));
k = child;
}
h.set(k, key);
}
public K remove()
{
K removeNode = h.get(0);
K lastNode = h.remove(h.size() - 1);
percolateUp(0, lastNode);
return removeNode;
}
public boolean isEmpty()
{
return h.isEmpty();
}
public static void main(String[] args)
{
MinHeapGeneric<Integer> test = new MinHeapGeneric<Integer>();
test.add(5);
test.add(9);
test.add(445);
test.add(1);
test.add(534);
test.add(23);
while(!test.isEmpty())
{
System.out.println(test.remove());
}
}
}
Use a min-heap (priority queue) to keep track of the top 10 items. With a binary heap, the time complexity is O(N log M), where N is the number of items and M is 10.
Compared to storing the top items in an array, this is faster for large M: array-based approach is O(NM). Ditto for linked lists.
In pseudocode:
heap = empty min-heap
for each datum d:
heap.push(d) // add the new element onto the heap
if heap.size > 10:
heap.pop() // remove the smallest element
endif
endfor
Now heap contains 10 largest items. To pop:
while heap is not empty:
item = heap.top()
print item
endwhile
精彩评论