Priority Queue with a find function - Fastest Implementation

I am looking at implementing a priority queue with an added requirement, a find/search function which will tell whether an item is anywhere within the queue. So the functions will be: insert, del-min and find.

I am unsure whether I should use a heap or a self-balancing binary search tree. It appears PQs are usually implemented with a heap, but I am wondering if there is any advantage in using a binary search tree since I also need that find function.

Furthermore, on average I'll be doing more inserts than deletes. I am also considering a d-ary heap. Basically, every second counts.

Thanks!


Why can't you just use a Priority Queue and a Set? When you enqueue something, you add it to the set. When you dequeue it, you remove it from the set. That way the set will tell you if something is in the queue.
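A minimal sketch of that pairing in C++, assuming int keys that are enqueued at most once. The class name FindablePQ and its method names are made up for illustration; only std::priority_queue and std::unordered_set are real library types.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper: a min-heap plus a hash set that mirrors its contents.
// Assumes each value is enqueued at most once.
class FindablePQ {
public:
    void insert(int v) {
        // Only push when the value was not already a member.
        if (members_.insert(v).second) heap_.push(v);
    }
    // Average O(1) membership test answered by the set, not the heap.
    bool find(int v) const { return members_.count(v) != 0; }
    bool empty() const { return heap_.empty(); }
    int del_min() {
        assert(!heap_.empty());
        int v = heap_.top();
        heap_.pop();
        members_.erase(v);  // keep the set in step with the heap
        return v;
    }
private:
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap_;
    std::unordered_set<int> members_;
};
```

If duplicates can occur, a std::unordered_multiset (or a count per key) would be needed instead of the plain set.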


If your find operation is relatively infrequent (and your heap fairly small), I'd just do a linear search. If it is relatively frequent, or the heap is enormous, consider tracking heap membership (to do your 'find' test) with a separate data structure or an object flag. The joy of external indexing is being able to put your object in as many containers as you like.

If by 'find' you really mean 'find and modify' (I find I often need to delete things from priority queues independently of the typical insert/del-min), here are three approaches I've used:

Given a high rate of insert/del-min (100k/s continuous) and a low rate of find-delete (say 1/s) over a fairly small working set (500-1000) I did a linear search for the element and then deleted it from the tree in the standard way.

Given a high rate of insert/del-min plus fairly frequent find-deletes I simply marked the deleted objects as "uninteresting" after finding them indirectly. The actual free was deferred until the object was dequeued as normal.

Given a small std::priority_queue (which has no access methods outside of insert/del-min) of only a few elements and fairly infrequent deletions, I just copied the entire queue to a temporary std::vector and copied the modified/desired part back into the queue. Then I cried myself to sleep.
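The second approach (mark now, discard at dequeue time) could look roughly like the sketch below. It is a variant under assumptions: it tracks live ids in a side set rather than setting a flag on the object itself, it assumes unique int ids, and the names LazyPQ and cancel are invented here.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical lazy-deletion queue over unique int ids.
class LazyPQ {
public:
    void insert(int id) { heap_.push(id); live_.insert(id); }
    bool find(int id) const { return live_.count(id) != 0; }
    // "Delete" by unmarking only; the heap entry stays until it surfaces.
    bool cancel(int id) { return live_.erase(id) != 0; }
    bool empty() const { return live_.empty(); }
    int del_min() {
        assert(!live_.empty());
        for (;;) {
            int id = heap_.top();
            heap_.pop();
            if (live_.erase(id) != 0) return id;  // still live: the real minimum
            // Otherwise the entry was cancelled earlier; drop it and continue.
        }
    }
private:
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap_;
    std::unordered_set<int> live_;
};
```

The trade-off is that cancelled entries keep occupying heap space until they reach the top, so this suits workloads where cancelled items get dequeued reasonably soon.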


If you need the benefits of more than one data structure then you can use them in composition. For example, if you need the benefits of a priority queue and a binary search tree then make your desired actions on both of them.

If it's insert then insert the element to both of them.

If it's find then you can find the element using the binary search tree and if it was found then continue on to find it in the priority queue.

If it's del-min then remove the element from the priority queue first; now that you know which element it is, you can remove it from the binary search tree as well.

If it's del then first find the element in the binary search tree and remove it, then find it in the priority queue and remove it from there too.

It is assumed that the nodes of the binary tree and the nodes of the priority queue are pointers to your elements.


IIRC, search/find on a heap is O(n) whereas on a balanced tree it is O(log(n)); the other standard PQ operations have the same asymptotic cost on both.

Heaps are only empirically more efficient by some constant factor, so if it's a big queue a tree should be better; if it's small you need to test and profile. It's all good to know in theory what's faster, but if those constant factors are large, the asymptotic difference may be completely irrelevant for sufficiently small data sets.
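For comparison, here is a rough sketch of the tree option using std::set, which is typically implemented as a red-black tree. The wrapper name TreePQ is hypothetical and unique int keys are assumed; whether it beats a heap for a given workload is something only profiling can tell.

```cpp
#include <cassert>
#include <set>

// Hypothetical wrapper that uses an ordered tree as the priority queue.
// Assumes unique keys; std::multiset would allow duplicates.
class TreePQ {
public:
    void insert(int v) { tree_.insert(v); }                 // O(log n)
    bool find(int v) const { return tree_.count(v) != 0; }  // O(log n)
    bool erase(int v) { return tree_.erase(v) != 0; }       // O(log n)
    bool empty() const { return tree_.empty(); }
    int del_min() {
        assert(!tree_.empty());
        int v = *tree_.begin();      // the smallest key is the first element
        tree_.erase(tree_.begin());
        return v;
    }
private:
    std::set<int> tree_;
};
```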


Radix trees with a min-heap property will provide the properties you need. This will actually give you effectively constant time complexities for your operations, because the bound depends on the key width rather than only on the queue size. For example, if we look at this Haskell implementation, all three operations you mention have time complexity O(min(n,W)), where n is the number of elements and W is the number of bits in an int (32 or 64).


Please check this code. I wrote this program as a menu-driven priority queue with the following options:

1. Insert
2. Find
3. Delete
4. Show

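You can try it. The queue is kept in ascending order, from the minimum number to the maximum number. Note that option 2 reports the current minimum via top(); it does not search for an arbitrary value.

I used the standard std::priority_queue member functions inside a switch statement: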

queue.push()
queue.pop()
queue.top()
queue.size()

C++ code:

#include <functional>
#include <iostream>
#include <queue>
#include <vector>
using namespace std;
// Prints the queue in ascending order. The parameter is taken by value,
// so popping from this copy leaves the caller's queue untouched.
void show_queue(priority_queue<int, vector<int>, greater<int> > myq)
{
    while (!myq.empty()) {
        cout << '\t' << myq.top();
        myq.pop();
    }
    cout << '\n';
}

int main()
{
    // Min-heap: greater<int> makes top() return the smallest element.
    priority_queue<int, vector<int>, greater<int> > myp_queue;
    while (true)
    {
        int choice;
        cout << "\nWhat do you want to do?" << endl;
        cout << "1. Insert \n2. Find \n3. Delete \n4. Show Queue \n\nChoose an option from above: ";
        if (!(cin >> choice)) {
            return 0; // Stop on end of input or non-numeric input instead of looping forever.
        }

        switch (choice)
        {
            case 1: { // Option 1 => Insert
                int n;
                cout << "Enter the value: ";
                if (cin >> n) {
                    myp_queue.push(n);
                }
                break;
            }
            case 2: // Option 2 => Find the minimum number (not a search for an arbitrary value)
                if (!myp_queue.empty()) {
                    cout << "\n" << myp_queue.top() << " is the minimum number" << endl;
                } else {
                    cout << "\nEmpty Priority Queue" << endl;
                }
                break;
            case 3: // Option 3 => Delete the minimum number from the queue
                if (!myp_queue.empty()) {
                    myp_queue.pop();
                    cout << "\nSuccessfully Deleted" << endl;
                } else {
                    cout << "\nThere is no element to delete" << endl;
                }
                break;
            case 4: // Option 4 => Show the full queue
                if (!myp_queue.empty()) {
                    show_queue(myp_queue);
                } else {
                    cout << "\nEmpty Priority Queue" << endl;
                }
                break;
            default: // Any other input ends the program, as the message says.
                cout << "\nYou are terminated!!! \nYou entered wrong input.\n" << endl;
                return 0;
        }
    }
}


First some background. The heap is typically implemented as an array or vector, whereby each node has an index, with 0 being the highest-priority node.

Your heap has to store various data per node: the value, a callback function or some such.

In practice moving these nodes around constantly is too expensive, so they tend to be allocated outside the heap vector, which is in fact an array of pointers to nodes.

Generally, I will have the API for the priority queue give you the pointer to the node in question, and implement a delete() method that takes the address of that node as an argument. In effect, the application asks to register an event at a priority (in my case, time), and gets an opaque "cookie." Should the app want to cancel the event, they pass that cookie back in to the heap.

OK, now to answer your question! This node (the cookie) should hold the node's index into the vector, and that index has to be updated every time the heap moves the node. That allows the delete() operation to go directly to that point in the vector. You can then swap the tail of the heap for that node and re-heapify up or down as needed.
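A compact sketch of that scheme, under assumptions: the Node layout, the class name IndexedHeap, and the method names are invented for illustration; the caller owns the nodes and keeps a Node* as the cookie.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type: the caller keeps a pointer to it as the "cookie".
struct Node {
    int priority;        // smaller value = higher priority
    std::size_t index;   // current position in the heap vector
};

class IndexedHeap {
public:
    void insert(Node* n) {
        heap_.push_back(n);
        n->index = heap_.size() - 1;
        sift_up(n->index);
    }
    bool empty() const { return heap_.empty(); }
    Node* del_min() {
        assert(!heap_.empty());
        Node* top = heap_.front();
        remove(top);
        return top;
    }
    // Remove an arbitrary node in O(log n) by jumping to its stored index.
    void remove(Node* n) {
        std::size_t i = n->index;
        assert(i < heap_.size() && heap_[i] == n);
        place(i, heap_.back());   // move the tail into the hole
        heap_.pop_back();
        if (i < heap_.size()) {   // skipped when the removed node was the tail
            sift_up(i);
            sift_down(i);
        }
    }
private:
    // Every write into the vector also refreshes the node's stored index.
    void place(std::size_t i, Node* n) { heap_[i] = n; n->index = i; }
    void swap_nodes(std::size_t a, std::size_t b) {
        Node* na = heap_[a];
        place(a, heap_[b]);
        place(b, na);
    }
    void sift_up(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (heap_[parent]->priority <= heap_[i]->priority) break;
            swap_nodes(i, parent);
            i = parent;
        }
    }
    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t smallest = i, l = 2 * i + 1, r = l + 1;
            if (l < heap_.size() && heap_[l]->priority < heap_[smallest]->priority) smallest = l;
            if (r < heap_.size() && heap_[r]->priority < heap_[smallest]->priority) smallest = r;
            if (smallest == i) break;
            swap_nodes(i, smallest);
            i = smallest;
        }
    }
    std::vector<Node*> heap_;
};
```

A membership test can then be as cheap as checking whether a node's stored index still points back at that node, provided the caller keeps the node alive.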


Store your data in the fastest container you've tested and use a bloom filter to test if something is in the container.

I mated a bloom filter with a hash table in a previous project and it sped things up 400 times on hash tables with an average of roughly 10k items.

The bloom filter has a few interesting properties:

  • If the answer is no from a bloom filter, it's 100% reliable.
  • If the answer is yes, you have to check the other data structure to make sure the item is actually present.
  • Make sure you pick a good hash function :)
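To illustrate the two-step check, here is a small sketch with a fixed-size Bloom filter in front of an exact set. The class names, the bit count, and the two-probe double hashing built on std::hash are all assumptions chosen for brevity, not tuned values. Note also that a plain Bloom filter cannot forget keys, so after a del-min it may keep answering "maybe" for removed items; the exact container still gives the final answer.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical fixed-size Bloom filter with two probe positions per key.
class BloomFilter {
public:
    void add(const std::string& key) {
        bits_.set(probe(key, 0));
        bits_.set(probe(key, 1));
    }
    // false => definitely absent; true => possibly present.
    bool maybe_contains(const std::string& key) const {
        return bits_.test(probe(key, 0)) && bits_.test(probe(key, 1));
    }
private:
    static constexpr std::size_t kBits = 1 << 16;  // illustrative size only
    static std::size_t probe(const std::string& key, std::size_t i) {
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + '#') | 1;  // second hash, forced odd
        return (h1 + i * h2) % kBits;
    }
    std::bitset<kBits> bits_;
};

// The filter guards the exact container.
class FilteredSet {
public:
    void insert(const std::string& key) {
        filter_.add(key);
        exact_.insert(key);
    }
    bool contains(const std::string& key) const {
        if (!filter_.maybe_contains(key)) return false;  // a "no" is reliable
        return exact_.count(key) != 0;                   // a "yes" must be confirmed
    }
private:
    BloomFilter filter_;
    std::unordered_set<std::string> exact_;
};
```

Whether this pays off depends on how expensive a miss is in the underlying container; with an already fast hash set the filter mainly helps when most lookups are misses.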