Efficient priority list

Posted 2023-01-07 11:41

I am looking for an efficient data structure to represent a priority list. Specifically, I need to assign a priority to each item in a set and return only the top-scoring items. I have looked into priority queues, which operate on heaps, but they don't really seem to suit my needs: they reorganize the heap structure as soon as I poll the top-rated item from the queue.

The simplest solution would of course be a linked list, but in the worst case its insertion operation would take quite long.

Does anyone have a better solution?

Heaps seem very suitable; it looks like you are just going about it the wrong way.

Say you want the top x elements (how does x compare to n, by the way?).

What you are doing is putting everything into a max-heap and extracting the top x.

I suggest you instead use a min-heap of exactly x elements.

Insert the first x elements into the heap.

For each subsequent element, compare it against the minimum, which the heap gives you in O(1) time. If the incoming element is smaller, just ignore it.

If the incoming element is larger than the minimum, replace the minimum with the incoming element and sift it down in the heap. This takes O(log x) time at worst.

Once done (in O(n log x) time), you can retrieve the elements from the heap in sorted order in O(x log x) time.

Depending on your data (and how small x is), this min-heap solution can be really fast.
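For illustration, here is a minimal sketch of the min-heap approach in Python, using the standard heapq module. The function name top_x and the sample scores are made up for this example, and it assumes the items are directly comparable.

```python
import heapq

def top_x(items, x):
    """Return the x largest items, largest first, using a min-heap of size x."""
    if x <= 0:
        return []
    heap = []  # min-heap; heap[0] is the smallest of the current top x
    for item in items:
        if len(heap) < x:
            heapq.heappush(heap, item)     # fill phase: the first x elements
        elif item > heap[0]:               # O(1) peek at the current minimum
            heapq.heapreplace(heap, item)  # drop the minimum, add item: O(log x)
        # otherwise the item cannot be in the top x and is ignored
    return sorted(heap, reverse=True)      # O(x log x)

scores = [5, 1, 9, 3, 7, 8, 2]
print(top_x(scores, 3))  # [9, 8, 7]
```

If you are in Python anyway, heapq.nlargest(x, items) from the standard library gives the same result; the loop is spelled out here only to show the steps.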

If you really want the inserts to be super fast and don't care much about retrieval, you can also do the following.

Insert the elements into a vector (an array with amortized O(1) insert time) in the order they arrive.

Then use a selection algorithm to find the xth largest element (in O(n) time, though the constants might be big). Say that element is S.

Now walk the array, comparing each element with S, and select the ones at least as large as S.

If x is reasonably sized and comparable to n (like n/2 or so), this might work out fine, but if x is small compared to n, I would suggest going with the min-heap.
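A rough sketch of this append-then-select variant in Python follows. Note the assumptions: it uses a randomized quickselect, which is O(n) on average rather than the worst-case linear selection algorithm, the names kth_largest and top_x_by_selection are invented for the example, and ties with S are trimmed so that exactly x values come back (which only makes sense for plain values such as numbers).

```python
import random

def kth_largest(values, k):
    """Return the k-th largest value (k=1 is the maximum) via randomized quickselect."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    pool = list(values)
    while True:
        pivot = random.choice(pool)
        greater = [v for v in pool if v > pivot]
        equal = [v for v in pool if v == pivot]
        if k <= len(greater):
            pool = greater                       # answer is among the larger values
        elif k <= len(greater) + len(equal):
            return pivot                         # the pivot itself is the answer
        else:
            k -= len(greater) + len(equal)       # skip everything >= pivot
            pool = [v for v in pool if v < pivot]

def top_x_by_selection(values, x):
    """Return the x largest values, in no particular order."""
    if x <= 0:
        return []
    if x >= len(values):
        return list(values)
    s = kth_largest(values, x)                   # S in the text above
    result = [v for v in values if v > s]        # strictly larger than S
    result += [s] * (x - len(result))            # pad with ties equal to S
    return result

data = [5, 1, 9, 3, 7, 8, 2]                     # appended in arrival order
print(sorted(top_x_by_selection(data, 3)))       # [7, 8, 9]
```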

Hmm. Skip lists? They should have expected O(log n) insertion (like a heap-based queue), but getting the top element, including removing it, should be O(1). They can even be implemented with a lock-free algorithm.

If you only need the top k items and you never need to look at the others, you can use a simple linked list or array storing only the current top k items, plus a number (the worst score among the elements in the list).

In the Add() operation you simply compare the item with the worst value in the list and, if it is better, you swap the current worst with the added item. This takes O(k) time in the worst case, because you need to find the element that currently has the worst score. The average case, however, is O(1): as you add better elements to the list, the probability of having to do a swap tends to 0 (that is, most calls don't actually add anything).

So if you generate elements at random, your performance is likely to be very good. Even if you generate items in increasing order (the worst case), it might be fast enough for your value of k.
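Something along these lines, as a sketch in Python. The class name TopK and its methods are hypothetical, and it assumes a higher score means a better item.

```python
class TopK:
    """Keeps the k best-scoring items in a plain, unordered list."""

    def __init__(self, k):
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.items = []     # (score, item) pairs, in no particular order
        self.worst = None   # lowest score currently stored

    def add(self, score, item):
        """Offer an item; return True if it was kept."""
        if len(self.items) < self.k:
            self.items.append((score, item))
            if self.worst is None or score < self.worst:
                self.worst = score
            return True
        if score <= self.worst:
            return False    # common case: a single comparison, O(1)
        # Rare case: O(k) scan to find the worst entry and overwrite it.
        idx = min(range(self.k), key=lambda i: self.items[i][0])
        self.items[idx] = (score, item)
        self.worst = min(s for s, _ in self.items)
        return True

    def best(self):
        """Return the stored pairs, best score first."""
        return sorted(self.items, key=lambda pair: pair[0], reverse=True)

top = TopK(2)
top.add(5, "a")
top.add(1, "b")
top.add(3, "c")    # replaces (1, "b")
top.add(2, "d")    # rejected: not better than the current worst (3)
print(top.best())  # [(5, 'a'), (3, 'c')]
```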

The JDK has a built-in pqueue class (java.util.PriorityQueue) which is based on a heap algorithm.

Sorry, I only just saw the bit about heaps not fitting your needs. Can you explain why? You can write a custom comparator (or make your items comparable) and the PriorityQueue will order your items appropriately.

A balanced tree would always guarantee a logarithmic worst case. Although linear time is usually regarded as feasible, there is still a tremendous difference between logarithmic and linear: for a billion elements, it is the difference between 1 billion operations and a few dozen. If each operation takes 1 millisecond, that means going from about 11 days to less than a second.
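The arithmetic behind those figures can be checked quickly. This is just a back-of-the-envelope calculation in Python, taking the 1 millisecond per operation from the text and assuming a base-2 logarithm for the balanced tree.

```python
import math

n = 1_000_000_000   # one billion elements
ms_per_op = 1       # cost per operation, as assumed in the text

linear_ops = n
log_ops = math.ceil(math.log2(n))  # depth of a balanced binary tree

linear_days = linear_ops * ms_per_op / 1000 / 86_400  # ms -> s -> days
log_seconds = log_ops * ms_per_op / 1000

print(log_ops)                # 30
print(round(linear_days, 1))  # 11.6
print(log_seconds)            # 0.03
```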

A binary heap satisfies three properties:
Every node has at most two children.
The heap tree is complete and left-adjusted. Complete means that if the heap has height H, every leaf node is either at level H or H-1. All the levels are left-adjusted, which means that no right sub-tree has a height greater than its left sibling. So, if a leaf is at the same height as an internal node, the leaf can’t be on the left of that node.
Every node holds the highest priority in the subtree rooted at that node.
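To make the last property concrete, here is a small sketch in Python that checks it for a heap stored in an array. It assumes "highest priority" means the largest value, and the function name is_max_heap is made up. With the usual array layout the first two properties hold by construction, so only the ordering needs checking.

```python
def is_max_heap(a):
    """Check the ordering property of a binary max-heap stored in a list.

    The children of index i sit at 2*i + 1 and 2*i + 2, so the tree is
    complete and left-adjusted simply because the list has no gaps.
    """
    n = len(a)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[i]:
                return False  # a child outranks its parent
    return True

print(is_max_heap([9, 7, 8, 3, 5, 2]))  # True
print(is_max_heap([9, 7, 8, 3, 10]))    # False: 10 is larger than its parent 7
```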

Binary trees are the most common kind of tree, but we can also use d-ary trees: the branching factor d can be any value greater than 2, with the same array representation for the heap.
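As a sketch of how the array representation carries over to a d-ary heap, the index arithmetic only changes from 2 to d. The helper names parent, children and push below are invented for this Python example, which assumes a max-heap.

```python
def parent(i, d):
    """Index of the parent of node i in a d-ary heap stored in a list."""
    return (i - 1) // d

def children(i, d, n):
    """Indices of the children of node i that exist in a heap of size n."""
    first = d * i + 1
    return range(first, min(first + d, n))

def push(heap, value, d):
    """Append value and bubble it up in a d-ary max-heap stored in a list."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0 and heap[parent(i, d)] < heap[i]:
        p = parent(i, d)
        heap[i], heap[p] = heap[p], heap[i]
        i = p

heap = []
for v in [3, 9, 1, 7, 8]:
    push(heap, v, d=3)
print(heap[0])  # 9
```

A larger d makes the tree shallower, so bubbling up gets cheaper, while sifting down has to compare against up to d children per level.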

But the improvement we get with trees comes with a price. First, as with any data structure that uses pointers (lists, graphs, trees, and so on) we have a memory overhead in comparison to arrays. While with the latter we just need to reserve space for the data (plus maybe, depending on the implementation details, some constant space for pointers and the node structure itself), every tree node requires extra space for the pointers to its children and possibly to its parent.
