Reading streamed data into a sorted list

2023-03-16 20:22 问答作者：

We know that, in general, the "smarter" comparison sorts on arbitrary data run in worst case complexity O(N * log(N)).

My question is what happens if we are asked not to sort a collection, but a stream of data. That is, values are given to us one by one with no indicator of what comes next (other than that the data is valid/in range). Intuitively, one might think that it is superior then to sort data as it comes in (like picking up a poker hand one开发者_如何学编程 by one) rather than gathering all of it and sorting later (sorting a poker hand after it's dealt). Is this actually the case?

Gathering and sorting would be O(N + N * log(N)) = O(N * log(N)). However if we sort it as it comes in, it is O(N * K), where K = time to find the proper index + time to insert the element. This complicates things, since the value of K now depends on our choice of data structure. An array is superior in finding the index but wastes time inserting the element. A linked list can insert more easily but cannot binary search to find the index.

Is there a complete discussion on this issue? When should we use one method or another? Might there be a desirable in-between strategy of sorting every once in a while?

Balanced tree sort has O(N log N) complexity and maintains the list in sorted order while elements are added.

Absolutely not!

Firstly, if I can sort in-streaming data, I can just accept all my data in O(N) and then stream it to myself and sort it using the quicker method. I.e. you can perform a reduction from all-data to stream, which means it cannot be faster.

Secondly, you're describing an insertion sort, which actually runs in O(N^2) time (i.e. your description of O(NK) was right, but K is not constant, rather a function of N), since it might take O(N) time to find the appropriate index. You could improve it to be a binary insertion sort, but that would run in O(NlogN) (assuming you're using a linked list, an array would still take O(N^2) even with the binary optimisation), so you haven't really saved anything.

Probably also worth mentioning the general principle; that as long as you're in the comparison model (i.e. you don't have any non-trivial and helpful information about the data which you're sorting, which is the general case) any sorting algorithm will be at best O(NlogN). I.e. the worst-case running time for a sorting algorithm in this model is omega(NlogN). That's not an hypothesis, but a theorem. So it is impossible to find anything faster (under the same assumptions).

Ok, if the timing of the stream is relatively slow, you will have a completely sorted list (minus the last element) when your last element arrives. Then, all that remains to do is a single binary search cycle, O(log n) not a complete binary sort, O(n log n). Potentially, there is a perceived performance gain, since you are getting a head-start on the other sort algorithms.

Managing, queuing, and extracting data from a stream is a completely different issue and might be counter-productive to your intentions. I would not recommend this unless you can sort the complete data set in about the same time it takes to stream one or maybe two elements (and you feel good about coding the streaming portion).

Use Heap Sort in those cases where Tree Sort will behave badly i.e. large data set since Tree sort needs additional space to store the tree structure.

继续阅读：algorithm big-o collections performance sorting

Reading streamed data into a sorted list

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？