Merging k sorted linked lists - analysis
I am thinking about different solutions for one problem. Assume we have K sorted linked lists and we are merging them into one. All these lists together have N elements.
The well-known solution is to use a priority queue, popping and pushing the first elements of every list, and I can understand why it takes O(N log K) time.
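For reference, the priority queue approach described above can be sketched roughly like this. It is only a minimal illustration, not a definitive implementation: the `Node` class and the `from_list`, `to_list` and `merge_k_heap` names are made up for this example, and Python's standard `heapq` module stands in for the priority queue.

```python
import heapq


class Node:
    """Minimal singly linked list node (hypothetical, for illustration only)."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Build a linked list from a Python list; returns None for an empty list."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    """Collect the values of a linked list into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def merge_k_heap(heads):
    """Merge K sorted linked lists using a heap holding at most K entries."""
    # Entries are (value, list index, node). At most one node per list is in
    # the heap at a time, so the index always breaks ties before nodes compare.
    heap = [(h.val, i, h) for i, h in enumerate(heads) if h is not None]
    heapq.heapify(heap)
    dummy = tail = Node(None)
    while heap:
        _, i, node = heapq.heappop(heap)      # O(log K) per pop
        tail.next = node
        tail = node
        if node.next is not None:
            heapq.heappush(heap, (node.next.val, i, node.next))  # O(log K)
    return dummy.next
```

Each of the N elements is pushed once and popped once from a heap of size at most K, which is where the O(N log K) bound comes from.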
But let's take a look at another approach. Suppose we have some MERGE_LISTS(LIST1, LIST2) procedure that merges two sorted lists in O(T1 + T2) time, where T1 and T2 stand for the sizes of LIST1 and LIST2.
What we do now generally means pairing these lists and merging them pair by pair (if the number is odd, the last list could, for example, be ignored in the first steps). This generally means we have to build the following "tree" of merge operations:
N1, N2, N3... stand for the sizes of LIST1, LIST2, LIST3...
O(N1 + N2) + O(N3 + N4) + O(N5 + N6) + ...
O(N1 + N2 + N3 + N4) + O(N5 + N6 + N7 + N8) + ...
O(N1 + N2 + N3 + N4 + .... + NK)
It looks obvious that there will be log(K) of these rows, each of them costing O(N) operations, so the time for the MERGE(LIST1, LIST2, ... , LISTK) operation would actually equal O(N log K).
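The pairwise scheme above could be sketched as follows. This is a minimal illustration under some assumptions: `Node`, `from_list`, `to_list`, `merge_two` and `merge_k_pairwise` are hypothetical names, `merge_two` plays the role of MERGE_LISTS, and an unpaired last list is simply carried over to the next round.

```python
class Node:
    """Minimal singly linked list node (hypothetical, for illustration only)."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Build a linked list from a Python list; returns None for an empty list."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    """Collect the values of a linked list into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def merge_two(a, b):
    """Merge two sorted lists in O(T1 + T2) by relinking the existing nodes."""
    dummy = tail = Node(None)
    while a is not None and b is not None:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a is not None else b   # append whatever is left over
    return dummy.next


def merge_k_pairwise(heads):
    """Merge K sorted lists in rounds; each round halves the number of lists."""
    lists = [h for h in heads if h is not None]
    while len(lists) > 1:
        merged = [merge_two(lists[i], lists[i + 1])
                  for i in range(0, len(lists) - 1, 2)]
        if len(lists) % 2 == 1:
            merged.append(lists[-1])        # odd one out waits for the next round
        lists = merged
    return lists[0] if lists else None
```

Every round touches each of the N elements at most once, and there are about log(K) rounds, which matches the rows of the "tree" above.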
My friend told me (two days ago) it would take O(K N) time. So, the question is: did I f%ck up somewhere, or is he actually wrong about this? And if I am right, why can't this divide-and-conquer approach be used instead of the priority queue approach?
If you have a small number of lists to merge, this pairwise scheme is likely to be faster than a priority queue method because you have extremely few operations per merge: basically just one compare and two pointer reassignments per item (to shift into a new singly-linked list). As you've shown, it is O(N log K) (log K steps handling N items each).
But the best priority queue algorithms are, I believe, O(sqrt(log K)) or O(log log U) for insert and remove (where U is the number of possible different priorities), provided you can prioritize with a value instead of having to use a compare. So if you are merging items that can be given e.g. integer priorities, and K is large, then you're better off with a priority queue.
From your description, it does sound like your process is indeed O(N log K). It also will work, so you can use it.
I personally would use the first version with a priority queue, since I suspect it will be faster. It's not faster in the coarse big-O sense, but I think if you actually work out the number of comparisons and stores taken by both, the second version will take several times more work.
This is O(2*log(K)*N), which is O(N*log(K)), and the complexity can't be worse than that, because you perform only about 2N priority queue operations, each in O(log(K)).
Or you can push all elements into a vector in O(2N) and sort it in O(2N*log(2N)). Then you have O(2N + 2N*log(2N)), which is O(N*log(N)), exactly your case with K = N.
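As a rough sketch of this "collect everything and sort" alternative, assuming for simplicity that the inputs are plain Python lists rather than linked lists (the `merge_by_sorting` name is hypothetical):

```python
def merge_by_sorting(lists):
    """Ignore the existing order: copy all N elements, then sort them."""
    everything = []
    for lst in lists:
        everything.extend(lst)      # O(N) in total across all lists
    everything.sort()               # O(N log N) comparisons in the worst case
    return everything
```

This bound does not depend on K at all, so it only matches O(N log K) when K is on the order of N.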
It does indeed run in O(N*log K), but don't forget that O(N*log K) is a subset of O(N*K). I.e., your friend is not wrong either.