
total number of comparisons required to merge sorted files

Given 4 sorted files containing 15, 3, 9 and 8 records, what is the total number of comparisons required to merge them into a single sorted file?

Assume that we are using the merge step (from merge sort) for this.

I know that the merge step takes O(N) time to execute. But how many comparisons does it make?


If you assume that you use the merge step from a typical merge sort, that means you can only merge 2 lists at a time, which makes things simpler. We need at least 3 merges to turn 4 lists into 1. We could split up lists, but that's throwing away information and we'll only have to merge them back eventually, so I doubt that helps (without proving it).

The only question, then, is what order to merge the lists. The worst-case number of comparisons to merge a total of k elements from two lists is k-1[*], so we want to minimize the total number of elements in all merges. I think (again, without proving it) that in this case this is done by merging pairwise from smallest to largest, that is 8+3 then 11+9 then 20+15. That's a total of 10+19+34 = 63 record comparisons in the worst case.
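Always merging the two smallest lists first is the same greedy rule used to build Huffman codes (often called the optimal merge pattern). Here is a minimal Python sketch that assumes, as above, that a two-way merge of k records costs at most k - 1 comparisons. The name `greedy_merge_cost` is a hypothetical helper:

```python
import heapq


def greedy_merge_cost(sizes):
    """Worst-case comparisons when the two smallest lists are always merged first.

    Assumes a two-way merge of k records costs at most k - 1 comparisons.
    Returns (total, steps), where each step is (size_a, size_b, comparisons).
    """
    heap = list(sizes)
    heapq.heapify(heap)
    total, steps = 0, []
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        cost = a + b - 1  # worst case for one two-way merge
        total += cost
        steps.append((a, b, cost))
        heapq.heappush(heap, a + b)
    return total, steps


total, steps = greedy_merge_cost([15, 3, 9, 8])
print(steps)  # [(3, 8, 10), (9, 11, 19), (15, 20, 34)]
print(total)  # 63
```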

A less cunning choice of merges, say 15+3 then 8+9 then 18+17, would need more comparisons in the worst case (67), but you wouldn't need to know the lengths of the lists before you started.
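The best order is only argued informally above, so a brute-force check may help. The sketch below uses hypothetical helpers `min_merge_cost` and `order_cost` and the same k - 1 cost per merge. It tries every sequence of pairwise merges and also prices the 15+3, 8+9, 18+17 order:

```python
from functools import lru_cache
from itertools import combinations


def min_merge_cost(sizes):
    """Minimum worst-case comparisons over every sequence of two-way merges.

    Each merge of lists holding a and b records is charged a + b - 1.
    """
    @lru_cache(maxsize=None)
    def best(state):
        if len(state) == 1:
            return 0
        costs = []
        for i, j in combinations(range(len(state)), 2):
            merged = state[i] + state[j]
            rest = [s for k, s in enumerate(state) if k not in (i, j)]
            # Sorting makes equivalent states share one cache entry.
            costs.append(merged - 1 + best(tuple(sorted(rest + [merged]))))
        return min(costs)

    return best(tuple(sorted(sizes)))


def order_cost(merges):
    """Worst-case comparisons for an explicit sequence of (a, b) merges."""
    return sum(a + b - 1 for a, b in merges)


print(min_merge_cost([15, 3, 9, 8]))            # 63
print(order_cost([(15, 3), (8, 9), (18, 17)]))  # 67
```

The search grows quickly with the number of files, so it is only a sanity check for small cases like this one.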

[*] proof by induction:

With k=1, 0 comparisons are required since we have one empty list and one list of length 1.

Suppose it's true for lists of total length j (for some j >= 1). Then in the worst case, to merge two sorted lists of total length j+1, we first compare the smallest elements on either side, remove the smaller one and shove it into the output list. All that remains is to merge what's left in the two lists, that is, lists of total length j. We can do this in at worst j-1 comparisons by the inductive hypothesis. Hence a total of j+1 elements requires at worst j comparisons, which completes the induction.
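The k - 1 bound can also be checked empirically. This sketch uses hypothetical names `merge_count` and `worst_case`. It counts comparisons in a plain two-way merge and tries every way of splitting m + n distinct keys into two sorted lists. For m, n >= 1 the maximum comes out as m + n - 1:

```python
from itertools import combinations


def merge_count(xs, ys):
    """Standard two-way merge step; returns (merged list, comparisons made)."""
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(xs) and j < len(ys):
        comparisons += 1
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    # One side is exhausted: the rest is copied with no further comparisons.
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out, comparisons


def worst_case(m, n):
    """Largest comparison count over all splits of 0..m+n-1 into sorted lists of sizes m and n."""
    keys = range(m + n)
    worst = 0
    for chosen in combinations(keys, m):
        xs = list(chosen)
        ys = [k for k in keys if k not in chosen]
        worst = max(worst, merge_count(xs, ys)[1])
    return worst


print(merge_count([1, 3, 5], [2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 5)
print(worst_case(3, 5))                   # 7
```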


At each step we need to know the current smallest element among the 4 files. In other words, we need to know the smallest of the four elements a, b, c, d. A naive way would use three comparisons for each step (a against b, c against d, then the smaller of a and b against the smaller of c and d). So the overall number of comparisons would be 3*N (N is the total number of records).
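Here is a minimal sketch of this naive step. It assumes exhausted files are treated as holding +infinity, so every step costs exactly three comparisons: a vs b, c vs d, then the two winners. The name `merge4_fixed` and the sample data are hypothetical:

```python
import math


def merge4_fixed(a, b, c, d):
    """Merge four sorted lists, always spending three comparisons per record.

    Exhausted lists report math.inf as their head, so the pairing
    (a vs b), (c vs d), (winner vs winner) is performed every step.
    """
    lists = [a, b, c, d]
    pos = [0, 0, 0, 0]

    def head(k):
        return lists[k][pos[k]] if pos[k] < len(lists[k]) else math.inf

    out, comparisons = [], 0
    for _ in range(sum(len(x) for x in lists)):
        left = 0 if head(0) <= head(1) else 1
        right = 2 if head(2) <= head(3) else 3
        winner = left if head(left) <= head(right) else right
        comparisons += 3
        out.append(head(winner))
        pos[winner] += 1
    return out, comparisons


files = [list(range(0, 30, 2)), [1, 5, 9], list(range(3, 30, 3)), list(range(7, 31, 3))]
merged, comparisons = merge4_fixed(*files)
print([len(f) for f in files])  # [15, 3, 9, 8]
print(comparisons)              # 105, i.e. 3 * 35
```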


As in Mu Qiao's answer (somewhat repeated here for convenience), we can merge the lists by taking the smallest value of a, b, c, d at each step. This requires at most 3 comparisons: comparing a and b, comparing c and d, and comparing the minimum of {a, b} with the minimum of {c, d}. It is easy to see that this is the best we can do, since 2 comparisons are insufficient.

Suppose we deplete one list. Let's assume without loss of generality that this list is d. Now to find the minimum of a, b, c we can compare a and b and then compare the minimum of {a, b} to c. We cannot do better than this, since 1 comparison is insufficient.

When there are two lists, we obviously need exactly 1 comparison and when there is one list we obviously need no comparisons.

From here, we can make a worst-case analysis. More non-depleted lists mean more comparisons per step, so the worst case is the one in which as many items as possible are processed before any list is depleted.

In this case at most 14 + 2 + 8 + 7 = 31 items can be processed, at 3 comparisons each, without emptying any list. At that point every list holds exactly one item, and from there each item that is processed depletes one list, at a cost of 3, 2, 1 and 0 comparisons. So the worst-case number of comparisons is 31*3 + 3 + 2 + 1 + 0 = 99.
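The counting argument can be written as a small function. This sketch uses the hypothetical name `worst_case_scan`. It assumes every file starts non-empty and that finding the minimum of m non-empty heads costs m - 1 comparisons. Every record except the last one in each file can be output while all files are still non-empty. After that each file holds one record, and the remaining outputs cost n - 1, n - 2, ..., 0:

```python
def worst_case_scan(sizes):
    """Worst-case comparisons when each step scans the non-empty heads.

    Assumes every size is at least 1 and a scan of m heads costs m - 1.
    """
    n = len(sizes)
    before_tail = (sum(sizes) - n) * (n - 1)    # all n files still non-empty
    tail = sum(m - 1 for m in range(n, 0, -1))  # n - 1, n - 2, ..., 0
    return before_tail + tail


print(worst_case_scan([15, 3, 9, 8]))  # 31*3 + 3 + 2 + 1 + 0 = 99
```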


If we are merging two lists, the merge procedure requires at most n1 + n2 comparisons, where n1 and n2 are the lengths of the lists.

With 4 lists, the total number will depend on the order in which we merge them: i.e. we can merge list1 and list2, then merge the result with list3 and then with list4; or we can merge list1 and list2, merge list3 and list4, and then merge the 2 results.

In this case, it is easy to check that this is the best strategy:

(list1 <-> list2) <-> (list3 <-> list4)          (<-> stands for merge)

What is the maximum number of comparisons? Applying the formula from the start:

(15 + 3) + (9 + 8) + ((15+3) + (9+8)) = 18 + 17 + 18 + 17 = 70
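The same arithmetic can be written as a sketch using the hypothetical `balanced_cost`. It has an optional flag for the tighter per-merge bound n1 + n2 - 1, which is what a standard merge needs in the worst case when both lists are non-empty:

```python
def balanced_cost(sizes, tight=False):
    """Comparisons for (list1 <-> list2) <-> (list3 <-> list4).

    Each two-way merge of n1 + n2 records is charged n1 + n2, or
    n1 + n2 - 1 when tight=True.
    """
    l1, l2, l3, l4 = sizes
    merges = [l1 + l2, l3 + l4, l1 + l2 + l3 + l4]
    return sum(m - 1 if tight else m for m in merges)


print(balanced_cost([15, 3, 9, 8]))              # 18 + 17 + 35 = 70
print(balanced_cost([15, 3, 9, 8], tight=True))  # 17 + 16 + 34 = 67
```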


In the worst case, the number of comparisons required to merge two files of sizes m and n is

m + n - 1 

Thus, for the files above, we require:

15 + 3 - 1 = 17

8 + 9 - 1 = 16

18 + 17 - 1 = 34

Total: 17 + 16 + 34 = 67


The worst case when merging sorted lists occurs when all the lists stay non-empty for as long as possible. Here, with 15, 3, 9 and 8 elements, we need 3 comparisons to find the smallest of the 4 current heads (as in selection sort, finding the minimum of 4 elements takes 3 comparisons). In the worst case, imagine that we reach a point with 3 elements left in each list. Up to this point, 12 + 0 + 6 + 5 = 23 elements have been output at 3 comparisons each, i.e. (12 + 0 + 6 + 5)*3 = 69 comparisons.

Reducing the lists from (3,3,3,3) to (2,2,2,2) elements takes 4 * 3 = 12 comparisons, and reducing (2,2,2,2) to (1,1,1,1) takes another 4 * 3 = 12. Reducing (1,1,1,1) to (0,1,1,1) takes 3 comparisons, (0,1,1,1) to (0,0,1,1) takes 2, and (0,0,1,1) to (0,0,0,1) takes 1. Adding the last element to the sorted list takes 0 comparisons. Hence the total is 69 + 12 + 12 + 3 + 2 + 1 + 0 = 99 comparisons.
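The 99 figure can be checked by simulation. The sketch below uses the hypothetical `merge_scan` and sample data. It merges by scanning the non-empty heads, at m - 1 comparisons for m non-empty files. The input is built so that no file runs out until the final four outputs: each file keeps one large record (100 to 103) for last.

```python
def merge_scan(lists):
    """Merge sorted lists, scanning the non-empty heads for the minimum each step.

    With m non-empty lists, one step costs m - 1 comparisons.
    """
    pos = [0] * len(lists)
    out, comparisons = [], 0
    while True:
        live = [k for k in range(len(lists)) if pos[k] < len(lists[k])]
        if not live:
            return out, comparisons
        best = live[0]
        for k in live[1:]:
            comparisons += 1
            if lists[k][pos[k]] < lists[best][pos[best]]:
                best = k
        out.append(lists[best][pos[best]])
        pos[best] += 1


# Hypothetical adversarial input for sizes 15, 3, 9 and 8: small records
# first, then one large record per file so all four stay non-empty.
sizes = [15, 3, 9, 8]
data, start = [], 0
for i, k in enumerate(sizes):
    data.append(list(range(start, start + k - 1)) + [100 + i])
    start += k - 1

merged, comparisons = merge_scan(data)
print(comparisons)  # 99
```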


I think we can generalize the answer approximately. Given n lists of lengths k1, k2, ..., kn, the worst case is bounded in Big-O notation by O((n-1)(k1 + k2 + ... + kn)). I got this idea from working on Problem 2-1 in Chapter 2 of CLRS.
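For the naive scan (m - 1 comparisons to find the minimum of m non-empty heads), the worst case can be written exactly. This assumes every k_i >= 1 and N = k1 + k2 + ... + kn. The first N - n + 1 outputs all happen while every list is non-empty. After that, the remaining n - 1 records sit in separate lists:

```latex
C_{\text{worst}}
  = (n-1)(N-n+1) + \sum_{m=1}^{n-1} (m-1)
  = (n-1)(N-n+1) + \frac{(n-1)(n-2)}{2}
  = (n-1)\left(N - \frac{n}{2}\right)
  \le (n-1)N
```

For n = 4 and N = 35 this gives 3 * 33 = 99, within the O((n-1)(k1 + k2 + ... + kn)) bound.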
