Understanding merge sort optimization: avoiding copies
I have below merge sort program in algorithms book,开发者_如何学编程 it is mentioned that The main problem is that merging two sorted lists requires linear extra memory, and the additional work spent copying to the temporary array and back, throughout the algorithm, has the effect of slowing down the sort considerably. This copying can be avoided by judiciously switching the roles of "a" and "tmp_array" at alternate levels of the recursion.
My question is what does author mean "copying can be avoided by judiciously switching the roles of a and tmp_array at alternate levels of the recursion" and how it is possible in following code? Request to show an example how we can achieve this?
void mergesort( input_type a[], unsigned int n ) {
input_type *tmp_array;
tmp_array = (input_type *) malloc( (n+1) * sizeof (input_type) );
m_sort( a, tmp_array, 1, n );
free( tmp_array );
}
void m_sort( input_type a[], input_type tmp_array[ ], int left, int right ) {
int center;
if( left < right ) {
center = (left + right) / 2;
m_sort( a, tmp_array, left, center );
m_sort( a, tmp_array, center+1, right );
merge( a, tmp_array, left, center+1, right );
}
}
void merge( input_type a[ ], input_type tmp_array[ ], int l_pos, int r_pos, int right_end ) {
int i, left_end, num_elements, tmp_pos;
left_end = r_pos - 1;
tmp_pos = l_pos;
num_elements = right_end - l_pos + 1;
/* main loop */
while( ( 1_pos <= left_end ) && ( r_pos <= right_end ) )
if( a[1_pos] <= a[r_pos] )
tmp_array[tmp_pos++] = a[l_pos++];
else
tmp_array[tmp_pos++] = a[r_pos++];
while( l_pos <= left_end ) /* copy rest of first half */
tmp_array[tmp_pos++] = a[l_pos++];
while( r_pos <= right_end ) /* copy rest of second half */
tmp_array[tmp_pos++] = a[r_pos++];
/* copy tmp_array back */
for(i=1; i <= num_elements; i++, right_end-- )
a[right_end] = tmp_array[right_end];
}
I'm going to assume that, without looking at this code, it is performing merge sort by declaring a contiguous block of memory the same size as the original.
So normally merge sort is like this:
- split array in half
- sort half-arrays by recursively invoking MergeSort on them
- merge half-arrays back
I'm assuming it's recursive, so no copies will be done before we're sorting sub-arrays of size 2. Now what happens?
_ means it is memory we have available, but we don't care about the data in it
original:
8 5 2 3 1 7 4 6
_ _ _ _ _ _ _ _
Begin recursive calls:
recursive call 1:
(8 5 2 3) (1 7 4 6)
_ _ _ _ _ _ _ _
recursive call 2:
((8 5) (2 3)) ((1 7) (4 6))
_ _ _ _ _ _ _ _
recursive call 3:
(((8) (5))((2) (3)))(((1) (7))((4) (6)))
_ _ _ _ _ _ _ _
Recursive calls resolving with merging, PLUS COPYING (uses more memory, or alternatively is 'slower'):
merge for call 3, using temp space:
(((8) (5))((2) (3)))(((1) (7))((4) (6))) --\ perform merge
(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 )) <--/ operation
UNNECESSARY: copy back:
(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 )) <--\ copy and
_ _ _ _ _ _ _ _ --/ ignore old
merge for call 2, using temp space:
(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 )) --\ perform merge
( 2 3 5 8 )( 1 4 6 7 ) <--/ operation
UNNECESSARY: copy back:
( 2 3 5 8 )( 1 4 6 7 ) <--\ copy and
_ _ _ _ _ _ _ _ --/ ignore old
merge for call 1, using temp space:
( 2 3 5 8 )( 1 4 6 7 ) --\ perform merge
1 2 3 4 5 6 7 8 <--/ operation
UNNECESSARY: copy back:
1 2 3 4 5 6 7 8 <--\ copy and
_ _ _ _ _ _ _ _ --/ ignore old
What the author is suggesting Recursive calls resolving with merging, WITHOUT COPYING (uses less memory):
merge for call 3, using temp space:
(((8) (5))((2) (3)))(((1) (7))((4) (6))) --\ perform merge
(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 )) <--/ operation
merge for call 2, using old array as temp space:
( 2 3 5 8 )( 1 4 6 7 ) <--\ perform merge
(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 )) --/ operation (backwards)
merge for call 1, using temp space:
( 2 3 5 8 )( 1 4 6 7 ) --\ perform merge
1 2 3 4 5 6 7 8 <--/ operation
There you go: you don't need to do copies as long as you perform each "level" of the merge-sort tree in lock-step, as shown above.
You may have a minor issue of parity, also as demonstrated above. That is, your result may be in your temp_array
. You either have three options for dealing with this:
- returning the
temp_array
as the answer, and release the old memory (if your application is fine with that) - perform a single array copy operation, to copy
temp_array
back into your original array - allow yourself to consume a mere twice-as-much memory, and perform a single cycle of merges from
temp_array1
totemp_array2
then back tooriginal_array
, then releasetemp_array2
. The parity issue should be resolved.
This is not necessarily "faster":
additional work spent copying to the temporary array and back
This is actually not the core reason why it's 'faster' per se. It is obviously not asymptotically faster, nor necessarily even faster. There is a notion of latency vs. throughput. Generally running time is measured in latency, because extra garbage work (like releasing memory) may be done asynchronously. You don't necessarily need to copy "back" to the original array depending on your language. However, if you are repeating something many times on memory-bound hardware in a garbage-collected language, the garbage collection can occasionally be forced to spike if the GC algorithm is a poor choice for what you are doing (or if this is C, maybe you are waiting to allocate). Thus if you were to create extra memory in a GC language, it should not really count against you. Granted, this may cause you not to take advantage of cache properly if you use too much memory. You'd have to benchmark it yourself, very carefully for your use case.
I do not recommend creating random temporary arrays for each step though, as that would make memory O(N log(N)) and this is a trivial optimization.
Minor notes on in-placeness:
Also, the reason you can't naively do it in-place is because while you are merging two sorted sub-arrays, the new result sorted sub-array may take arbitrarily many from one input array before spontaneously swap to the other array. For example, as you can see we need a buffer because our input arrays might get split into fragments:
( 4 6 7 8 10)(1 2 3 5 9 11)(... other sub-arrays)
( 1)(6 7 8 10)(4)(2 3 5 9 11)(...
( 1 2)(7 8 10)(4 6)(3 5 9 11) ...
( 1 2 3)(8 10)(4 6 7)(5 9 11)
( 1 2 3 4(10)(8)(6 7)(5 9 11) ooph :-(
( 1 2 3 4 5)(8)(6 7)(10)(9 11) ooph
You might be able to so cleverly in-place if you do some weird variant of the kth-statistic median-of-medians algorithm, performing your merge into the middle of the two arrays rather than the start (merging from a specifically chosen element outwards left/decreasing and right/increasing simultaneously). I'm not sure how one would implement that though, or if the hunch is true.
(very minor note: Perhaps those who are familiar with sorting algorithms should be careful of comparing a traditional swap traditional swap
operation involving a tmp
variable in a register, which is two reads-from-cache and two writes-to-cache, to not-in-place copying to other bits of memory, without a per-operation counting argument.)
Certainly, OP's method is extremely simple to code for only twice as much memory.
Start by thinking of merge sort in this way.
0: Consider the input array A0 as a collection of ordered sequences of length 1.
1: Merge each consecutive pair of sequences from A0, constructing a new temporary array A1.
2: Merge each consecutive pair of sequences from A1, constructing a new temporary array A2.
...
Finish when the last iteration results in a single sequence.
Now, you can obviously get away with just a single temporary array by doing this:
0: Consider the input array A0 as a collection of ordered sequences of length 1.
1: Merge each consecutive pair of sequences from A0, constructing a new temporary array A1.
2: Merge each consecutive pair of sequences from A1, overwriting A0 with the result.
3: Merge each consecutive pair of sequences from A0, overwriting A1 with the result.
...
Finish when the last iteration results in a single sequence.
Of course, you can be even smarter than this. If you want to be nicer to the cache, you might decide to sort top-down, rather than bottom-up. In this case, it hopefully becomes obvious what your textbook means when it refers to tracking the role of the arrays at different levels of recursion.
Hope this helps.
Here is my implementation without extra copies.
public static void sort(ArrayList<Integer> input) {
mergeSort(input, 0, input.size() - 1);
}
/**
* Sorts input and returns inversions number
* using classical divide and conquer approach
*
* @param input Input array
* @param start Start index
* @param end End index
* @return int
*/
private static long mergeSort(ArrayList<Integer> input, int start, int end) {
if (end - start < 1) {
return 0;
}
long inversionsNumber = 0;
// 1. divide input into subtasks
int pivot = start + (int) Math.ceil((end - start) / 2);
if (end - start > 1) {
inversionsNumber += mergeSort(input, start, pivot);
inversionsNumber += mergeSort(input, pivot + 1, end);
}
// 2. Merge the results
int offset = 0;
int leftIndex = start;
int rightIndex = pivot + 1;
while (leftIndex <= pivot && rightIndex <= end) {
if (input.get(leftIndex + offset) <= input.get(rightIndex)) {
if (leftIndex < pivot) {
leftIndex++;
} else {
rightIndex++;
}
continue;
}
moveElement(input, rightIndex, leftIndex + offset);
inversionsNumber += rightIndex - leftIndex - offset;
rightIndex++;
offset++;
}
return inversionsNumber;
}
private static void moveElement(ArrayList<Integer> input, int from, int to) {
assert 0 <= to;
assert to < from;
assert from < input.size();
int temp = input.get(from);
for (int i = from; i > to; i--) {
input.set(i, input.get(i - 1));
}
input.set(to, temp);
}
Look at the very last part of the merge function. What if, instead of copying that data, you just used the knowledge that the sorted part is now in tmp_array
instead of a
when the function returns, and a
is available for use as a temp.
Details are left as an exercise for the reader.
精彩评论