Can a List.Sort using a randomized Comparison delegate run infinitely?
I'm investigating and performance testing various ways of randomizing ordered collections,开发者_StackOverflow中文版 and I was looking at the option of passing a Comparison delegate that just randomly returns the comparison result. For example:
int RandomComparison<T> (T x, T y)
{
return this.random.Next (-1, 2);
}
However, as I do not know the sorting algorithm used under the hood, I do not know if this could possibly lead to the sort logic never completing. While this method seems generally unreliable performance-wise, I'm wondering if it is actually dangerously unusable?
Spot the defect: Bad comparisons, part four
I'm guessing that your suspicions are correct. The whole premise on which this type of code is built is that if A < B and B < C then A < C.
You could associate a random number with each item in the collection, then do a standard sort on that value.
List.Sort
in fact is documented to use QuickSort (no additional details given), but I'll ignore that in favour of talking about sorting in general...
I suspect that for any sensible sort algorithm, this comparator results in the operation terminating with probability 1, in the sense that the probability of it lasting N steps tends to 0 as N tends to infinity. In fact I think in a lot of cases there's a hard upper limit.
The reason is that it's possible (occurs with non-zero probability) that the random comparator could just so happen to return consistent results for enough comparisons in a row (O(n log n) of them, perhaps, or O(n^2) for insertion sort) that the algorithm "thinks" it's finished. As long as this happens eventually, I'd kind of expect it to terminate the sort.
However, I can't be certain of this, because it's by no means impossible that the algorithm has got into an unrecoverable state before this brief period of consistency. I just have a hunch that for practical sort algorithms, that won't happen. And indeed for a lot of algorithms the problem will inexorably get smaller regardless of how nonsensical the comparator is, hence yielding a hard limit. QuickSort in particular works by repeatedly partitioning the array - if the comparator is random then this will result in nonsensical partitions, but as long as no actual errors occur due to the inconsistency, the array will still be divided into two parts for recursion. The fact that the elements in the "top" part won't compare greater than the pivot on a second time of asking quite likely doesn't matter at all, since they'll never be compared with it again.
Anyway, it might take a very long time, long enough to constitute "dangerous" for most practical purposes. A bubble sort, for example, would just keep stirring things around until it gets n
negative results in a row to complete a sweep without moving anything. That would take expected time 2^n
or thereabouts (not that List.Sort
in particular could be a bubble sort, I use it only to illustrate what might happen for sorting in general). And depending on implementation details, you might find that you access out of bounds, due to the "impossible" happening, more times than you "finish".
The operation certainly doesn't necessarily randomize the collection with all permutations equally likely. Again, note that QuickSort chooses a pivot and then partitions around it. Given this random comparator (and, again, assuming no actual errors like out-of-bounds access), the probability that the first-chosen pivot ends up at the far right hand side of the array is 1 in 2^(n-1) (since all other elements must be sorted left of it, a half chance for each), whereas in a uniformly-randomly-selected permutation, that probability should be 1 in n. The first chosen pivot is distributed in the array on a bell curve, when it "should" be uniform.
Yes, using a random comparison could make the sorting algorithm act up in several different ways. It could produce a biased result, or even get stuck.
The most efficient algorithm for shuffling a collection is the Fisher-Yates shuffle. It will shuffle the collection in O(n), which is better than any sorting algorithm can offer.
As many already mentioned, sorting with a random comparison function does not produce a uniform distribution of permutations and should not be used. But to answer the question anyway...
From a theorectical point of view, yes you can hang infinitely depending on your sorting algorithm choice.
For example, this bubble sort keeps going until no swaps are made:
From Wikipedia:
procedure bubbleSort( A : list of sortable items )
do
swapped = false
for each i in 1 to length(A) - 1 inclusive do:
if A[i-1] > A[i] then
swap( A[i-1], A[i] )
swapped = true
end if
end for
while swapped
end procedure
So a malicious comparison function can make just two values be swapped in and out infinitely.
However, all decent algorithms I can think of (quicksort, mergesort, heapsort, insertionsort) only use for
loops so their terminations shouldn't depend on the correctness of the sorting routine. Even bubble sort itself can be made to guarantee termination with just a couple of changes (check the wiki link for examples)
So, from a practical standpoint, no: The correctness of the comparison function should not influence the runtime behaviour of the sort*
*modulo things like quicksort being O(n^2) on the worst case...
Once you've passed a sort routine a function that does not give a consistent order, there is no guarantee what will happen. One would hope that C# does nothing bad, but I know from experience that C++ has a disturbing tendency to dump core in that situation.
精彩评论