Worst case for QuickSort - when can it occur?
When analyzing QS, everyone always refers to the "almost sorted" worst case. When can such a scenario occur with natural input?
The only example I came up with is re-indexing.
I think people are confusing Quicksort the partition-based sorting algorithm, and "qsort" the various library implementations.
I prefer to see Quicksort the algorithm as having a pluggable pivot selection algorithm, which is quite essential in analyzing its behavior.
If the first element is always chosen as the pivot, then an already sorted list is the worst-case. Often there's a high probability that the array is already/nearly sorted, so this implementation is rather poor.
Analogously, selecting the last element as the pivot is bad for the same reason.
Some implementations try to avoid this problem by choosing the middle element as the pivot. This would not perform as badly on already/nearly sorted arrays, but one could still construct an input that would exploit this predictable pivot selection and make it run in quadratic time.
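To make the difference between these fixed pivot choices concrete, here is a minimal sketch that counts comparisons on an already sorted list. The names quicksort_comparisons and choose_pivot are made up for illustration, and the Lomuto partition scheme is my own assumption; real library implementations differ.

```python
def quicksort_comparisons(items, choose_pivot):
    """Sort a copy of items and return (sorted_list, comparison_count).

    choose_pivot(lo, hi) returns the index of the pivot within a[lo..hi].
    An explicit stack replaces recursion so that the lopsided partitions
    of the worst case do not hit Python's recursion limit.
    """
    a = list(items)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = choose_pivot(lo, hi)
        a[p], a[hi] = a[hi], a[p]          # park the pivot at the end (Lomuto scheme)
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            comparisons += 1               # one comparison per non-pivot element
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # put the pivot in its final position
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))
    return a, comparisons


n = 200
already_sorted = list(range(n))
_, first = quicksort_comparisons(already_sorted, lambda lo, hi: lo)
_, last = quicksort_comparisons(already_sorted, lambda lo, hi: hi)
_, middle = quicksort_comparisons(already_sorted, lambda lo, hi: (lo + hi) // 2)
print(first, last, middle)  # first and last are both n*(n-1)//2 = 19900
```

With the first or last element as pivot, every partition of a sorted list peels off a single element, so the count is exactly n(n-1)/2. The middle pivot splits the sorted list roughly in half each time and needs far fewer comparisons.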
Thus, you get randomized pivot selection algorithms, but even this doesn't guarantee O(N log N).
So other algorithms were developed that would use some information from the sequence before picking a pivot. You can of course scan the whole sequence and find the median, and use that as the pivot. This guarantees O(N log N), but is of course slower in practice.
So some corners are cut, and people devised the median-of-3 algorithm. Of course, later even this was exploitable by the so-called median-of-3 "killer".
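For reference, a rough sketch of what median-of-3 pivot selection looks like. The function names median_of_three and quicksort_mo3 are hypothetical, the Lomuto partition is my own choice, and this is not any particular library's code. It only shows the idea: sample the first, middle and last elements and pivot on whichever value is in between.

```python
import random


def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid] and a[hi]."""
    mid = (lo + hi) // 2
    candidates = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return candidates[1][1]


def quicksort_mo3(a, lo=0, hi=None):
    """In-place quicksort (Lomuto partition) with median-of-3 pivot selection."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = median_of_three(a, lo, hi)
    a[p], a[hi] = a[hi], a[p]              # move the chosen pivot to the end
    pivot, store = a[hi], lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]      # pivot lands in its final position
    quicksort_mo3(a, lo, store - 1)
    quicksort_mo3(a, store + 1, hi)


data = list(range(500))                    # already sorted: no longer a bad case
quicksort_mo3(data)
shuffled = random.sample(range(1000), 300)
quicksort_mo3(shuffled)
print(data == list(range(500)), shuffled == sorted(shuffled))
```

On an already sorted list the median of the three samples is the middle element, so the partitions stay balanced. An adversarial input can still be built against this fixed sampling pattern, which is what the "killer" sequences do.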
So more attempts were made at coming up with more "intelligent" pivot selection algorithms that guarantee O(N log N) asymptotic behavior while still being fast enough to be practical, with varying degrees of success.
So really, unless one specifies a particular implementation of Quicksort, the question of when the worst case scenario occurs is ill-defined. If you use the so-called median-of-medians pivot selection algorithm, there is no quadratic worst-case scenario.
Most library implementations, however, are likely to forfeit the O(N log N) guarantee for much faster sorting in the average case. Some of the really old implementations use the first element as the pivot, which is now well understood to be a poor choice and is no longer widely practiced.
I believe that the worst case for quicksort depends on the choice of the pivot element at every step. Quicksort has its worst performance if the pivot is likely to be either the smallest or the largest element in the list (e.g. the first or last element of an already sorted list).
If, e.g., you choose the middle element of the list, an already sorted list does not have the worst-case runtime.
So, if you suspect your scenario is likely to be a bad case for quicksort, you can simply change your choice of pivot element to make quicksort perform better.
Note: I know that this does not give more examples of real-world occasions for quicksort worst cases. Examples of this depend on the implementation you are working with.
The actual question was: "When can such a scenario (almost sorted) occur with natural input?".
Although all the answers are dealing with "what causes worst case performance", none have covered "what causes data that meets the worst case performance scenario".
So, to answer the actual question:
Programmer error: Basically you land up sorting a list twice. Typically this happens because a list is sorted one place in code. And later in another piece of code you know you need the list to be sorted, so you sort it again.
Using almost-chronological data: You have data that is generally received in chronological order, but occasionally some elements are out of position. (Consider a multi-threaded environment adding time-stamped elements to a list. Race conditions can cause elements to be added in a different order from the one in which they were time-stamped.) In this situation, if you need sorted data, you must re-sort, because the order of the data is not guaranteed.
Adding items to a list: If you have a sorted list and simply append some items (i.e. without using binary insertion), you would need to re-sort an almost-sorted list.
Data from an external source: If you receive data from an external source, there may be no guarantee that it's sorted. So you sort it yourself. However, if the external source is sorted, you will be re-sorting the data.
Natural ordering: This is similar to the chronological data. Basically, the natural order of the data you receive may be sorted. Consider an insurance company adding car registrations. If the authority assigning car registrations does so in a predictable order, newer cars are likely but not guaranteed to have higher registration numbers. Since you're not guaranteed it's sorted, you have to re-sort.
Interleaved data: If you receive data from multiple sorted sources with overlapping keys, you could get keys resembling the following: 1 3 2 5 4 7 6 9 8 11 10 13 12 15 14 17 16 19 18. Even though half the elements are out of sequence with their neighbours, the list is "almost sorted". Certainly using QuickSort that pivots on the first element would exhibit O(n^2) performance.
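The interleaved case can be checked with a small experiment. This is only a sketch under my own assumptions: first_pivot_comparisons and interleaved are made-up helper names, and the partition is a simple order-preserving one built with list comprehensions. It counts the comparisons a first-element-pivot quicksort would make without returning the sorted result.

```python
def interleaved(pairs):
    """Build 1 3 2 5 4 7 6 ... with the given number of swapped pairs."""
    return [1] + [v for k in range(1, pairs + 1) for v in (2 * k + 1, 2 * k)]


def first_pivot_comparisons(items):
    """Count comparisons made when always pivoting on the first element."""
    comparisons = 0
    pending = [list(items)]                # explicit stack instead of recursion
    while pending:
        part = pending.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)           # every other element meets the pivot once
        pending.append([x for x in rest if x < pivot])
        pending.append([x for x in rest if x >= pivot])
    return comparisons


print(interleaved(9))                      # the 19 keys from the example above
for pairs in (9, 100, 200, 400):
    print(2 * pairs + 1, first_pivot_comparisons(interleaved(pairs)))
```

Each pivot separates at most one smaller element from the rest, so the recursion is about n/2 levels deep. With p swapped pairs (n = 2p + 1 keys) the count works out to p^2 + 2p, which quadruples whenever the input doubles.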
Conclusion
So, given all the above scenarios, it's actually quite easy to land up sorting almost-sorted data. And this is exactly why QuickSort that pivots on the first element is actually best avoided. polygene has provided some interesting information on alternate pivoting considerations.
As a side-note: One of the usually worst performing sorting algorithms actually does quite well with "almost-sorted" data. In the interleaved data above, bubble-sort requires only 9 swap operations. Its performance would actually be O(n).
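The swap count is easy to verify. Below is a minimal sketch of a bubble sort that stops as soon as a pass makes no swaps; that early exit is an assumption on my part, and the linear behaviour depends on it. The function name bubble_sort and its return shape are for illustration only.

```python
def bubble_sort(items):
    """Return (sorted_copy, swap_count, pass_count) using early-exit bubble sort."""
    a = list(items)
    swaps = passes = 0
    swapped = True
    while swapped:                         # stop after the first pass with no swaps
        swapped = False
        passes += 1
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
                swapped = True
    return a, swaps, passes


keys = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
result, swaps, passes = bubble_sort(keys)
print(result == sorted(keys), swaps, passes)  # True 9 2
```

The first pass fixes all nine out-of-order pairs and the second pass only confirms that nothing is left to swap, so the work is proportional to n for this kind of input.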
From Quicksort
for quicksort, "worst case" corresponds to already sorted
A list with all the items the same number is already sorted.
worst case in quick sort:
- All elements of array are same
- Array is already sorted in same order
- Array is already sorted in reverse order.
The quicksort worst case depends on how the pivot element is chosen, so the problem occurs only when: 1) the array is already sorted in the same order; 2) the array is already sorted in reverse order; 3) all elements are the same (a special case of cases 1 and 2).