Can sort() in C++ have a n^2 performance?
When trying to estimated the performance of a program, I always treated sort() function as a worst-performance-n^2 function. However, I came across a Wikipedia page:
sort(C++)
Which states that the GNU C Library sort() uses some hybrid sorting algorithm called Introsort first, then do insertion sort. The corresponding page to Introsort claims that this algorithm has a worst case performance of nlogn. However, since I am not familiar with this algorithm, I still have the following worries about sort():
1) Can开发者_运维百科 the hybrid algorithm used by GNU sort() guarantee a O(nlogn) performance? If so, how big can the constant overhead of the nlogn be?
2) Are there any other implementations that could cause sort() to perform worse than this (or better, which would be great)?
EDIT: Reply to Kevin: The sort() mentioned is std::sort().
Thanks!
The use of quicksort and introsort (which is a variant of the former, with guaranteed O(n log n)
performance achieved by switching to heapsort on worst case inputs) in place of other theoretically better algorithms like mergesort is due to the fact that the average case is the same, and the constants much lower (in the constants you can include the fact that it can be sorted in place, so there are no reallocations, and copies). And the worst case is bad, but quite improvable. In general, it is assumed that the performance of sort
is O( n log n )
.
If you are concerned about the hidden constants, then the question is not theoretical, but rather a question of performance. When trying to optimize you are better off measuring the algorithm on your actual data, analyzing the results of the measurement, and then determining where the time is spent and if it can be improved. But that is a completely different problem from the theoretical one.
If your standard library makes no guarantees beyond ISO 14882, then there seems to be no formal bound on the worst-case behavior of sort()
— only the average complexity is listed. There's a footnote in the standard which mentions you should use stable_sort()
or partial_sort()
instead of sort()
if you care:
http://www.kuzbass.ru:8086/docs/isocpp/lib-algorithms.html#lib.alg.sorting
25.3.1.1 - sort [lib.sort]
template<class RandomAccessIterator> void sort(RandomAccessIterator first, RandomAccessIterator last) template<class RandomAccessIterator, class Compare> void sort(RandomAccessIterator first, RandomAccessIterator last, Compare comp)
Effects: Sorts the elements in the range [first, last).
Complexity: Approximately N log N (where N == last - first) comparisons on the average.*
[Footnote: If the worst case behavior is important stable_sort() (lib.stable.sort) or partial_sort() (lib.partial.sort) should be used. --- end footnote]
Specific library implementations probably make stronger guarantees beyond the standard. And it can certainly be useful to look at the code directly. Then again, it depends on how portable you want this to be.
Introsort has actually O(n log(n)) worst-case running time, not O(n^2). Also see this remark at the SGI STL Specs:
Earlier versions of sort used the quicksort algorithm, using a pivot chosen by median of three. Quicksort has O(N log(N)) average complexity, but quadratic worst-case complexity. The current implementation of sort, however, uses the introsort algorithm whose worst case complexity is O(N log(N)). Introsort is very similar to median-of-three quicksort, and is at least as fast as quicksort on average.
Yes it is a variation of quicksort, using heapsort for suspected pathological quicksort input. It looks at recursion depth and when it falls too deep it sorts using heapsort removing any pathological behavior. This guarantees N log N. The constant overhead of N log N (qsort vs heapsort) is not something to worry about.
Insertion sort is used when there are very few elements (about 16).
http://en.wikipedia.org/wiki/Sorting_algorithm lists several sorting algorithms with n^2 performance. It has one with n! performance. It also lists several non-comparison sorts which have performance based on other factors.
精彩评论