Profiling sorting algorithms against partially sorted data
We know that several sorts, such as insertion sort, are great on arrays that are 'mostly-sorted' and not so great on random data.
Suppose we wanted to profile the performance improvement/degradation of such an algorithm relative to how 'sorted' the input data is. What would be a good way to generate an 'increasingly sorted' or 'increasingly random' array of elements? How might we measure the 'sortedness' of the input?
The number of inversions is the usual measure of how sorted an array is.
A pair of elements (p_i, p_j) in a permutation p is called an inversion if i < j and p_i > p_j. For example, the permutation (3,1,2,5,4) contains the 3 inversions (3,1), (3,2) and (5,4).
A sorted array has 0 inversions, and a reverse-sorted array has n*(n-1)/2.
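Here is a minimal sketch of counting inversions with a merge-sort variant in O(n log n); the function name `count_inversions` is just illustrative, not from any library.

    def count_inversions(arr):
        def sort_count(a):
            if len(a) <= 1:
                return a, 0
            mid = len(a) // 2
            left, inv_l = sort_count(a[:mid])
            right, inv_r = sort_count(a[mid:])
            merged, inv = [], inv_l + inv_r
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    # left[i] > right[j]: every remaining element of `left`
                    # forms an inversion with right[j]
                    inv += len(left) - i
                    merged.append(right[j]); j += 1
            merged.extend(left[i:]); merged.extend(right[j:])
            return merged, inv
        return sort_count(list(arr))[1]

    print(count_inversions([3, 1, 2, 5, 4]))  # 3
    print(count_inversions([1, 2, 3, 4, 5]))  # 0 (sorted)
    print(count_inversions([5, 4, 3, 2, 1]))  # 10 == 5*4/2 (reverse sorted)

Counting inversions while profiling gives you an x-axis ("disorder") to plot the sort's running time against.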
You could generate a "partially sorted" dataset by interrupting a modern Fisher-Yates shuffle run on an already ordered dataset.
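For example, here is a sketch of that idea, assuming you want a tunable "disorder" knob: run the standard Fisher-Yates swap steps on a sorted array, but only for a fraction of the passes. The function `partially_shuffled` and its `disorder` parameter are hypothetical names for illustration.

    import random

    def partially_shuffled(n, disorder, seed=None):
        """disorder=0.0 leaves the array sorted, disorder=1.0 is a full shuffle."""
        rng = random.Random(seed)
        a = list(range(n))
        stop = int((n - 1) * disorder)          # how many swap steps to perform
        for i in range(n - 1, n - 1 - stop, -1):
            j = rng.randint(0, i)               # standard Fisher-Yates step
            a[i], a[j] = a[j], a[i]
        return a

    print(partially_shuffled(10, 0.3, seed=42))

Because the shuffle is stopped early, the prefix of the array stays mostly in order while the tail becomes random, so sweeping `disorder` from 0 to 1 gives you an increasingly random input.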
Also, if you only need a few essentially fixed sets of partially sorted data, you could generate a column graph of position vs. value for each and just eyeball them. That lets you quickly see the general randomness of a set, as well as things like how much localised order there is.
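A quick sketch of that eyeball check, assuming the `partially_shuffled` generator from the previous snippet (any generator of partially sorted data would do):

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    for ax, disorder in zip(axes, (0.0, 0.3, 1.0)):
        data = partially_shuffled(200, disorder, seed=1)
        ax.bar(range(len(data)), data, width=1.0)  # position vs. value columns
        ax.set_title(f"disorder = {disorder}")
        ax.set_xlabel("position")
        ax.set_ylabel("value")
    plt.tight_layout()
    plt.show()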
Also look into creating a binary heap, and then using the array representation as your starting point. A binary heap implemented in an array is not sorted, but it is ordered. I think it would be considered "partially sorted."
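A minimal sketch of that approach using the standard library's heapq:

    import heapq
    import random

    data = list(range(20))
    random.shuffle(data)
    heapq.heapify(data)   # rearranges in place into binary-heap array order
    # Every parent a[i] is <= its children a[2*i+1] and a[2*i+2],
    # so the array is ordered (heap property) but not fully sorted.
    print(data)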