Fastest way to sort in Python
What is the fastest way to 开发者_运维技巧sort an array of whole integers bigger than 0 and less than 100000 in Python? But not using the built in functions like sort.
Im looking at the possibility to combine 2 sport functions depending on input size.
If you are interested in asymptotic time, then counting sort or radix sort provide good performance.
However, if you are interested in wall clock time you will need to compare performance between different algorithms using your particular data sets, as different algorithms perform differently with different datasets. In that case, its always worth trying quicksort:
def qsort(inlist):
if inlist == []:
return []
else:
pivot = inlist[0]
lesser = qsort([x for x in inlist[1:] if x < pivot])
greater = qsort([x for x in inlist[1:] if x >= pivot])
return lesser + [pivot] + greater
Source: http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Python
Since you know the range of numbers, you can use Counting Sort which will be linear in time.
Radix sort theoretically runs in linear time (sort time grows roughly in direct proportion to array size ), but in practice Quicksort is probably more suited, unless you're sorting absolutely massive arrays.
If you want to make quicksort a bit faster, you can use insertion sort] when the array size becomes small.
It would probably be helpful to understand the concepts of algorithmic complexity and Big-O notation too.
Early versions of Python used a hybrid of samplesort
(a variant of quicksort with large sample size) and binary insertion sort as the built-in sorting algorithm. This proved to be somewhat unstable. S0, from python 2.3 onward uses adaptive mergesort
algorithm.
Order of mergesort (average) = O(nlogn)
.
Order of mergesort (worst) = O(nlogn)
.
But Order of quick sort (worst) = n*2
if you uses list=[ .............. ]
list.sort()
uses mergesort algorithm.
For comparison between sorting algorithm you can read wiki
For detail comparison comp
I might be a little late to the show, but there's an interesting article that compares different sorts at https://www.linkedin.com/pulse/sorting-efficiently-python-lakshmi-prakash
One of the main takeaways is that while the default sort does great we can do a little better with a compiled version of quicksort. This requires the Numba package.
Here's a link to the Github repo: https://github.com/lprakash/Sorting-Algorithms/blob/master/sorts.ipynb
We can use count sort using a dictionary to minimize the additional space usage, and keep the running time low as well. The count sort is much slower for small sizes of the input array because of the python vs C implementation overhead. The count sort starts to overtake the regular sort when the size of the array (COUNT) is about 1 million.
If you really want huge speedups for smaller size inputs, implement the count sort in C and call it from Python.
(Fixed a bug which Aaron (+1) helped catch ...) The python only implementation below compares the 2 approaches...
import random
import time
COUNT = 3000000
array = [random.randint(1,100000) for i in range(COUNT)]
random.shuffle(array)
array1 = array[:]
start = time.time()
array1.sort()
end = time.time()
time1 = (end-start)
print 'Time to sort = ', time1*1000, 'ms'
array2 = array[:]
start = time.time()
ardict = {}
for a in array2:
try:
ardict[a] += 1
except:
ardict[a] = 1
indx = 0
for a in sorted(ardict.keys()):
b = ardict[a]
array2[indx:indx+b] = [a for i in xrange(b)]
indx += b
end = time.time()
time2 = (end-start)
print 'Time to count sort = ', time2*1000, 'ms'
print 'Ratio =', time2/time1
The built in functions are best, but since you can't use them have a look at this:
http://en.wikipedia.org/wiki/Quicksort
def sort(l):
p = 0
while(p<len(l)-1):
if(l[p]>l[p+1]):
l[p],l[p+1] = l[p+1],l[p]
if(not(p==0)):
p = p-1
else:
p += 1
return l
this is a algorithm that I created but is really fast. just do sort(l) l being the list that you want to sort.
@fmark Some benchmarking of a python merge-sort implementation I wrote against python quicksorts from http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Python and from top answer.
- Size of the list and size of numbers in list irrelevant
merge sort wins, however it uses builtin int() to floor
import numpy as np
x = list(np.random.rand(100))
# TEST 1, merge_sort
def merge(l, p, q, r):
n1 = q - p + 1
n2 = r - q
left = l[p : p + n1]
right = l[q + 1 : q + 1 + n2]
i = 0
j = 0
k = p
while k < r + 1:
if i == n1:
l[k] = right[j]
j += 1
elif j == n2:
l[k] = left[i]
i += 1
elif left[i] <= right[j]:
l[k] = left[i]
i += 1
else:
l[k] = right[j]
j += 1
k += 1
def _merge_sort(l, p, r):
if p < r:
q = int((p + r)/2)
_merge_sort(l, p, q)
_merge_sort(l, q+1, r)
merge(l, p, q, r)
def merge_sort(l):
_merge_sort(l, 0, len(l)-1)
# TEST 2
def quicksort(array):
_quicksort(array, 0, len(array) - 1)
def _quicksort(array, start, stop):
if stop - start > 0:
pivot, left, right = array[start], start, stop
while left <= right:
while array[left] < pivot:
left += 1
while array[right] > pivot:
right -= 1
if left <= right:
array[left], array[right] = array[right], array[left]
left += 1
right -= 1
_quicksort(array, start, right)
_quicksort(array, left, stop)
# TEST 3
def qsort(inlist):
if inlist == []:
return []
else:
pivot = inlist[0]
lesser = qsort([x for x in inlist[1:] if x < pivot])
greater = qsort([x for x in inlist[1:] if x >= pivot])
return lesser + [pivot] + greater
def test1():
merge_sort(x)
def test2():
quicksort(x)
def test3():
qsort(x)
if __name__ == '__main__':
import timeit
print('merge_sort:', timeit.timeit("test1()", setup="from __main__ import test1, x;", number=10000))
print('quicksort:', timeit.timeit("test2()", setup="from __main__ import test2, x;", number=10000))
print('qsort:', timeit.timeit("test3()", setup="from __main__ import test3, x;", number=10000))
Bucket sort with bucket size = 1. Memory is O(m) where m = the range of values being sorted. Running time is O(n) where n = the number of items being sorted. When the integer type used to record counts is bounded, this approach will fail if any value appears more than MAXINT times.
def sort(items):
seen = [0] * 100000
for item in items:
seen[item] += 1
index = 0
for value, count in enumerate(seen):
for _ in range(count):
items[index] = value
index += 1
精彩评论