Find the largest 10% of numbers in an array in order
Given an array with 'N' numbers (N >100). How could we find the largest 10% of them in order ? (if n/10 is not an integer, we can round it)
I came up with 3 algorithms to attempt the above problem, but I am not sure which one is the best in terms of asymptotic running time. Could I actually make any modification to reduce the asymptotic time? Also, if N gets really large, which algorithm might still be efficient one?
I'm listing my ideas for the algorithms below and could really use some help to figure out the most efficient algorithm for this.
Algo-1
I used selection sort and stopped it once the 10% of numbers were sorted.
Algo-2
I built a max-heap and kept removing the largest 10% of numbers
Algo-3
Haven't implemented this one, but the idea I have is to use any order-statistic algorithm to find a partition containing the top 10% of numbe开发者_StackOverflowrs and then sort them using merge sort.
The quickest solution is to use the partition-based selection algorithm, which runs in O(n)
. It's based of the quicksort idea, except instead of sorting both partitions recursively, you only go to one of the partitions to find the k-th
smallest element.
Finding the largest 10% is accomplished by searching for the k=(90%*N)-th
smallest number.
If you recall how partitioning in quicksort works, elements less than the pivot is moved to the left, and the rest of the elements go to the right. Let's say you want to select the k-th
smallest element. Then you see if there are at least k
elements to the left of the pivot. If there are, then you know you can ignore the elements in the right partition. Otherwise, you can ignore all elements in the left partition, because you know that element will be in the right partition.
Note that the selection algorithm only identifies what those top 10% numbers are. If you need them to be sorted, then you have to sort those numbers (but only those numbers, the other 90% can be ignored).
Algo-1: Selection sort will run in O(n^2). First scan you do (n-1) compares, the second time (n-2), the n/10 time (n-n/10), so (n-1) + (n-2) + ... + (n-n/10) => O(n^2)
Algo-2: Deleting the max element from a heap is O(log n), so this one will run O(n log n) since you want to delete n/10 elements.
Another possible algorithm, though still O(n log n), but I think might be better than Algo-2 is to use the following quick-sort inspired procedure.
- Pick a pivot point
- Scan the all elements and put them in one of 2 buckets: those less than the pivot (left bucket) and those greater than the pivot (right bucket) (n-1) comparisons. Follow the quick sort procedure of in-place swapping.
a. Size of bucket on right == n/10: You are done.
b. Size of bucket on right > n/10 then the new list is the bucket on the right, recursively go to step 1 with the new list.
c. Size of bucket on right < n/10 then new list is bucket on the left, but you only want to find the biggest n-n/10-(size of right bucket). Recursively go to step 1 with the new list.
I would use quicksort descending on the array and get the first N/10 items.
Construct a heap with O(lnN) replacement cost filled with the first n/10 elements. Scan the remaining numbers comparing with the least value in the heap. If the current element's value is higher that the least element in the heap, insert it into the heap and remove the least element. At worst, two O(lnN) operations times N scanned items gives O(N ln N), which isn't any better in time than a sort, but requires less storage than sorting everything, as so in practice is likely to be faster (especially if N elements do not fit in cache but n/10 will - asymptotic time only matters one you are in a flat space).
The most efficient algorithm would be to use a modified quicksort.
Quicksort starts by choosing a "middle" value and putting all values lower than this to the left, and all greater to the right. Normally you would go down and recursively sort both sides, but you only need to sort the right side if there are less than 10% of the elements on the left.
If there are more than 10%, then you only need to sort the left side, and probably only some of the left side.
This won't reduce the complexity below the optimal O(N lg N), but it will reduce the constant factor and make it faster than the obvious "quicksort then choose the first 10" approach.
Very goofy question, just sort it with any sort-algorithm, and take the first N/10 items.
Algo-2 is equivalent to doing this with heap-sort
because this is homework, my answer would be any sorting algorithm, this is because you cannot solve this under O(n*log(n)).
if this was possible then you could fully sort an array under O(n*log(n)). (by finding the sorted top 10% in the array you want to fully sort, removing them and repeating this process 10 times).
because sorting isn't possible under O(n*log(n)), so is this problem.
If you know N, just create an array with a length of 1/10th of that. initial value for each cell is Int.MinValue. Examine each number in the array. If its larger than the smallest number in the ten percent array, add it.
Avoids a sort but at the expense ofconstant scans of the answer array. You can offset this somewhat by keeping it in sorted order so yo ucanuse a binary search.
精彩评论