Getting n smallest numbers in a sequence
What would be the most efficient way to take the n smallest items from a sequence,
[ [1 2 3] [9 2 1] [2 3 4] [5 6 7] ]
I would like to take the 2 smallest from the sequence, based on the first item of each,
[1 2 3] [2 3 4]
Currently I am sorting the whole list and then taking the first n items, but that probably is not the most efficient way to go; it is a big list and I need to do this frequently.
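For reference, a minimal sketch of the sort-then-take approach described above, assuming the ordering is by the first element of each vector:

;; Sort the whole collection by the first element, then take the first n items.
(def xs [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])

(take 2 (sort-by first xs))
;; => ([1 2 3] [2 3 4])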
The Joy of Clojure, Chapter 6.4, describes a lazy sorting algorithm. The beauty of lazy sorting is that it only does as much work as necessary to find the first x values, so if x << n this algorithm is O(n). Here is a modified version of that algorithm that takes a comparison function.
(defn sort-parts
  "Lazy, incremental quicksort: works through a list of unsorted
   partitions interleaved with already-placed pivots, using the
   comparison function f."
  [work f]
  (lazy-seq
    (loop [[part & parts] work]
      (if-let [[pivot & xs] (seq part)]
        (let [psmaller? (partial f pivot)]
          (recur (list* (filter psmaller? xs)
                        pivot
                        (remove psmaller? xs)
                        parts)))
        (when-let [[x & parts] parts]
          (cons x
                (sort-parts parts f)))))))
(defn qsort [xs f] (sort-parts (list xs) f))
(defn cmp [[a _ _] [b _ _]] (> a b))
(def a [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])
(take 2 (qsort a cmp))
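For the example above, (take 2 (qsort a cmp)) evaluates to ([1 2 3] [2 3 4]); because the sort is lazy, only enough partitioning is realized to produce those first two elements.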
As referenced, you can use the median-of-medians algorithm to select the kth smallest element in linear time, and then partition in linear time. This will provide you with the k smallest elements in O(n). The elements will, however, be unsorted, so if you want the k smallest elements sorted it will cost you another O(k log k).
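Not a full median-of-medians implementation, but a minimal Clojure sketch of the same select-and-partition idea, using a randomized pivot (the expected-O(n) variant mentioned in the notes below); the name k-smallest and the keyfn parameter are illustrative:

;; Expected O(n): repeatedly partition around a random pivot (quickselect)
;; and keep only the side that still contains the k smallest items.
;; The result is the k smallest items, in no particular order.
(defn k-smallest
  [k keyfn coll]
  (cond
    (<= k 0) '()
    (>= k (count coll)) (seq coll)
    :else
    (let [pivot     (keyfn (rand-nth (vec coll)))
          smaller   (filter #(neg? (compare (keyfn %) pivot)) coll)
          equal     (filter #(zero? (compare (keyfn %) pivot)) coll)
          larger    (filter #(pos? (compare (keyfn %) pivot)) coll)
          n-smaller (count smaller)
          n-equal   (count equal)]
      (cond
        (<= k n-smaller) (k-smallest k keyfn smaller)
        (<= k (+ n-smaller n-equal)) (concat smaller (take (- k n-smaller) equal))
        :else (concat smaller equal
                      (k-smallest (- k n-smaller n-equal) keyfn larger))))))

(k-smallest 2 first [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])
;; => ([1 2 3] [2 3 4])   (order within the result is not guaranteed)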
A few important notes:
Firstly, although the complexity is O(n), small constants are not guaranteed and you might see minimal improvement, especially if your n is reasonably small. There are randomized linear selection algorithms that run faster in practice (usually the expected running time is O(n) with a worse worst case, but they have smaller constants than the deterministic ones).
Why can't you maintain the array in sorted order? That would probably be much more performant. You would only need to insert each new element in the correct place, which costs O(log n) to locate (plus the cost of shifting if it is a flat array), and finding the k smallest would then be O(1) (or O(k) if you have to build the result afresh).
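A minimal sketch of this idea in Clojure, assuming a sorted-set-by (a balanced tree with O(log n) insertion) stands in for the sorted array; the comparator breaks ties on the whole vector so items with equal first elements are not collapsed:

;; Keep the data permanently ordered by the first element.
(defn by-first [x y]
  (let [c (compare (first x) (first y))]
    (if (zero? c) (compare x y) c)))

(def data (into (sorted-set-by by-first)
                [[1 2 3] [9 2 1] [2 3 4] [5 6 7]]))

;; O(log n) insertion keeps the set ordered:
(def data' (conj data [0 9 9]))

;; The k smallest are just the first k entries:
(take 2 data')
;; => ([0 9 9] [1 2 3])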
If you decide against the above, an alternative is to keep the array sorted after every such query: append new elements to the end in O(1), and whenever you need the k smallest, sort only the new elements and merge them into the sorted part in linear time, in the spirit of merge sort. That costs O(m log m + n), where m is the number of elements added since the last sort.
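A sketch of that merge step, assuming the base collection is already sorted by the first element; merge-sorted is a hypothetical helper:

;; Merge two sequences that are each already sorted by keyfn, in linear time.
(defn merge-sorted
  [keyfn xs ys]
  (lazy-seq
    (cond
      (empty? xs) ys
      (empty? ys) xs
      (<= (compare (keyfn (first xs)) (keyfn (first ys))) 0)
        (cons (first xs) (merge-sorted keyfn (rest xs) ys))
      :else
        (cons (first ys) (merge-sorted keyfn xs (rest ys))))))

(def sorted-base [[1 2 3] [2 3 4] [5 6 7] [9 2 1]]) ; already sorted by first
(def additions   [[4 0 0] [0 8 8]])                 ; appended since the last sort

;; Sort only the m new items (O(m log m)), then merge in O(n):
(take 2 (merge-sorted first sorted-base (sort-by first additions)))
;; => ([0 8 8] [1 2 3])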
If n is small, you can create a second list of size n that you keep sorted, so you always have quick access to the largest element in it. Iterate through the big list, checking whether each item is less than the largest in the small list; if so, insert it into the small list and, if the small list is already full, drop the previous largest.
If n is less than 3 or 4, you can just brute force it. If n can be larger, you'll want to do a binary search to find the insertion point for each. If n can be very large, then a different mechanism may be in order.
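A sketch of this bounded-buffer idea, using a sorted set capped at n entries in place of the binary-searched small list (n-smallest is a hypothetical name; a mutable heap would also fit):

;; Single pass over the big list, keeping only the n smallest seen so far.
(defn n-smallest
  [n keyfn coll]
  (let [cmp (fn [x y]
              (let [c (compare (keyfn x) (keyfn y))]
                (if (zero? c) (compare x y) c)))]
    (reduce (fn [buf x]
              (let [buf (conj buf x)]
                (if (> (count buf) n)
                  (disj buf (last buf))   ; buffer over capacity: drop the current largest
                  buf)))
            (sorted-set-by cmp)
            coll)))

(n-smallest 2 first [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])
;; => #{[1 2 3] [2 3 4]}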