Clojure: sequence back to vector
How can I convert a sequence back to a vector after a sequence-producing operation (like sort)? Is using (vec ..) on a sequence that was originally a vector costly?
One (bad?) possibility is creating a new vector out of the sequence:
(vec (sort [1 2 3 4 5 6]))
I am asking because I need random access (nth ..) to huge sorted vectors, which become huge sequences after the sort, with horrible O(n) random access time.
Meikel Brandmeyer just posted a solution to this on the Clojure group.
(defn sorted-vec
  [coll]
  (let [arr (into-array coll)]
    (java.util.Arrays/sort arr)
    (vec arr)))
Clojure's sort returns a seq across a sorted array; this approach does much the same thing, but returns a vector, not a seq.
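A small usage sketch, assuming the elements are all of one type (into-array picks the array type from the first element, so a mix of, say, Longs and Doubles would throw). The definition is repeated only so the snippet runs on its own; the name v is just for illustration:

```clojure
;; sorted-vec as posted above, repeated so this block is self-contained.
(defn sorted-vec
  [coll]
  (let [arr (into-array coll)]
    (java.util.Arrays/sort arr)
    (vec arr)))

;; v is a hypothetical name used only in this example.
(def v (sorted-vec [5 3 1 4 2]))

(vector? v) ; => true
(nth v 2)   ; => 3, indexed lookup on a vector instead of walking a seq
(v 0)       ; => 1, vectors are also functions of their indices
```

The result is an ordinary persistent vector, so nth no longer has to traverse the collection element by element.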
If you wish, you can even skip the conversion back into a Clojure persistent data structure:
(defn sorted-arr
  "Returns a *mutable* array!"
  [coll]
  (doto (into-array coll)
    (java.util.Arrays/sort)))
but the resulting Java array (which you can treat as a Clojure collection in most cases) will be mutable. That's fine if you're not handing it off to other code, but be careful.
From my own tests (nothing scientific), you may be better off working directly on arrays if you do lots of sorting. If you sort rarely and do a lot of random access, a vector may be the better choice: in the runs below, random access on the vector was more than 40% faster (about 5.5 ms versus 7.9 ms for 10,000 lookups), while sorting was far slower because the vector is converted to an array and then back to a vector. Here are my findings:
(def foo (int-array (range 1000)))
(time
  (dotimes [_ 10000]
    (java.util.Arrays/sort foo)))
; Elapsed time: 652.185436 msecs
(time
  (dotimes [_ 10000]
    (nth foo (rand-int 1000))))
; Elapsed time: 7.900073 msecs
(def bar (vec (range 1000)))
(time
  (dotimes [_ 10000]
    (vec (sort bar))))
; Elapsed time: 2810.877103 msecs
(time
  (dotimes [_ 10000]
    (nth bar (rand-int 1000))))
; Elapsed time: 5.500802 msecs
P.S.: Note that the vector version doesn't actually store the sorted vector anywhere, but that shouldn't change the result considerably as you would use simple bindings in a loop for speed.
If you need random access on the result of sort with huge vectors, then the time taken by the call to vec should be far outweighed by the time it saves on lookups.
If you profile and find that it is too slow, you'll probably have to use Java arrays.
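If it does come to Java arrays, a minimal sketch might look like the following, assuming the data fits in primitive longs. The helper sorted-longs and the name nums are hypothetical, not from the original posts:

```clojure
;; Hypothetical helper: copies coll into a primitive long array and
;; sorts it in place. The returned array is mutable.
(defn sorted-longs
  ^longs [coll]
  (let [arr (long-array coll)]
    (java.util.Arrays/sort arr)
    arr))

(def nums (sorted-longs [5 3 1 4 2]))

;; aget indexes the array directly; the ^longs hint avoids reflection.
(aget ^longs nums 2) ; => 3
```

Whether this beats a vector for your workload is something only a profile of your own data can tell you.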
As a new Clojure developer, it is easy to confuse collections and sequences.
Sorting a vector returns a sequence, not a vector:
(sort [1 2 3 4 5 6]) => (1 2 3 4 5 6) ; returns a sequence
But I need a vector for the next operation because this does not work...
(take-while (partial > 3) (1 2 3 4 5 6))
=>ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval2251 (NO_SOURCE_FILE:2136)
Let us try to convert the sequence to a vector:
(vec (1 2 3 4 5 6))
=>ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval2253 (NO_SOURCE_FILE:2139)
Nope! But if you put it all together, it works just fine.
(take-while (partial > 3) (sort [1 2 3 4 5 6]))
=>(1 2)
The lesson: you cannot paste the printed form of a sequence back into the REPL as code. A sequence is a value that you pass along to the next step. When the REPL evaluates the literal (1 2 3 4 5 6), it treats it as a function call with 1 in function position, and throws an exception:
(1 2 3 4 5 6) =>ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval2263 (NO_SOURCE_FILE:2146)
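For comparison, a short sketch of the same calls with the list quoted, so that it is read as data rather than as a function call. The var names are only for illustration:

```clojure
;; Quoting stops the list from being evaluated as a call to 1.
(def small (take-while (partial > 3) '(1 2 3 4 5 6)))
;; small is (1 2)

(def as-vec (vec '(1 2 3 4 5 6)))
;; as-vec is [1 2 3 4 5 6]

;; The value returned by sort can be passed straight to vec, no quoting needed.
(def sorted-v (vec (sort [3 1 2])))
(nth sorted-v 1) ; => 2
```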