How can I randomly select an element in Ocaml?
In my program in OCaml, I need to randomly select a string from a huge set of strings. I've tried two different ways of doing this so far, each with little success. I first stored all of the strings into a list and then would randomly select an element from the list:
let randomelement l =
List.nth l (Random.int (List.length l开发者_高级运维))
But this takes a long time if it selects the 1000th string in the list. So I put it all into a set, thinking that Set.choose
would return a random element from the set. But that doesn't seem to be working. I guess I have two questions...how does Set.choose
work, and is there a better way to randomly select an element in Ocaml?
If you care about selection speed you should use a different container. Why use List with access in O(n) or Set with O(log n) when you can use Array with O(1) i.e. constant time.
To adjust your example:
let randomelement arr =
let n = Random.int (Array.length arr) in
Array.get arr n;;
Set.choose
is an alias for Set.min_elt
; although it may not be in the future.
List.nth
would surely suck if you had to do it very often.
An array will work great, but if you have to add more elements, or delete elements, this could be a book-keeping nightmare.
Take a look at implementations of Random Access Lists, which have pretty optimal time for insertion, deletion, lookup, and cardinal without being constrained to arrays.
When I originally ran into this problem, I extended the implementation of Set
and Map
to include random
and nth
. To extend a module, you need to re-implement the structure and add identity functions to convert between the two. I wrote a new module, called XSet
, with the following boilerplate:
module Make (Ord : Set.OrderedType) =
struct
include Set.Make(Ord)
type impl = Empty | Node of impl * elt * impl * int
external impl_of_t : t -> impl = "%identity"
external t_of_impl : impl -> t = "%identity"
...
end
You will have to use impl_of_t
and vice versa to call normal Set
functions, or call your personal ones from the passed arguments --what gets passed to your function should be Set.Make
's implementation t
not impl
.
No, Set.choose is deterministic, and I think it's specified in the documentation. I think in the current implementation it returns the minimum element.
In which data structure is your set of string stored in the first place ? An easy way to get a random element is to count the number of different strings you have, pick a random number K in between, and get the "K-th string" you have. This can be done rather efficiently for some data structure (eg for arrays, it's a constant time operation), less efficiently for others.
Your question implies that you have all the time you want to prepare the structure that contains the strings, and that then, you will pick several times at random without changing the set.
If this is correct, then I would recommend to store them in an array. This gives the fastest access to a random element (direct once you have picked a random index). Sets and list implementation are all inconvenient to access the i-th element once you have picked a random index i (although, for Sets, if you do not mind a random choice with poor randomness, you could pick a random path in the binary tree that represents the set. You can't do that from outside the implementation but you could copy-paste it and add your function inside the module).
First off, there's a List.nth function which could replace your helper function.
let n = Random.int (List.length lst) in
List.nth n lst ;;
(though this would not be any faster since I'm pretty sure it does just about the same thing as your helper function does)
As far as speeding this up goes: if the number of strings you have is fixed, you could use an Array which would speed things up quite a lot as the access time for an Array is constant.
Set.choose
guarantees that it will choose the same element every time for a given set. Which element it chooses is not specified, but if you look at the source code for it you'll see that it returns the minimum element.
Your best bet is to use an array. If you want an immutable data structure instead, then I would suggest a map keyed on integers.
精彩评论