What's the best technique to generate a random-access data structure lazily?
In Haskell, I'd like to generate a list of random integers of undetermined length. (However, less than say 1 million.)
I'm not likely to need all the elements of the list immediately, so I'd like to generate it lazily. However, once generated, I'll need random access to its elements. So I assume the best technique would be to copy an infinite list into an array. However, I don't know whether arrays can be "interchanged" with lists very easily -- for instance, when I want to generate more items of the list, I'd like to start where I left off and expand the array.
Anyways, perhaps I should settle for some log(n) tree structure instead of an array? What do you think?
Does Haskell have a tree structure for sequential elements that can be accessed using a list-like API?
If you have a good PRNG, then values generated from seeds that are close together should be independent, so you can just choose an initial seed; cell i of the array is then prng(seed+i).
This is essentially just using it as a hash function :P
If you do it this way, you don't even really need the array: you can just have a function getRandomValue seed i = prng (seed+i). Whether or not this is better than an array of lazy values will depend on the complexity of your PRNG, but either way you'll get O(1) access.
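A minimal sketch of that idea, using System.Random's StdGen re-seeded per index purely as a stand-in (the answer assumes a PRNG whose nearby seeds really are independent, which StdGen doesn't guarantee):

    import System.Random (mkStdGen, random)

    -- Re-seed a PRNG with seed+i and take its first output: O(1) access
    -- to the i-th value, with no array or list needed at all.
    getRandomValue :: Int -> Int -> Int
    getRandomValue seed i = fst (random (mkStdGen (seed + i)))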
This seems like it might be useful, but you would need to generate increasingly large trees as the list gets longer: http://www.codeproject.com/KB/recipes/persistentdatastructures.aspx#RandomAccessLists (it is a log(n) structure, though). Something like http://research.janelia.org/myers/Papers/applicative.stack.pdf might be useful as well, but I don't see offhand how to make it work efficiently (with shared structure) when the list is generated dynamically.
One possible approach would be to use a pseudorandom number generator that allows "leapfrogging," i.e. skipping forward n steps in sub-linear time. One method (which is slow, but works in constant time) is to use a block cipher in counter mode; see http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator#Designs_based_on_cryptographic_primitives. See http://cs.berkeley.edu/~mhoemmen/cs194/Tutorials/prng.pdf for more information.
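A rough sketch of the counter-mode idea; encryptBlock here is a hypothetical stand-in for a real block cipher (e.g. AES from a cryptography package), not an actual API:

    import Data.Word (Word64)

    -- Hypothetical block cipher: key -> plaintext block -> ciphertext block.
    encryptBlock :: Word64 -> Word64 -> Word64
    encryptBlock = undefined  -- substitute a real cipher here

    -- The i-th pseudorandom value is just the encryption of the counter i,
    -- so any index is reachable in O(1) without generating its predecessors.
    nthRandom :: Word64 -> Word64 -> Word64
    nthRandom key i = encryptBlock key i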
Not an answer, but a consideration: note that any standard technique for generating a list of random values involves threading a seed through the computation. So if I generate a lazy infinite list of random values and force the 100th element, that forces not only the spine of cons cells 0-99 but their values as well.
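For example (a small illustration with System.Random; the exact generator doesn't matter):

    import System.Random (mkStdGen, randoms)

    -- Reaching element 100 steps the generator through elements 0..99 first,
    -- because each value's seed comes from the previous one.
    main :: IO ()
    main = print (xs !! 100)
      where xs = randoms (mkStdGen 42) :: [Int]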
In any case, I'd recommend lazy storable vectors: http://hackage.haskell.org/package/storablevector
You still have O(n) access, but you've reduced it by the factor of your chunk size, so in practice this is much faster than a plain list. And you still get, modulo chunk size, the laziness properties you're interested in.
What about Data.Sequence.Seq? It's not lazy, but it does support O(1) appends at either end, and lookups are O(log(min(i,n-i))), which should be better than most other tree structures. You could keep a Seq and generate/append more outputs as necessary.
There's also an instance for ListLike if you prefer that API over the one provided by Seq.
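A sketch of that approach, with System.Random as the source of values (the names grow and lookupRandom are just for illustration):

    import qualified Data.Sequence as Seq
    import Data.Sequence (Seq, (|>))
    import System.Random (StdGen, mkStdGen, random)

    -- Extend the sequence until index n exists, threading the generator state.
    grow :: Int -> (Seq Int, StdGen) -> (Seq Int, StdGen)
    grow n (s, g)
      | Seq.length s > n = (s, g)
      | otherwise        = grow n (s |> x, g')
      where (x, g') = random g

    -- Look up index n, growing the sequence first if necessary, and return
    -- the updated state so later lookups can reuse the generated prefix.
    lookupRandom :: Int -> (Seq Int, StdGen) -> (Int, (Seq Int, StdGen))
    lookupRandom n st = (Seq.index s' n, (s', g'))
      where (s', g') = grow n st

Start from (Seq.empty, mkStdGen 0) and thread the returned state through later lookups.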
One thing you could do is create a list of arrays. If each array stores M elements of your sequence, then accessing element n is O(n/M).
This might be a good middle ground.
For example:
import Data.Array
import Data.List

-- The infinite (lazy) list we want random access to; Fibonacci numbers
-- stand in here for the randomly generated values.
fibsList :: (Num a) => [a]
fibsList = 1 : 1 : zipWith (+) fibsList (tail fibsList)

-- Number of elements stored in each array chunk.
chunkSize :: (Num a, Ix a) => a
chunkSize = 100000

-- An infinite list of fixed-size arrays, each filled lazily from the list.
fibsChunks :: (Num i, Ix i, Num e) => [Array i e]
fibsChunks = mkChunks fibsList
  where mkChunks fs = listArray (0, chunkSize-1) xs : mkChunks ys
          where (xs,ys) = splitAt chunkSize fs

-- O(n/chunkSize) to reach the right chunk, then O(1) within the array.
lookupFib :: (Integral i, Ix i, Num e) => i -> e
lookupFib n = fibsChunks `genericIndex` q ! r
  where (q,r) = n `divMod` chunkSize
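Adapting this to the original question, you can feed the same chunking scheme a lazy list of random numbers instead of Fibonacci numbers. A sketch, reusing chunkSize and the imports from above, with System.Random as the source (the seed 42 is arbitrary):

    import System.Random (mkStdGen, randoms)

    -- An infinite list of fixed-size chunks of random Ints.
    randomChunks :: [Array Int Int]
    randomChunks = mkChunks (randoms (mkStdGen 42))
      where mkChunks xs = listArray (0, chunkSize - 1) ys : mkChunks zs
              where (ys, zs) = splitAt chunkSize xs

    -- Same lookup as lookupFib, but over the random chunks.
    lookupRandom :: Int -> Int
    lookupRandom n = randomChunks `genericIndex` q ! r
      where (q, r) = n `divMod` chunkSize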