Generating Random Index Vectors

I need to generate random index vectors with a large number of dimensions (about 1000) that are mostly sparse, i.e. mostly zeros. Each entry is either 1 (positive dimension), -1 (negative dimension), or 0. One such vector is generated for every word in a text corpus. What would be the best way to achieve this in Java while ensuring the randomness of the resulting vectors?

Thank you


To store a vector, keep a list of its non-zero positions together with the sign (+1 or -1) of each one. One bit per non-zero entry is enough in principle; a byte per entry is the simplest way to hold it.
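
A minimal sketch of that layout, assuming a fixed number of non-zero entries per vector. The class name `SparseIndexVector` and its methods are made up for illustration, and `java.util.Random` is only one possible source of randomness (any `Random` subclass, such as `SecureRandom`, can be passed in instead):

```java
import java.util.Arrays;
import java.util.Random;

/** Sparse ternary vector: sorted non-zero positions plus one sign byte per position. */
final class SparseIndexVector {
    final int dimension;
    final int[] positions; // sorted indices of the non-zero entries
    final byte[] signs;    // +1 or -1, parallel to positions

    SparseIndexVector(int dimension, int[] positions, byte[] signs) {
        this.dimension = dimension;
        this.positions = positions;
        this.signs = signs;
    }

    /** Returns the entry at index i: +1, -1, or 0. */
    int get(int i) {
        int k = Arrays.binarySearch(positions, i);
        return k >= 0 ? signs[k] : 0;
    }

    /** Picks nonZeros distinct positions uniformly at random and gives each a random sign. */
    static SparseIndexVector random(int dimension, int nonZeros, Random rng) {
        if (nonZeros < 0 || nonZeros > dimension) {
            throw new IllegalArgumentException("nonZeros must be in [0, dimension]");
        }
        // Partial Fisher-Yates shuffle: after the loop, the first nonZeros
        // slots of pool hold a uniform random sample without repetition.
        int[] pool = new int[dimension];
        for (int i = 0; i < dimension; i++) {
            pool[i] = i;
        }
        for (int i = 0; i < nonZeros; i++) {
            int j = i + rng.nextInt(dimension - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        int[] positions = Arrays.copyOf(pool, nonZeros);
        Arrays.sort(positions); // sorted so that get() can use binary search
        byte[] signs = new byte[nonZeros];
        for (int i = 0; i < nonZeros; i++) {
            signs[i] = (byte) (rng.nextBoolean() ? 1 : -1);
        }
        return new SparseIndexVector(dimension, positions, signs);
    }
}
```

For example, `SparseIndexVector.random(1000, 10, new Random())` would give a 1000-dimensional vector with exactly 10 non-zero entries; the count of 10 is an arbitrary choice here, not something the question specifies.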

If you really wanted to save as much memory as possible, you could keep one large BitSet containing the +1/-1 information for all the vectors together, and each vector would remember its starting index in the BitSet.
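
A rough sketch of the shared-BitSet idea, under the assumption that a set bit means +1 and a clear bit means -1. `PackedSigns` is a hypothetical helper; each vector would still have to store its own non-zero positions and the offset returned by `append`:

```java
import java.util.BitSet;

/** Packs the sign bits of many vectors into one BitSet (set bit = +1, clear bit = -1). */
final class PackedSigns {
    private final BitSet bits = new BitSet();
    private int next = 0; // first unused bit in the shared BitSet

    /** Appends the signs of one vector and returns its starting offset. */
    int append(byte[] signs) {
        int start = next;
        for (byte s : signs) {
            bits.set(next++, s > 0);
        }
        return start;
    }

    /** Reads the k-th sign of the vector whose signs begin at offset start. */
    int sign(int start, int k) {
        return bits.get(start + k) ? 1 : -1;
    }
}
```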

To generate vectors orthogonal to the others, you can do:

 [0 1 0 0 -1 ...]
 [1 0 1 0 0 ...]  // zeros where the first vector is non-zero
 ...

Keep a linked list of all 1000 available indices. When generating a vector, pick a small random number of random indices, make those indices non-zero in the new vector, and remove them from the list of available indices. The drawback is that you quickly run out of available indices. But a 1000-dimensional space holds at most 1000 mutually orthogonal non-zero vectors, so you could create vectors for at most 1000 words anyway.
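
The scheme above could look roughly like this. Note one swap: instead of a linked list with random removals, this sketch shuffles an `ArrayList` once and takes indices from its end, which yields the same kind of random, non-repeating picks. `DisjointIndexAllocator` is an invented name, and the caller chooses how many non-zero entries each vector gets:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hands out vectors with pairwise disjoint non-zero positions, so any two are orthogonal. */
final class DisjointIndexAllocator {
    private final int dimension;
    private final List<Integer> available = new ArrayList<>();
    private final Random rng;

    DisjointIndexAllocator(int dimension, Random rng) {
        this.dimension = dimension;
        this.rng = rng;
        for (int i = 0; i < dimension; i++) {
            available.add(i);
        }
        // Shuffle once; removing from the end is then a random pick in O(1).
        Collections.shuffle(available, rng);
    }

    /** Number of indices that no vector has used yet. */
    int remaining() {
        return available.size();
    }

    /** Builds a dense vector with nonZeros entries set to +1 or -1 on unused indices. */
    byte[] next(int nonZeros) {
        if (nonZeros > available.size()) {
            throw new IllegalStateException("not enough unused indices left");
        }
        byte[] vector = new byte[dimension];
        for (int i = 0; i < nonZeros; i++) {
            int index = available.remove(available.size() - 1);
            vector[index] = (byte) (rng.nextBoolean() ? 1 : -1);
        }
        return vector;
    }
}
```

With 1000 dimensions and, say, 5 non-zero entries per vector, the allocator is exhausted after 200 vectors, which illustrates how quickly the indices run out.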

Also, the fact that the vectors have to be orthogonal means that they can't be completely random, because truly random vectors could happen to be non-orthogonal.


If you want to try a low-cost approach (programming-wise), then a HashMap<Integer, Byte> or something close could make a decent sparse vector.
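
As a sketch of that low-cost route, the map can go from index to sign, with absent keys meaning zero. The helper class `MapIndexVectors`, its method names, and the fixed non-zero count are assumptions made for this example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Sparse ternary vectors stored as index -> sign maps; missing keys are zeros. */
final class MapIndexVectors {

    /** Builds a vector with exactly nonZeros random entries of +1 or -1. */
    static Map<Integer, Byte> random(int dimension, int nonZeros, Random rng) {
        if (nonZeros < 0 || nonZeros > dimension) {
            throw new IllegalArgumentException("nonZeros must be in [0, dimension]");
        }
        Map<Integer, Byte> vector = new HashMap<>();
        while (vector.size() < nonZeros) {
            int index = rng.nextInt(dimension);
            // putIfAbsent keeps an already chosen index unchanged,
            // so a repeated index simply triggers another draw.
            vector.putIfAbsent(index, (byte) (rng.nextBoolean() ? 1 : -1));
        }
        return vector;
    }

    /** Dot product that only visits the entries of the smaller map. */
    static int dot(Map<Integer, Byte> a, Map<Integer, Byte> b) {
        Map<Integer, Byte> small = a.size() <= b.size() ? a : b;
        Map<Integer, Byte> large = (small == a) ? b : a;
        int sum = 0;
        for (Map.Entry<Integer, Byte> e : small.entrySet()) {
            Byte other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }
}
```

Boxing every index and sign costs noticeably more memory than the array-based layouts, so this fits best when convenience matters more than footprint.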
