Determining the similarity between two random number generators
Suppose that I have two random number generators, RNG-A and RNG-B, such that:
- They both produce random, non-infinite floating point numbers when called
- I can call the generators repeatedly and generate as many random numbers as I like
- The random numbers generated are independent and identically distributed (i.e. the output of the RNGs is independent of everything they have previously produced)
- I can't guarantee anything else about the shape of the distribution
I would like to obtain a measure of how similar the two random distributions are, and ideally use this to determine whether they appear to be producing the same distribution.
What is the best algorithm for doing this?
I think you'll find your answers here.
Excerpts:
Testing Random Number Generators
Does observed data satisfy a particular distribution?
• Chi-square test
• Kolmogorov-Smirnov test
• Serial correlation test
• Two-level tests
• K-distributivity
• Serial test
• Spectral test
.....
Another section:
Serial Correlation Test
• Test if 2 random variables are dependent
—is their covariance non-zero?
– if so, dependent. converse not true.
HTH!
In randomized algorithms the main concern is the mean and variance, although the mode and some other factors are also important. You can generate a large number of values from each generator, compare their means and variances, and check how similar they are. You can also check how each sample relates to a known function (such as the Gaussian). But the best-known test for your case is:
- Kolmogorov–Smirnov test
You can also use the chi-square test if you map the output to a finite set of values (for example, generated number % big prime number).
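As a rough illustration of the chi-square idea for floating point output, the sketch below bins both samples into the same equal-width bins (instead of taking a modulus) and computes the two-sample chi-square statistic. The function name `chi_square_two_sample`, the bin count, and the stand-in generators `rng_a` and `rng_b` are assumptions made for this example, not part of the question. The statistic has to be compared against a chi-square table for the returned degrees of freedom, and the approximation is poor when bins hold only a few values.

```python
import random


def chi_square_two_sample(xs, ys, bins=10):
    """Return (statistic, degrees_of_freedom) for two binned samples."""
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal

    def counts(sample):
        c = [0] * bins
        for v in sample:
            k = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
            c[k] += 1
        return c

    a, b = counts(xs), counts(ys)
    n, m = len(xs), len(ys)
    # Scale factors so that samples of different sizes can be compared.
    k1, k2 = (m / n) ** 0.5, (n / m) ** 0.5
    stat = 0.0
    used = 0
    for ai, bi in zip(a, b):
        if ai + bi == 0:
            continue  # a bin that is empty in both samples carries no information
        used += 1
        stat += (k1 * ai - k2 * bi) ** 2 / (ai + bi)
    return stat, used - 1


# Hypothetical stand-ins for RNG-A and RNG-B.
rng = random.Random(2024)


def rng_a():
    return rng.uniform(0.0, 1.0)


def rng_b():
    return rng.uniform(0.5, 1.5)


sample_a = [rng_a() for _ in range(2000)]
sample_b = [rng_b() for _ in range(2000)]
stat, dof = chi_square_two_sample(sample_a, sample_b)
print(f"chi-square = {stat:.1f} with {dof} degrees of freedom")
```

A statistic far above the degrees of freedom suggests the two generators differ; a value close to the degrees of freedom is what you would expect if they produce the same distribution.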
Because you cannot make a statement about either distribution, you need a non-parametric test to compare the (unknown) distributions. You can use a two-sample K-S test; for other options, look under non-parametric statistics.
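A minimal sketch of the two-sample K-S test, assuming you draw a sample from each generator first. The statistic D is the largest gap between the two empirical distribution functions; the critical value uses the standard large-sample approximation, so treat it as a guide rather than an exact threshold. The names `ks_statistic` and `ks_critical_value`, and the stand-in generators `rng_a` and `rng_b`, are made up for this example.

```python
import math
import random


def ks_statistic(xs, ys):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to v (handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for D at significance level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical stand-ins for RNG-A and RNG-B.
rng = random.Random(12345)


def rng_a():
    return rng.gauss(0.0, 1.0)


def rng_b():
    return rng.gauss(0.2, 1.0)


a = [rng_a() for _ in range(5000)]
b = [rng_b() for _ in range(5000)]
d = ks_statistic(a, b)
crit = ks_critical_value(len(a), len(b))
print(f"D = {d:.4f}, critical value at 5% = {crit:.4f}")
print("distributions differ" if d > crit else "no evidence of a difference")
```

D itself also works as the similarity measure asked for in the question: it is 0 when the two samples have identical empirical distributions and 1 when they do not overlap at all.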
When you say compare two distributions, it's not really clear how detailed an answer you want. For example, consider these two sequences:
RNG-A: 1111100000
RNG-B: 1010101010
Both sequences contain the same values with the same frequencies, so their means and variances are identical and they would pass the Kolmogorov–Smirnov test with flying colours. However, it's obvious that RNG-A and RNG-B generate sequences with different characteristics, because the test ignores the order in which the values appear. Depending on your situation, this may or may not be a problem. As long as you know the risks involved, you can make an informed decision.
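To make the point concrete, here is a small sketch using the two sequences above. Sorting them shows that they hold exactly the same values, so any test that only looks at the distribution of values cannot tell them apart, while a simple lag-1 serial correlation separates them clearly. The helper names are my own, and the lag-1 estimator used here is just one common choice.

```python
def mean(seq):
    return sum(seq) / len(seq)


def variance(seq):
    mu = mean(seq)
    return sum((x - mu) ** 2 for x in seq) / len(seq)


def lag1_autocorrelation(seq):
    """Correlation between each value and the value that follows it."""
    mu = mean(seq)
    num = sum((seq[i] - mu) * (seq[i + 1] - mu) for i in range(len(seq) - 1))
    den = sum((x - mu) ** 2 for x in seq)
    return num / den


seq_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # RNG-A
seq_b = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # RNG-B

# Same values once the order is thrown away, hence identical empirical CDFs.
print(sorted(seq_a) == sorted(seq_b))            # True
print(mean(seq_a), mean(seq_b))                  # 0.5 0.5
print(variance(seq_a), variance(seq_b))          # 0.25 0.25

# The ordering is very different, though.
print(f"{lag1_autocorrelation(seq_a):.2f}")      # 0.70
print(f"{lag1_autocorrelation(seq_b):.2f}")      # -0.90
```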
If you really want to make sure that the generators are identical, then take a look at the link provided in belisarius' answer. However, this compares an RNG to a known distribution. In your case, you don't know either distribution, although I suppose you could simulate RNG-A enough times to use it as an approximation to get going.
Another useful thing to look at is the Diehard tests. See the answers to this question at the stats.SE.