Random numbers Mathematica vs Java
Which set is more "random"?
Math.random() for Java or random for Mathematica? Java is in blue, Mathematica in red.
numbers are from 0 to 50 (51?)
EDIT: It's a histogram generated in Mathematica.
Java Source (ugly)
public static void main(String[] args) {
    // TODO Auto-generated method stub
    int i = 0;
    int sum = 0;
    int counter = 0;
    String randomNumberList = " ";
    int c = 0;
    while (c != 50){
        while (i != 7) {
            i = (int) (51 * Math.random());
            sum += i;
            ++counter;
            randomNumberList += " " + i;
        }
        i = 0;
        System.out.print("\n" + randomNumberList);
        ++c;
    }
}
Mathematica source (output.txt is the dump from Java)
dataset = ReadList["~/Desktop/output.txt", Number]
dataset2 = RandomReal [{0, 50}, 50000]
Histogram[{dataset, dataset2}]
[EDIT]: I was just learning loops when I did the code. Sorry for the confusion. Now I made a cleaner version and they are about equally distributed. I guess that arbitrary loop ending made a big difference.
new code:
public class RandomNums {
    public static void main(String[] args) {
        int count = 0;
        // i < 50000 gives exactly 50000 numbers (i <= 50000 would print one extra)
        for (int i = 0; i < 50000; i++){
            int j = (int) (50 * Math.random());
            System.out.print(j + " ");
            count++;
            if (count == 50){
                System.out.println("\n");
                count = 0;
            }
        }
    }
}
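A side note on why the first version looks so uneven: randomNumberList is never reset, so every pass of the outer loop prints all numbers drawn so far again, and the dump contains far more values than were actually drawn. The following is only a minimal sketch of that effect. It replays the loop structure with a seeded java.util.Random instead of Math.random() (an assumption made purely for reproducibility), and the class and method names are hypothetical.

```java
import java.util.Random;

final class CumulativePrintDemo {

    /** Replays the first version's loops; returns {numbers drawn, numbers written to the dump}. */
    static long[] replay(long seed) {
        Random random = new Random(seed); // assumption: seeded stand-in for Math.random()
        long draws = 0;
        long printed = 0;
        long listLength = 0; // how many numbers the never-reset string holds
        for (int c = 0; c < 50; c++) {
            int i = 0;
            while (i != 7) {            // a run ends only when a 7 is drawn
                i = random.nextInt(51); // same range as (int) (51 * Math.random())
                draws++;
                listLength++;
            }
            printed += listLength;      // the whole list so far is printed again
        }
        return new long[] {draws, printed};
    }

    public static void main(String[] args) {
        long[] result = replay(1L);
        System.out.println("numbers drawn:   " + result[0]);
        System.out.println("numbers printed: " + result[1]);
    }
}
```

Numbers from early runs are repeated in every later line of output, so the histogram over the dump weights them much more heavily than numbers from late runs.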
If this plot suggests anything to me, it is that the quality of Mathematica's uniform random distribution is much better than that of the Java implementation you are showing. (I don't claim that for every Java implementation. Also, as a disclaimer, and not to start a flame war: I've been both a J2EE and a Mathematica developer for some time, although I admittedly have more experience in the latter.)
Here is the argument. You have 50000 points and 50 bins (histogram bars) shown, which suggests that you have roughly 1000 points per bin. More precisely, we can use ergodicity to cast the problem of 50000 uniformly distributed points into that of 50000 independent trials, and ask for the mean and the variance of the number of points that end up in each bin. The probability that any particular bin ends up with exactly k points out of Npoints is then given by a binomial distribution:
P(k) = Binomial[Npoints, k] * (1/Nbins)^k * (1 - 1/Nbins)^(Npoints - k)
For this distribution the mean is Npoints/Nbins (which is what we expect intuitively, of course), and the variance is Npoints * (1 - 1/Nbins) * 1/Nbins ~ Npoints/Nbins = 1000 in our case (Npoints = 50000, Nbins = 50). Taking the square root, we get a standard deviation of sqrt(1000) ~ 32, which is about 3% of the mean (which is 1000). The conclusion is that, for an ideal uniform distribution and for the given number of points and bins, we should expect deviations from the mean of the order of 3% for each bin. This is very similar to what the Mathematica distribution gives us, judging by the picture. The deviations of individual bins for the Java distribution (again, the particular implementation presented here) are much larger; they suggest correlations between bins and that, overall, this uniform distribution is of much poorer quality.
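The arithmetic in this argument can be checked with a few lines of Java. This is only a sketch of the binomial mean and standard deviation per bin; the class and method names are made up for illustration.

```java
final class BinomialBinStats {

    /** Expected number of points per bin: Npoints / Nbins. */
    static double mean(int nPoints, int nBins) {
        return (double) nPoints / nBins;
    }

    /** Standard deviation of the count in one bin: sqrt(Npoints * p * (1 - p)) with p = 1 / Nbins. */
    static double stdDev(int nPoints, int nBins) {
        double p = 1.0 / nBins;
        return Math.sqrt(nPoints * p * (1.0 - p));
    }

    public static void main(String[] args) {
        int nPoints = 50000;
        int nBins = 50;
        double mean = mean(nPoints, nBins);
        double sd = stdDev(nPoints, nBins);
        System.out.printf("mean = %.1f, std dev = %.1f, relative = %.1f%%%n",
                mean, sd, 100.0 * sd / mean);
    }
}
```

Without the Npoints/Nbins approximation the variance is 980 rather than 1000, so the standard deviation comes out at about 31.3, which is still roughly 3% of the mean.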
Now, this is a "high-level" argument, and I am not going into details to discover the reason. This seems logical however, given that the traditional target audience for Mathematica (sciences, academia) is (or at least used to be) much more demanding in this respect, than that for Java. That said, I have no doubts that there exist many excellent Java implementations of random number generators for many statistical distributions - they are just not something built into the language, unlike in Mathematica.
Not a direct answer to the question... but anyone who wants to perform some of the "hypothesis testing" suggested by @belisarius in response to @Leonid might find the following code snippet useful to try things out without leaving Mathematica:
Needs["JLink`"]
(* Point to a different JVM if you like... *)
(* ReinstallJava[CommandLine -> "...", ClassPath-> "..."] *)
ClearAll@JRandomInteger
JRandomInteger[max_, n_:1] := JRandomInteger[{0, max}, n]
JRandomInteger[{min_, max_}, n_:1] :=
JavaBlock[
Module[{range, random}
, range = max - min + 1
; random = JavaNew["java.util.Random"]
; Table[min + random@nextInt[range], {n}]
]
]
Histogram[
Through[{JRandomInteger, RandomInteger}[{0, 50}, 50000]]
, ChartLegends->{"Java","Mathematica"}
]
Note that this snippet uses Random.nextInt() instead of Math.random() in an attempt to handle that tricky upper boundary a bit better.
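On the "0 to 50 (51?)" question from the original post: Math.random() returns a value in [0.0, 1.0), so (int) (51 * Math.random()) yields the 51 integers 0 through 50, and Random.nextInt(51) covers the same range, since its bound is exclusive. A small sketch in plain Java that checks this empirically (the class and method names are hypothetical):

```java
import java.util.Random;

final class UpperBoundDemo {

    /** Integer in 0..n-1 via the Math.random() idiom used in the question. */
    static int viaMathRandom(int n) {
        return (int) (n * Math.random());
    }

    /** Integer in 0..n-1 via Random.nextInt(n); the bound n is exclusive. */
    static int viaNextInt(Random random, int n) {
        return random.nextInt(n);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int maxA = 0;
        int maxB = 0;
        for (int k = 0; k < 1_000_000; k++) {
            maxA = Math.max(maxA, viaMathRandom(51));
            maxB = Math.max(maxB, viaNextInt(random, 51));
        }
        // Neither maximum can exceed 50.
        System.out.println("largest value seen: " + maxA + " and " + maxB);
    }
}
```

Either way there are 51 possible values, so a histogram of such data needs 51 bins (or a range of 0 to 49 for 50 bins) to avoid one bar standing out.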
Have a look here. It deals with java.util.Random and displays some gotchas. It also recommends using SecureRandom (more expensive, more secure) if you want real-er ( :-) ) randomness.
I find a very flat distribution suspicious, given that it is supposed to be random.
The following code prints what I would expect to see: a variation in the count of occurrences due to randomness.
Random : min count 933, max count 1089
Random : min count 952, max count 1071
Random : min count 922, max count 1056
Random : min count 936, max count 1083
Random : min count 938, max count 1063
SecureRandom : min count 931, max count 1069
SecureRandom : min count 956, max count 1070
SecureRandom : min count 938, max count 1061
SecureRandom : min count 958, max count 1100
SecureRandom : min count 929, max count 1068
/dev/urandom: min count 937, max count 1093
/dev/urandom: min count 936, max count 1063
/dev/urandom: min count 931, max count 1069
/dev/urandom: min count 941, max count 1068
/dev/urandom: min count 931, max count 1080
Code
import java.io.*;
import java.security.SecureRandom;
import java.util.Random;

public class Main {
    public static void main(String... args) throws IOException {
        testRandom("Random ", new Random());
        testRandom("SecureRandom ", new SecureRandom());
        testRandom("/dev/urandom", new DevRandom());
    }

    private static void testRandom(String desc, Random random) {
        for (int n = 0; n < 5; n++) {
            // 50 bins, 50000 draws: about 1000 expected per bin
            int[] counts = new int[50];
            for (int i = 0; i < 50*1000; i++)
                counts[random.nextInt(50)]++;
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int count : counts) {
                if (min > count) min = count;
                if (max < count) max = count;
            }
            System.out.println(desc+": min count " + min + ", max count " + max);
        }
    }

    static class DevRandom extends Random {
        DataInputStream fis;

        public DevRandom() {
            try {
                fis = new DataInputStream(new BufferedInputStream(new FileInputStream("/dev/urandom")));
            } catch (FileNotFoundException e) {
                throw new AssertionError(e);
            }
        }

        @Override
        protected int next(int bits) {
            try {
                // Random.next(bits) must return only the requested number of
                // low-order bits, so shift the 32 bits read down accordingly.
                return fis.readInt() >>> (32 - bits);
            } catch (IOException e) {
                throw new AssertionError(e);
            }
        }

        @Override
        protected void finalize() throws Throwable {
            super.finalize();
            if (fis != null) fis.close();
        }
    }
}
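The min/max counts above give a feel for the spread. One way to condense the same bin counts into a single number is Pearson's chi-square statistic against a uniform expectation. This is only a sketch, not part of the code above; the class and method names are hypothetical.

```java
import java.util.Random;

final class ChiSquareUniform {

    /** Pearson chi-square statistic of the bin counts against equal expected counts. */
    static double statistic(int[] counts) {
        long total = 0;
        for (int count : counts) total += count;
        double expected = (double) total / counts.length;
        double chi2 = 0.0;
        for (int count : counts) {
            double diff = count - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int[] counts = new int[50];
        for (int i = 0; i < 50 * 1000; i++)
            counts[random.nextInt(50)]++;
        System.out.println("chi-square (49 degrees of freedom): " + statistic(counts));
    }
}
```

For 50 bins the statistic has 49 degrees of freedom, so for a good generator it should land near 49, with a standard deviation of about sqrt(2 * 49) ~ 10. A value far above that points to an uneven distribution, and a value very close to 0 would be the suspiciously flat case.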
With properly written random-number code:
public static void main(String[] args) {
    for (int c = 0; c < 50000; ++c) {
        // random integer in the range 0 <= i < 50
        int i = (int) Math.floor(50 * Math.random());
        System.out.print(i + " ");
    }
}
I don't see the variance you're talking about:
* *
** * * ** * *** ** *** ** * * * * ****
****** **************** **************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
**************************************************
Python code to generate the graph:
#!/usr/bin/env python3
import itertools

# read the space-separated numbers printed by the Java program (one line)
s = input()
nums = [int(i) for i in s.split()]
bins = dict((n, 0) for n in range(50))
for i in nums:
    bins[i] += 1

heightdivisor = 50  # tweak this to make the graph taller/shorter
# one column of stars per bin, then transpose the columns into rows
xx = ['*' * (v // heightdivisor) for k, v in bins.items()]
print('\n'.join(reversed([''.join(x) for x in itertools.zip_longest(*xx, fillvalue=' ')])))