开发者

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.

I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.

Any ideas?

--

here is the AS3 code for random correlated gaussians, to get the wheels turning:

public static function nextCorrelatedGaussians(r:Number):Array{    开发者_开发知识库         
         var d1:Number;
         var d2:Number;
         var n1:Number;
         var n2:Number;
         var lambda:Number;
         var r:Number;
         var arr:Array = new Array();
         var isNeg:Boolean; 

        if (r<0){
            r *= -1;
              isNeg=true;
        }            
        lambda= (   (r*r)  -  Math.sqrt(  (r*r) - (r*r*r*r)  )     )   /   ((  2*r*r ) - 1  );

        n1 = nextGaussian();
        n2 = nextGaussian();           
        d1 = n1;            
        d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));

        if (isNeg) {d2*= -1}           
        arr.push(d1);
        arr.push(d2);
        return arr;
    }


I ended up writing a short paper on this

It doesn't include your sorting method (although in practice I think it's similar to my first method, in a roundabout way), but does describe two ways that don't require iteration.


Here is an implementation of of twolfe18's algorithm written in Actionscript 3:

for (var j:int=0; j < size; j++) {
 xValues[i]=Math.random());
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;

for (var i:int=0; i < size; i++) {
 yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}

boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev). size is the size of the distribution.

Sample output

Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158

As you can see I'm still a ways off. Any suggestions?


This apparently simple question has been messing up with my mind since yesterday evening! I looked for the topic of simulating distributions with a dependency, and the best I found is this: simulate dependent random variables. The gist of it is, you can easily simulate 2 normals with given correlation, and they outline a method to transform these non-independent normals, but this won't preserve correlation. The correlation of the transform will be correlated, so to speak, but not identical. See the paragraph "Rank correlation coefficents".

Edit: from what I gather from the second part of the article, the copula method would allow you to simulate / generate random variables with rank correlation.


start with the model y = x + e where e is the error (a normal random variable). e should have a mean of 0 and variance k.

long story short, you can write a formula for the expected value of the Pearson in terms of k, and solve for k. note, you cannot randomly generate data with the Pearson exactly equal to a specific value, only with the expected Pearson of a specific value.

i'll try to come back and edit this post to include a closed form solution when i have access to some paper.

EDIT: ok, i have a hand-wavy solution that is probably correct (but will require testing to confirm). for now, assume desired Pearson = p > 0 (you can figure out the p < 0 case). like i mentioned earlier, set your model for Y = X + E (X is uniform, E is normal).

  1. sample to get your x's
  2. compute var(x)
  3. the variance of E should be: (1/(rsd(x)))^2 - var(x)
  4. generate your y's based on your x's and sample from your normal random variable E

for p < 0, set Y = -X + E. proceed accordingly.

basically, this follows from the definition of Pearson: cov(x,y)/var(x)*var(y). when you add noise to the x's (Y = X + E), the expected covariance cov(x,y) should not change from that with no noise. the var(x) does not change. the var(y) is the sum of var(x) and var(e), hence my solution.

SECOND EDIT: ok, i need to read definitions better. the definition of Pearson is cov(x, y)/(sd(x)sd(y)). from that, i think the true value of var(E) should be (1/(rsd(x)))^2 - var(x). see if that works.


To get a correlation of 1 both X and Y should be the same, so copy X to Y and you have a correlation of 1. To get a -1 correlation, make Y = 1 - X. (assuming X values are [0,1])


A strange problem demands a strange solution -- here is how I solved it.

-Generate array X

-Clone array X to Create array Y

-Sort array X (you can use whatever method you want to sort array X -- quicksort, heapsort anything stable.)

-Measure the starting level of pearson's R with array X sorted and array Y unsorted.

WHILE the correlation is outside of the range you are hoping for

   IF the correlation is to low
         run one iteration of CombSort11 on array Y then recheck correlation
   ELSE IF the correlation is too high
         randomly swap two values and recheck correlation

And thats it! Combsort is the real key, it has the effect of increasing the correlation slowly and steadily. Check out Jason Harrison's demo to see what I mean. To get a negative correlation you can invert the sort or invert one of the arrays after the whole process is complete.

Here is my implementation in AS3:

public static function nextReliableCorrelatedUniforms(r:Number, size:int, error:Number):Array {
        var yValues:Array = new Array;
        var xValues:Array = new Array;
        var coVar:Number = 0;
        for (var e:int=0; e < size; e++) { //create x values            
            xValues.push(Math.random());
    }
    yValues = xValues.concat();
    if(r != 1.0){
        xValues.sort(Array.NUMERIC);
    }
    var trueR:Number = Util.getPearson(xValues, yValues);

        while(Math.abs(trueR-r)>error){
            if (trueR < r-error){   // combsort11 for y     
                var gap:int = yValues.length;
                var swapped:Boolean = true; 
                while (trueR <= r-error) {
                    if (gap > 1) {
                        gap = Math.round(gap / 1.3);
                    }
                    var i:int = 0;
                    swapped = false;
                    while (i + gap < yValues.length  &&  trueR <= r-error) {
                        if (yValues[i] > yValues[i + gap]) {
                            var t:Number = yValues[i];
                            yValues[i] = yValues[i + gap];
                            yValues[i + gap] = t;
                            trueR = Util.getPearson(xValues, yValues)
                            swapped = true;
                        }
                        i++;
                    }
                }
            }

            else { // decorrelate
                while (trueR >= r+error) {
                    var a:int = Random.randomUniformIntegerBetween(0, size-1);
                    var b:int = Random.randomUniformIntegerBetween(0, size-1);
                    var temp:Number = yValues[a];
                    yValues[a] = yValues[b];
                    yValues[b] = temp;
                    trueR = Util.getPearson(xValues, yValues)
               }                
            }
        }
        var correlates:Array = new Array;
        for (var h:int=0; h < size; h++) {
            var pair:Array = new Array(xValues[h], yValues[h]);
            correlates.push(pair);}
        return correlates;
    }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜