
My variance function in C# does not return an accurate value

The source data:

    static double[] felix = new double[] { 0.003027523, 0.002012256, -0.001369238, -0.001737660, -0.001647287, 
        0.000275154, 0.002017238, 0.001372621, 0.000274148, -0.000913576, 0.001920263, 0.001186456, -0.000364631, 
        0.000638337, 0.000182266, -0.001275626, -0.000821093, 0.001186998, -0.000455996, -0.000547445, -0.000182582,
        -0.000547845, 0.001279006, 0.000456204, 0.000000000, -0.001550388, 0.001552795, 0.000729594, -0.000455664, 
        -0.002188184, 0.000639620, 0.000091316, 0.001552228, -0.001002826, 0.000182515, -0.000091241, -0.000821243,
        -0.002009132, 0.000000000, 0.000823572, 0.001920088, -0.001368863, 0.000000000, 0.002101800, 0.001094291, 
        0.001639643, 0.002637323, 0.000000000, -0.000172336, -0.000462665, -0.000136141 };

The variance function:

    public static double Variance(double[] x)
    {
        if (x.Length == 0)
            return 0;
        double sumX = 0;
        double sumXsquared = 0;
        double varianceX = 0;
        int dataLength = x.Length;


        for (int i = 0; i < dataLength; i++)
        {
            sumX += x[i];
            sumXsquared += x[i] * x[i];
        }

        varianceX = (sumXsquared / dataLength) - ((sumX / dataLength) * (sumX / dataLength));
        return varianceX;
    }

Excel and some online calculators say the variance is 1.56562E-06, while my function gives me 1.53492394804015E-06. I'm beginning to doubt whether C# has an accuracy problem or something. Has anyone run into this kind of problem before?


What you are seeing is the difference between sample variance and population variance; it has nothing to do with floating-point precision or the accuracy of C#'s floating-point implementation.

You are calculating population variance. Excel and that web site are calculating sample variance.

Var and VarP are distinct calculations, and you do need to be careful about which one you are using. (Unfortunately, people often refer to them as if they were interchangeable when they are not. The same is true for standard deviation.)

The sample variance for your data is 1.56562E-06; the population variance is 1.53492394804015E-06.
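
To make your one-pass function match Excel's VAR, divide the sum of squared deviations by n - 1 (Bessel's correction) instead of n. A minimal sketch of that adjustment, using a hypothetical SampleVariance helper that keeps your variable names, could look like this:

    public static double SampleVariance(double[] x)
    {
        int n = x.Length;
        if (n < 2)
            return 0; // sample variance needs at least two data points

        double sumX = 0;
        double sumXsquared = 0;

        for (int i = 0; i < n; i++)
        {
            sumX += x[i];
            sumXsquared += x[i] * x[i];
        }

        // Sum of squared deviations, divided by (n - 1) instead of n.
        return (sumXsquared - (sumX * sumX) / n) / (n - 1);
    }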

From some code posted on CodeProject a while back:

Variance in a sample

    public static double Variance(this IEnumerable<double> source)
    {
        // Sample variance: divide by (n - 1), matching Excel's VAR.
        double avg = source.Average();
        double d = source.Aggregate(0.0, (total, next) => total + Math.Pow(next - avg, 2));
        return d / (source.Count() - 1);
    }

Variance in a population

    public static double VarianceP(this IEnumerable<double> source)
    {
        // Population variance: divide by n, matching Excel's VARP.
        double avg = source.Average();
        double d = source.Aggregate(0.0, (total, next) => total + Math.Pow(next - avg, 2));
        return d / source.Count();
    }
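
As a rough usage sketch (assuming both extension methods sit in a static class that is in scope, with using System and using System.Linq), calling them on the felix array above should print values close to the two figures quoted earlier:

    // Hypothetical usage; the numbers in the comments are approximate.
    double sample = felix.Variance();       // ~1.56562E-06  (Excel's VAR)
    double population = felix.VarianceP();  // ~1.53492E-06  (Excel's VARP)
    Console.WriteLine($"sample = {sample}, population = {population}");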


Here's an alternate implementation that is sometimes better behaved numerically:

    public static double VarianceP(IList<double> data)
    {
        double mean = data.Average();
        double sum2 = 0.0, sumc = 0.0;

        for (int i = 0; i < data.Count; i++)
        {
            double dev = data[i] - mean;
            sum2 += dev * dev;   // sum of squared deviations from the mean
            sumc += dev;         // sum of deviations (rounding-error term)
        }

        // Corrected two-pass formula for the population variance.
        return (sum2 - sumc * sumc / data.Count) / data.Count;
    }
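
In exact arithmetic sumc would be zero, because deviations from the mean always sum to zero; in floating point it picks up the rounding error introduced when the mean was computed, and subtracting sumc * sumc / data.Count compensates for it. That correction is why this "corrected two-pass" form tends to behave better numerically than the single-pass sum-of-squares formula in the question, which can lose precision to cancellation when the mean is large relative to the spread of the data.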