Read from File Variance Calculation

@Jerry Coffin

I get the logic: while (File>>value) means "while the read from the file succeeds, do the computation". Yet when I implemented this, the counter only went to 1 and its value was very high. Something is wrong, but I have no idea what. The file is valid.

File.open(FileName, ifstream::in);  
while(File>>value){  
    ++counter;  
    sum += value;  
    sumsqr+= value * value;  
}  
average=sum/counter;  
variance = sumsqr/counter - average*average;  
File.close();  

here's the contents of the input file I am using "text.txt" 23244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 14864152412505862058641048186386848408


Sadly, (at least) three answers have quoted your while (!File.eof()) without commenting on the fact that this is just plain wrong. What you want is something like this:

while (File>>value) {
    ++counter;
    sum += value;
    sumsqr += value * value;
}
average = sum/counter;
variance = sumsqr/counter - average * average;

The bug from using while (!File.eof()) is insidious -- you'll typically get results that look reasonable, and are actually fairly close to correct. The problem is that eof() doesn't become true until after you've attempted to read from the file, and the attempted read has failed. When it fails, value will still have the last value you read, so it'll act like the last number in the list was really there twice (e.g., if your file contained 21 numbers, your loop would execute 22 times, and on the 22nd iteration, it would use the 21st number again). This will throw your calculations off a bit, but usually not enough that it's immediately obvious -- nearly the worst possible kind of bug.
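The extra iteration can be counted directly. The following is a minimal sketch, not part of the original test program: it reads from a std::istringstream instead of a file, and the two function names are hypothetical. It assumes the input ends with a newline after the last number, which is the usual case for text files; without any trailing whitespace, the last successful read already sets eof and the extra pass does not happen.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts loop iterations when the stream state is tested *before* reading.
std::size_t count_with_eof_test(const std::string &text) {
    std::istringstream in(text);
    double value = 0;
    std::size_t iterations = 0;
    while (!in.eof()) {   // eof() stays false until a read has run into the end
        in >> value;      // on the final pass this read fails
        ++iterations;
    }
    return iterations;
}

// Counts loop iterations when the read itself is the loop condition.
std::size_t count_with_read_test(const std::string &text) {
    std::istringstream in(text);
    double value = 0;
    std::size_t iterations = 0;
    while (in >> value) { // the body runs only after a successful read
        ++iterations;
    }
    return iterations;
}
```

For the three-number input "1\n2\n3\n", count_with_eof_test returns 4 while count_with_read_test returns 3.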

Edit: Here's a complete test program:

#include <fstream>
#include <iostream>

double variance1(std::istream &File) {
    // The accumulators must start at zero; value is set by the first read.
    double value, average, sum = 0, counter = 0, sumsqr = 0, variance;
    while (File>>value) {
        ++counter;
        sum += value;
        sumsqr += value * value;
    }
    average = sum/counter;
    variance = sumsqr/counter - average * average;
    return variance;
}

double variance2(std::istream &File) {
    double value, average, sum = 0, counter = 0, sumsqr = 0, variance;
    while (!File.eof()) {
        ++counter;
        File >> value;
        sum += value;
        sumsqr += value * value;
    }
    average = sum/counter;
    variance = sumsqr/counter - average * average;
    return variance;
}

int main() { 
    std::ifstream in("data.txt");
    double v1 = variance1(in);
    in.clear();
    in.seekg(0);
    double v2 = variance2(in);

    std::cout << "Using \"while (file>>value)\": " << v1 << "\n";
    std::cout << "Using \"while (!file.eof())\": " << v2 << "\n";
    return 0;
}

Here's some test data to go with:

1
2
3
4
5
6
7
8
9
10

When I run this on that data, I get:

Using "while (file>>value)": 8.25 
Using "while (!file.eof())": 9.17355

As a cross-check, I did the computation in Excel, using two sets of data:

1           1
2           2
3           3
4           4
5           5
6           6
7           7
8           8
9           9
10          10
8.25        10
            9.173553719

The last line in each column is the result of a formula doing "VARP" on the preceding data. Note that my function matches with what Excel produces for the correct input data. The function using while (!file.eof()) matches with what Excel produces with the last number duplicated.

I can't even begin to guess what's happening to make the loop run only once and read an incorrect value. Without being able to either guess at or reproduce the problem, I'm afraid I can't provide much in the way of useful suggestions about how to fix it.
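One way to narrow this down is to check why the loop stopped: at end of input, or on a failed conversion. The sketch below is only a guess at the cause, under a stated assumption: the question does not show how value was declared, but another answer's remark about integer variables suggests it may have been an int. The names ReadReport and read_all are hypothetical, and the sample tokens are copied from the data as it appears in the question, whose spacing may not match the real file.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct ReadReport {
    std::size_t count;  // values successfully extracted
    bool hit_eof;       // true if the stream had reached end of input
};

// Reads whitespace-separated values of type T until a read fails, then
// reports how many were read and whether the end of input was reached.
template <typename T>
ReadReport read_all(const std::string &text) {
    std::istringstream in(text);
    T value{};
    ReadReport report{0, false};
    while (in >> value) {
        ++report.count;
    }
    // If eof() is false here, the loop stopped on a token that could not be
    // converted to T (for example, a number too large for the type).
    report.hit_eof = in.eof();
    return report;
}
```

With the input "23244564 1486415241250586205864104818638684840823244564 23244564", read_all<int> reports one value and no end of input, because the second token does not fit in an int and the read fails. read_all<double> reads all three values and reaches the end of input. If value was an integer type, that would be consistent with a counter that stops at 1.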


Your computation of variance is totally incorrect. In statistical terms, variance is

E(x^2) - [E(x)]^2

So get rid of that second loop (I'm not even sure what you think it does) and change the first loop to:

while(!File.eof()){
    counter++;
    File >> value;  // formatted read of a number; File.get() would return a single character
    sum += value;
    sumsqr += value*value;
}
average = sum/counter;
variance = (sumsqr/counter) - (average*average);

EDIT: Jerry Coffin's answer is even better as it demonstrates the issue with eof().


You can write it like this:

variance=counter*(average*average)


In your second !File.eof() loop, you are not reading from the file. Isn't the variance the sum of the squares of the differences between values and the average? Your loop doesn't look at the values from the file at all. Also, using integer variables for the sum, average, and variance is likely to lead to inaccuracy; you might want double for those instead.
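The definition asked about here (the mean, rather than the sum, of the squared differences from the average) and the E(x^2) - [E(x)]^2 form used in the other answers describe the same population variance. Here is a minimal sketch comparing the two, with hypothetical function names and assuming a non-empty vector of values already read from the file:

```cpp
#include <vector>

// Population variance from the definition: the mean of squared deviations.
// Needs two passes because the mean must be known before the deviations.
double variance_two_pass(const std::vector<double> &values) {
    double sum = 0;
    for (double v : values) sum += v;
    const double mean = sum / values.size();

    double squared_deviations = 0;
    for (double v : values) squared_deviations += (v - mean) * (v - mean);
    return squared_deviations / values.size();
}

// The same quantity as E(x^2) - [E(x)]^2, accumulated in a single pass.
double variance_one_pass(const std::vector<double> &values) {
    double sum = 0, sumsqr = 0;
    for (double v : values) {
        sum += v;
        sumsqr += v * v;
    }
    const double mean = sum / values.size();
    return sumsqr / values.size() - mean * mean;
}
```

For the values 1 through 10, both functions return 8.25. The one-pass form suits reading from a stream, but it can lose precision when the mean is large compared with the spread of the data, since it subtracts two nearly equal numbers.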


while(!File.eof()){
        variance +=(average*average);
    }

The above lines don't appear to make much sense. You are not reading anything in that while block. This while block isn't expected to terminate.


Well, if the question doesn't limit what libraries you can use I would suggest using the Boost Accumulators which make this type of thing trivial.

You get variance, mean, and whatever other basic statistical value you desire. They have a few issues working with long double, but otherwise they are great!
