Prevent a long-running average from overflowing?

2023-01-08 07:30 Q&A author:

Suppose I want to calculate the average value of a data set, such as:

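#include <cstddef>

class Averager {
    float total = 0.0f;      // running sum of all values seen so far
    std::size_t count = 0;   // number of values seen so far
public:
    float addData(float value) {
        this->total += value;
        return this->total / ++this->count;
    }
};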

Sooner or later the total or count value will overflow, so I avoid storing the total:

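#include <cstddef>

class Averager {
    float currentAverage = 0.0f;  // mean of the values seen so far
    std::size_t count = 0;        // number of values seen so far
public:
    float addData(float value) {
        // Rebuild the old total first, then increment count in its own statement:
        // reading count and ++count in one expression is unsequenced in C++.
        const float oldTotal = this->currentAverage * count;
        ++count;
        this->currentAverage = (oldTotal + value) / count;
        return this->currentAverage;
    }
};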

This takes longer to overflow, but the multiplication of the average by the count can still overflow, so the next solution is:

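#include <cstddef>

class Averager {
    float currentAverage = 0.0f;  // mean of the values seen so far
    std::size_t count = 0;        // number of values seen so far
public:
    float addData(float value) {
        // Move the mean toward the new value by 1/count of the difference.
        this->currentAverage += (value - this->currentAverage) / ++count;
        return this->currentAverage;
    }
};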

This seems better. The next problem is how to prevent count from overflowing.

Aggregated buckets.

We pick a bucket size that's comfortably less than squareRoot(MAXINT). To keep it simple, let's pick 10.

Each new value is added to the current bucket, and the moving average can be computed as you describe.

When the bucket is full, start a new bucket, remembering the average of the full bucket. We can safely calculate the overall average by combining the averages of the full buckets and the current, partial bucket. When we get to 10 full buckets, we merge them into a bigger bucket with capacity 100.

To compute the total average, we first compute the average of the "10s" and then combine that with the "100s". This pattern repeats for the "1,000s", "10,000s" and so on. At each stage we only need to consider two levels, one 10 times bigger than the other.
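The bucket scheme above could be sketched roughly as follows. This is only one possible reading of it: the class name BucketAverager, the per-level layout, and the use of double are assumptions made for illustration. Each level k stores the running mean of the full buckets of size 10^k that have not yet been merged upward, so no counter ever exceeds the bucket size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: level k holds the mean of up to bucketSize - 1
// completed buckets, each of which stands for bucketSize^k raw values.
class BucketAverager {
public:
    explicit BucketAverager(std::size_t bucketSize = 10) : bucketSize_(bucketSize) {
        assert(bucketSize >= 2);
    }

    void addData(double value) { push(0, value); }

    // Combine the levels from smallest to largest. The accumulated weight is
    // rescaled into the units of each level, so it never grows past bucketSize.
    double average() const {
        double mean = 0.0;
        double weight = 0.0;  // weight of everything below, in units of the current level
        for (const Level& level : levels_) {
            weight /= static_cast<double>(bucketSize_);
            const double n = static_cast<double>(level.count);
            const double total = weight + n;
            if (total > 0.0) {
                mean += (level.mean - mean) * (n / total);
            }
            weight = total;
        }
        return mean;
    }

private:
    struct Level {
        double mean = 0.0;
        std::size_t count = 0;
    };

    void push(std::size_t k, double value) {
        if (k == levels_.size()) {
            levels_.push_back(Level{});
        }
        Level& level = levels_[k];
        ++level.count;
        level.mean += (value - level.mean) / static_cast<double>(level.count);
        if (level.count == bucketSize_) {
            // The level is full: reset it and carry its mean up one level.
            const double fullMean = level.mean;
            level.mean = 0.0;
            level.count = 0;
            push(k + 1, fullMean);  // 'level' is not used after this call
        }
    }

    std::size_t bucketSize_;
    std::vector<Level> levels_;
};
```

Feeding the values 1 through 25 leaves two full buckets at the "10s" level and five values in the partial bucket; average() combines them to 13, up to rounding. The memory used grows only with the logarithm of the number of inputs.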

Use double for total and unsigned long long for count. You should still worry about accuracy, but it will be much less of a problem than with float.
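A minimal sketch of that change, assuming the same interface as the class in the question (the name WideAverager is made up here). unsigned long long is at least 64 bits wide, so at a billion samples per second the counter would take several centuries to wrap.

```cpp
// Hypothetical variant of the question's first class with wider types.
class WideAverager {
public:
    double addData(double value) {
        total_ += value;
        ++count_;
        return total_ / static_cast<double>(count_);
    }

private:
    double total_ = 0.0;            // running sum; can still lose precision
    unsigned long long count_ = 0;  // at least 64 bits by the C++ standard
};
```

The sum can still lose low-order digits once it becomes much larger than the individual inputs, which is the accuracy concern mentioned above.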

What about using arbitrary-precision arithmetic?

There's a list of libraries you could use on Wikipedia: http://en.wikipedia.org/wiki/Bignum#Libraries

Most arbitrary-precision arithmetic libraries will not overflow until the stored digits fill the available memory (which is quite unlikely).

You want to use Kahan's summation algorithm:

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

See also the section about errors in summation in "What Every Computer Scientist Should Know About Floating-Point Arithmetic":

http://docs.sun.com/source/806-3568/ncg_goldberg.html#1262
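As a rough illustration, Kahan summation applied to the running total might look like the sketch below. The class name KahanAverager and the choice of double are assumptions; the compensation step itself is the standard one from the Wikipedia article. Note that this improves accuracy of the sum and does not by itself stop count from overflowing.

```cpp
// Hypothetical sketch: running average with a Kahan-compensated sum.
class KahanAverager {
public:
    double addData(double value) {
        const double y = value - compensation_;  // apply the error carried over
        const double t = sum_ + y;               // low-order bits of y may be lost here
        compensation_ = (t - sum_) - y;          // recover what was lost (negated)
        sum_ = t;
        ++count_;
        return sum_ / static_cast<double>(count_);
    }

private:
    double sum_ = 0.0;
    double compensation_ = 0.0;     // running estimate of the rounding error
    unsigned long long count_ = 0;
};
```

For example, after adding 1e16 a plain sum += 1.0 would drop each following 1.0 entirely, while the compensated version keeps them. Aggressive floating-point optimization flags such as -ffast-math may allow the compiler to remove the compensation, so this should be built without them.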

You could use special data types whose integers can grow without bound until your RAM is full.

I was just thinking about this too. I think this solution works by letting each new value "move the needle": the average shifts by the difference between the new value and the current average, divided by the number of values so far (the previous values plus 1 for the new one). It loses accuracy as the number of inputs grows, but in most cases it should be acceptable in practice.

Here's some Java code that seems to work. I used floats and ints to demonstrate that it works within those limits, but you could use double to gain accuracy. This is just to give you an idea of how to average an array of near-max integers. You need to keep track of the total number of inputs and the current average, but not the total sum of the inputs. If your total number of inputs approaches MAX_INT, this eventually won't work and you should use the bucket suggestion above, but that is pretty drastic in most cases.

    public float calcAverageContinuous(int[] integers)
    {
        float ave = 0;
        for (int i = 0; i < integers.length; i++) {
            ave += (((float)integers[i] - ave) / (float)(i + 1));
        }
        return ave;
    }
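For reference, the update inside the loop follows directly from the definition of the mean. Writing \bar{x}_n for the average of the first n inputs (notation introduced here only for this note):

```latex
\bar{x}_n
  = \frac{1}{n}\sum_{i=1}^{n} x_i
  = \frac{(n-1)\,\bar{x}_{n-1} + x_n}{n}
  = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}
```

The last form never materializes the full sum, which is why it avoids overflowing the total; only n itself keeps growing.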
