C++ doubles: dividing by 100 causes very small error

I'm having a problem with the specific set of values in this code snippet.

double inputs[] = {0, -546543, 99015, 6750, 825, 2725, 70475, 
    50950, 42200, 6750, 26925, 16125, 134350, 10075, 79378};
double result = 0;
for (int i = 0; i < 15; i++) {
    result += inputs[i]/100;
}

I expected the final value of result to be 0. And if I take out the division by 100 it is. But when I divide each value by 100 before adding it to result, I end up with -6.8212102632969618e-013 instead.

There's a lot I don't understand about floating-point arithmetic. I know it's not guaranteed to be fully precise. But there doesn't seem to be anything unusual about this dataset (no very large or very small values), so I'm surprised that the calculation is coming out wrong.

Can anybody explain this to me, and offer any suggestions as to how to avoid this problem? The code I presented is simplified; in the actual code I can't just not divide by 100, and I can't very easily add the numbers as integers and divide them later.

Any suggestions will be appreciated.


I can't very easily add the numbers as integers and divide them later.

Why not? That sounds like exactly the solution to your problem. Adding integers and dividing once is also likely to be much faster than adding floating-point numbers and performing so many divisions.

You are just accumulating error every time you divide by 100 (since 100 is not a power of 2). All of your input values are exactly representable as doubles, but once you divide them, most of the quotients aren't; hence your error. There's not really anything you can do about it except modify your algorithm.

In your case, since you're dividing by 100, you could round off the final sum to the nearest 100th, and get the right result.


What you're probably not considering is that double uses base-2 for its internal representation. So although the values don't look very precise or small at first glance, they may be surprisingly hard to represent in base-2.

Take 1/3 = 0.333... for example. You can't describe the value precisely with a finite number of digits in base-10 (except by storing it as a fraction, but let's leave that aside). The same is true in base-2 for some values that look just fine as base-10 numbers.

0.01 is such an example. To store it in base-2 you would need an infinite number of digits, so 1.0/100.0 isn't precise (as float or double). You are dividing base-10 integers by 100, so you can probably see where this leads.

Some programming languages offer a base-10 floating-point type (decimal in C#, for example) for financial calculations. It's better suited to the base-10 arithmetic we are used to, but more expensive for the computer. Of course there are still numbers it can't represent (1/3, ...).


Read The Floating-Point Guide, then you will understand:

internally, computers use a format (binary floating-point) that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.

Basically, every single one of your intermediate values (after dividing by 100) has a rounding error. It's not unusual at all that the final result retains an error as well.

Decimal numbers cannot accurately represent a number like 1/3, so you have to round to something like 0.33, and you don't expect 0.33 + 0.33 + 0.33 to add up to 1 either, do you?

What can I do to avoid this problem?

That depends on what kind of calculations you’re doing.

  • If you really need your results to add up exactly, especially when you work with money: use a special decimal datatype.
  • If you just don’t want to see all those extra decimal places: simply format your result rounded to a fixed number of decimal places when displaying it.
  • If you have no decimal datatype available, an alternative is to work with integers, e.g. do money calculations entirely in cents. But this is more work and has some drawbacks.

Note that the first case (needing specific decimal results) does not apply in most cases.


Normally, when comparing floating-point numbers from different implementations, you can establish some sort of pass/fail criterion for determining equality (or zero). Something like 10^-10 for double precision and maybe 10^-6 for single. Some such benchmark is required if, for example, you're comparing Matlab results to those generated by application code, and it's an acknowledgment of unavoidable precision errors.


If you have to use floating point numbers, you might want to:

  • first add the positive and the negative ones separately and then subtract the sums

  • divide once by 100 after calculating the sums

As far as I recall, it is not guaranteed that the floating-point registers in the CPU and the storage locations in memory have the same width (i.e. some rounding probably happens when a value is transferred to a location with less width), and where your values are stored depends on what machine code the compiler produces.
