Real life examples of floating point error

Are there any examples of a company that was burned by floating point data causing a rounding issue? We're implementing a new system, and all the monetary values are stored in floats. I think if I can show actual examples of why this has failed, it'll have more weight than the theory of why the values can't be stored properly.


These examples are from the embedded world (Ariane 5, Patriot) but are not floating-point rounding errors stricto sensu. The Ariane 5 bug was an overflow in a float-to-integer conversion. The Patriot bug was introduced during adaptations of the software; it involves computations at different precisions with an inherently unrepresentable constant (which happens to be the innocuous-looking 0.10).

There are two problems I foresee with binary floats for monetary values:

  • Decimal values as common as 0.10 cannot be represented exactly.

  • If the precision is too small, what could have been a clean overflow raising an exception becomes a hard-to-track loss of precision.

Note that base-10 floating-point formats (the decimal interchange formats added to IEEE 754 in 2008) have been standardized precisely for monetary values: some currencies are worth 1/1,000,000 of a dollar and are never exchanged in amounts of less than thousands, and the maximum amount you may want to be able to represent is proportionally big, so a scalable representation makes sense. The intent is that the mantissa is large enough for the largest sums at the official resolution.
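To make the first point concrete, here is a minimal Python sketch (Python chosen only for brevity; any language with a decimal type behaves the same way) showing that adding 0.10 ten times does not produce 1.0 in binary floating point, while a decimal type is exact:

    from decimal import Decimal

    # 0.10 has no exact binary representation, so repeated addition drifts.
    total = 0.0
    for _ in range(10):
        total += 0.10
    print(total)          # 0.9999999999999999 -- not 1.0
    print(total == 1.0)   # False

    # A decimal type stores 0.10 exactly, so the sum is exact too.
    total_d = Decimal("0.00")
    for _ in range(10):
        total_d += Decimal("0.10")
    print(total_d)        # 1.00
    print(total_d == 1)   # True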


I worked on a team that created an FPU (a hardware design) that passed TestFloat level 3 for single, double, and extended precision. There are many bad FPUs out there, mostly patched in software if the system can catch the instruction or the exception, that kind of thing. I think the TestFloat author said the most common FPU errors are in the int-to-float and float-to-int conversions; I remember the Pentium 4 failed for that reason, though the Pentium III I had passed TestFloat. I have not tried it in a while, so I don't know the state of these multicore processors. Don't be fooled into thinking the original Pentium was the only one with errors: almost all of them, certainly from the bigger-name companies, have FPU errors. IEEE 754 is a horrible standard; getting an FPU to meet it is very difficult and very expensive, and after that experience I avoid floating point math as much as possible. Compilers and C libraries (atof, ftoa, strtod, printf, etc.) are part of the problem too, not just the hardware.

A single precision float has only 23 bits of mantissa; you are going to start throwing away pennies, or dollars, or thousands of dollars, often, with or without rounding. The rounding should average out if the data is random enough: gain a penny here, lose a penny there. But if the items being tracked always come in some fixed size or a limited number of units, say widgets at 9.99 or two for 15.99, then the randomness goes away, and rounding as well as the limited mantissa will cost someone in accuracy, either the company or the customers.

Sure, there are lots of two-decimal values between 0.00 and 0.99 that you cannot represent exactly; even if you are dealing with small quantities you will run into the rounding sooner rather than later.
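A quick way to see the 23-bit mantissa problem is to round-trip values through single precision. This Python sketch (using struct to emulate a 32-bit float) shows cents disappearing at around a million dollars, where adjacent single-precision values are 0.0625 apart:

    import struct

    def to_single(x: float) -> float:
        """Round a Python float (a double) to the nearest IEEE 754 single."""
        return struct.unpack("f", struct.pack("f", x))[0]

    # Near 1,000,000 the gap between adjacent singles is 1/16 of a dollar,
    # so cents cannot survive the trip through a 32-bit float.
    print(to_single(1000000.01))   # 1000000.0     -- the penny is gone
    print(to_single(1000000.05))   # 1000000.0625  -- distorted to the nearest 1/16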

Using floats for money is just a bad idea; perhaps you are looking for ammo to change that?

We had a motor controller driven by software that used a single precision FPU. In one section of the control algorithm the constants had to add up to 1.0; I didn't know this rule and simply let a C program compute the constants. We had to adjust the least significant bit of the mantissa of one of the constants by hand to get the motor controller to stabilise.


I worked on a system that calculated people's pay raises and bonuses. The calculations were relatively complex due to the number of parts (company performance, department performance, personal performance), but each part was simple enough (generally compounded percentages), something like:

    personal_bonus = salary * personal_bonus_percentage
    department_bonus = personal_bonus * 50%
    company_bonus = personal_bonus * 110%
    total_bonus = personal_bonus + department_bonus + company_bonus

Where personal_bonus_percentage was a calculated value based on the size of the bonus pot, the person's rating, and the number of people with that rating.

When we tested, we didn't manually calculate (i.e. on paper) what the results should have been; instead we compared them to Excel running the same formula, so Excel's floating-point errors matched our own. Employees, however, did do the calculations on paper, and when we rewrote the algorithm to counter the floating point problem it turned out about 5% of the awards had been wrong.
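As a rough illustration of how that kind of bug surfaces (the numbers here are made up, not from the original system): the discrepancy usually appears at the final rounding to whole cents, where float arithmetic and exact decimal arithmetic disagree by a penny.

    from decimal import ROUND_HALF_UP, Decimal

    bonus = 4.35                         # a computed bonus, held as a float
    print(bonus * 100)                   # 434.99999999999994 -- not 435.0
    print(int(bonus * 100))              # 434 cents: a penny silently vanishes

    bonus_d = Decimal("4.35")            # the same amount, held exactly
    cents = (bonus_d * 100).to_integral_value(rounding=ROUND_HALF_UP)
    print(cents)                         # 435 cents, as an employee on paper gets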


I don't think you will find anyone who really got burned. I have heard of companies where the payroll or interest programs used floating point instead of fixed decimal, and a programmer collected the fractional bits from all the accounts to embezzle without the account holders raising the alarm. But that sort of thing was usually quietly fixed years ago, and now there are best-practice rules to prevent it.

The other way the error can get big enough to trip you up is if you try to extrapolate from a small sample, like taking a poll in a small town and trying to predict the popular vote for the whole country.

I was working on a project the other month where we were using matrix math to calculate the polynomial coefficients for a calibration curve. The coefficients from our program were coming out radically different from the ones computed in a spreadsheet. When I went through both the program and the spreadsheet and rounded everything to the correct number of significant digits, they agreed pretty well. The trouble started when junk was being multiplied by junk and then squared or cubed.
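A toy version of that amplification (with made-up numbers, not the actual calibration data): a relative error of eps in an input becomes roughly 3*eps after cubing, which turns "junk in the last digits" into a large absolute error when the inputs are big.

    x = 1000.0
    eps = 1e-7                      # junk in the trailing digits of x
    exact = x ** 3                  # 1e9
    noisy = (x * (1 + eps)) ** 3
    print(noisy - exact)            # roughly 3 * eps * x**3 = 300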


The only real FPU error I can think of is an equality comparison on floats. For instance, 0.123456 and 0.123457 are very close; in fact, they could very well represent equal quantities if both are the result of a series of calculations in which rounding errors accumulate. Instead of comparing with ==, you should write a fuzzy equals that determines whether they are close enough to be considered equal.

A quick Google search turns up this page, which goes to great length about the caveats of writing a fuzzy equals function: http://adtmag.com/articles/2000/03/16/comparing-floats-how-to-determine-if-floating-quantities-are-close-enough-once-a-tolerance-has-been.aspx
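Here is a minimal Python sketch of such a fuzzy equals, using the same relative-plus-absolute-tolerance contract as the standard library's math.isclose:

    import math

    def nearly_equal(a: float, b: float,
                     rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
        """True if a and b differ by less than a relative or absolute tolerance."""
        return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

    a = 0.1 + 0.2
    print(a == 0.3)                 # False: exact comparison fails
    print(nearly_equal(a, 0.3))     # True
    print(math.isclose(a, 0.3))    # True: the stdlib does the same job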


The expression x*(x-1) - (x*x - x) is algebraically zero, but in floating point it isn't, and the error is vastly larger in single precision than in double. With the same value stored at both precisions (a in double, b in single):

    double (a): 1.000167890
    single (b): 1.000167847
    (b*(b-1)) - (b*b-b):   0.0000000281725079
    (a*(a-1)) - (a*a-a):   0.0000000000000001
    resultSmall - result:  0.000000028172507807931691
    double^12 - single^12: 0.000000593091349143648472

Raising both values to the 12th power magnifies the initial single-precision representation error by roughly another order of magnitude.
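Those figures can be reproduced approximately in Python (this rounds every intermediate result to single precision with struct, rather than running on true single-precision hardware):

    import struct

    def f32(x: float) -> float:
        """Round a double to the nearest IEEE 754 single-precision value."""
        return struct.unpack("f", struct.pack("f", x))[0]

    a = 1.000167890          # held in double precision
    b = f32(a)               # the same value stored as a single: 1.000167847...

    # x*(x-1) - (x*x - x) is algebraically zero; rounding makes it nonzero.
    err_double = (a * (a - 1)) - (a * a - a)
    err_single = f32(f32(b * f32(b - 1)) - f32(f32(b * b) - b))
    print(err_double)        # on the order of 1e-16
    print(err_single)        # on the order of 1e-8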

I remember that errors--or eccentricities--used to exist in supercomputers designed by Seymour Cray. His multipliers would only check the first twelve bits of an operand for zero, while the adder checked thirteen, leaving a tiny percentage of values that looked like zero to one unit but like a valid number to the other.

One of William Kahan's students was doing an air turbulence simulation for helicopter rotors on IBM's System/360 when it first came out with its unusual 32-bit floating-point format. a - b wasn't computed exactly even when the two numbers were within a factor of two of each other (the early hardware lacked a guard digit), causing an escalating loss of accuracy with each calculation. Consequently, the computer was giving wrong results.

Kahan had his student rewrite his code using doubles, which produced correct stall points.
