How can I add floats together in different orders, and always get the same total?
Let's say I have three 32-bit floating point values, a, b, and c, such that (a + b) + c != a + (b + c). Is there a summation algorithm, perhaps similar to Kahan summation, that guarantees these values can be summed in any order and always arrive at exactly the same (fairly accurate) total? I'm looking for the general case (i.e. not a solution that only handles 3 numbers).
Is arbitrary precision arithmetic the only way to go? I'm dealing with very large data sets, so I'd like to avoid the overhead of arbitrary precision arithmetic if possible.
Thanks!
There's an interesting 'full-precision-summation' algorithm here, which guarantees that the final sum is independent of the order of the summands (recipe given in Python; but it shouldn't be too difficult to translate to other languages). Note that the recipe as given in that link isn't perfectly correct: the main accumulation loop is fine, but in the final step that converts the list of accumulated partial sums to a single floating-point result (the very last line of the msum
recipe), one needs to be a little bit more careful than simply summing the partial sums in order to get a correctly-rounded result. See the comments below the recipe, and Python's implementation (linked below) for a way to fix this.
It does use a form of arbitrary-precision arithmetic to hold partial sums (the intermediate sums are represented as 'non-overlapping' sums of doubles), but may nevertheless be fast enough, especially when all the inputs are of roughly the same magnitude. And it always gives a correctly rounded result, so accuracy is as good as you could hope for and the final sum is independent of the order of the summands. It's based on this paper (Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates) by Jonathan Shewchuk.
Python uses this algorithm for its implementation of math.fsum, which does correctly-rounded order-independent summation; you can see the C implementation that Python uses here--- look for the math_fsum function.
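To make the accumulation loop concrete, here is a minimal sketch of the msum recipe described above. The helper name msum follows the recipe; the final line is the naive "sum the partials" step that, as noted above, is not always correctly rounded — math.fsum implements the careful final rounding.

```python
def msum(xs):
    # Maintain a list of non-overlapping partial sums (Shewchuk).
    # Each incoming value is combined with every existing partial
    # using an exact two-term sum (hi + lo == x + y exactly).
    partials = []
    for x in xs:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)   # rounding error of hi, computed exactly
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    # Naive final step; a correctly-rounded result needs more care here
    # (see the comments below the recipe, or math.fsum's C source).
    return sum(partials, 0.0)
```

Because the partials are non-overlapping, values that cancel exactly (such as 1e100 and -1e100 below) drop out without destroying the small terms, which plain left-to-right summation would lose.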
With some additional information about the terms you have to sum, you can avoid the overhead of Shewchuk's algorithm.
In IEEE 754 arithmetic, x - y is exact whenever y/2 <= x <= 2*y (Sterbenz's theorem, formally proved here).
So if you can arrange all your terms in an order such that each partial sum is of the form above, then you get the exact result for free.
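One quick way to see Sterbenz's theorem in action is to compare the floating-point difference against the mathematically exact difference using rational arithmetic. The helper name sterbenz_exact is my own for illustration:

```python
from fractions import Fraction

def subtraction_is_exact(x, y):
    # Fraction(float) is the float's exact binary value, so this
    # tests whether x - y incurred any rounding error at all.
    return Fraction(x) - Fraction(y) == Fraction(x - y)

# 0.6/2 <= 0.7 <= 2*0.6, so Sterbenz applies: the difference is exact
# (even though 0.7 - 0.6 prints as 0.09999999999999998).
sterbenz_exact = subtraction_is_exact(0.7, 0.6)

# Outside the Sterbenz range the subtraction can round:
# 1.0 - 1e-20 collapses to 1.0, losing the small term entirely.
not_exact = subtraction_is_exact(1.0, 1e-20)
```

Note that "exact" does not mean the decimal value you expect; 0.7 - 0.6 is exactly the difference of the two binary values actually stored, which is not the decimal 0.1.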
I am afraid that in practice there is little chance of being in conditions where this is assured to happen. Alternating positive and negative numbers of increasing magnitude may be one case where it does.
Note: the original question asked for an algorithm that gives the same result regardless of summation order. Mark's answer drifted in the direction of an exact algorithm, and rereading your question, I am afraid I am pushing things too far by suggesting that you reorder terms. You probably can't do that in your situation, so my answer may be off-topic. Well, sorry :)
I am not quite sure that (a + b) + c != a + (b + c) when doing arithmetic in a program.
However, the rule of thumb when using floating-point arithmetic on present-day hardware is to never test directly for equality.
For whatever application you have you should choose an epsilon that is small enough and use
(abs(a - b) < epsilon)
as the equality test.
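For example, using a hypothetical helper name of my own choosing (a fixed absolute epsilon only works near a known scale, so a relative tolerance is often combined with it):

```python
def approx_equal(a, b, abs_eps=1e-12, rel_eps=1e-9):
    # Treat a and b as equal if they differ by less than an absolute
    # floor (abs_eps) or a fraction of their magnitude (rel_eps).
    return abs(a - b) <= max(abs_eps, rel_eps * max(abs(a), abs(b)))

x = (0.1 + 0.2) + 0.3   # 0.6000000000000001
y = 0.1 + (0.2 + 0.3)   # 0.6
# x == y is False, but the tolerance test accepts them as equal.
```

Incidentally, this pair also answers the doubt above: (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) really do produce different doubles in a program.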