What should the epsilon value be when comparing double values for equality?
Here is the output of the program below.
value is : 2.7755575615628914E-17
Double.compare with zero : 1
isEqual with zero : true
My question is: what should the epsilon value be? Is there any robust way to derive it, instead of picking a number out of thin air?
package sandbox;

/**
 *
 * @author yccheok
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        double zero = 1.0/5.0 + 1.0/5.0 - 1.0/10.0 - 1.0/10.0 - 1.0/10.0 - 1.0/10.0;

        System.out.println("value is : " + zero);
        System.out.println("Double.compare with zero : " + Double.compare(zero, 0.0));
        System.out.println("isEqual with zero : " + isEqual(zero, 0.0));
    }

    public static boolean isEqual(double d0, double d1) {
        final double epsilon = 0.0000001;
        return d0 == d1 || Math.abs(d0 - d1) < epsilon;
    }
}
I like (pseudocode, I don't do Java):

bool fuzzyEquals(double a, double b)
{
    return abs(a - b) < eps * max(abs(a), abs(b));
}
with epsilon being a few times the machine epsilon. Take 10^-12 if you don't know what to use.
This is quite problem-dependent, however. If the computations producing a and b are prone to round-off error, involve many operations, or are themselves only accurate to some (known) tolerance, you will want a bigger epsilon.
The point is to use relative precision, not absolute.
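In Java, that might look like the following sketch (the 1e-12 value is just the suggestion above, and the exact-equality short-circuit is an assumption added to handle the case where both arguments are zero):

public static boolean fuzzyEquals(double a, double b) {
    if (a == b) {
        return true; // covers exact equality, including both arguments being 0.0
    }
    final double eps = 1e-12; // a few times the machine epsilon, per the suggestion above
    return Math.abs(a - b) < eps * Math.max(Math.abs(a), Math.abs(b));
}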
The answer to your second question is no. The magnitude of finite-precision arithmetic error can be arbitrarily large:
public static void main(String[] args) {
    double z = 0.0;
    double x = 0.23;
    double y = 1.0 / x;
    int N = 50000;
    for (int i = 0; i < N; i++) {
        z += x * y - 1.0; // each term should be exactly zero, but x * y != 1.0 in floating point
    }
    System.out.println("z should be zero, is " + z);
}
This gives ~5.55E-12, but if you increase N you can get just about any level of error you desire.
There is a vast amount of past and current research on how to write numerically stable algorithms. It is a hard problem.
There is no one right value. You need to compute it relative to the magnitude of the numbers involved. What you're basically dealing with is a number of significant digits, not a specific magnitude. If, for example, your numbers are both in the range of 1e-100, and your calculations should maintain roughly 8 significant digits, then your epsilon should be around 1e-108. If you did the same calculations on numbers in the range of 1e+200, then your epsilon would be around 1e+192 (i.e., epsilon ~= magnitude - significant digits).
I'd also note that isEqual is a poor name -- you want something like isNearlyEqual. For one reason, people quite reasonably expect "equal" to be transitive. At the very least, you need to convey the idea that the result is no longer transitive -- i.e., with your definition of isEqual, isEqual(a, c) can be false even though isEqual(a, b) and isEqual(b, c) are both true.
Edit: (responding to comments): I said "If [...] your calculations should maintain roughly 8 significant digits, then your epsilon should be...". Basically, it comes to looking at what calculations you're doing, and how much precision you're likely to lose in the process, to provide a reasonable guess at how big a difference has to be before it's significant. Without knowing the calculation you're doing, I can't guess that.
As far as the magnitude of epsilon goes: no, it does not make sense for it to always be less than or equal to 1. A floating point number can only maintain limited precision. In the case of an IEEE double-precision floating point number, the maximum precision that can be represented is about 16 decimal digits. That means if you start with 1e+200, the smallest difference from that number that the machine can represent at all is about 1e+184 (and a double can represent numbers up to ~1.8e+308, at which point the smallest representable difference is ~1e+292).
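As a sketch of that idea (the 8-significant-digit budget is just the example figure from this answer, and isNearlyEqual is the name suggested above):

public static boolean isNearlyEqual(double a, double b) {
    final double relativeTolerance = 1e-8; // assumed precision budget of ~8 significant digits
    double magnitude = Math.max(Math.abs(a), Math.abs(b));
    return Math.abs(a - b) <= relativeTolerance * magnitude;
}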
In isEqual, have something like:

epsilon = Math.max(Math.ulp(d0), Math.ulp(d1))
An ulp of a double value is the positive distance between this floating-point value and the double value next larger in magnitude. [1]
[1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#ulp%28double%29
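Put together, that suggestion reads as the sketch below; note that one ulp is a very tight tolerance, appropriate only when little or no rounding has occurred:

public static boolean isEqual(double d0, double d1) {
    final double epsilon = Math.max(Math.ulp(d0), Math.ulp(d1));
    return Math.abs(d0 - d1) <= epsilon; // the two values are within one ulp of each other
}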
There are two concepts involved here:
- The machine precision unit (machine epsilon): Math.ulp(1.0)
- The machine precision for a given double d: Math.ulp(d)
If you call Math.ulp(1.0), you obtain the machine precision unit, which is the precision you can expect from a certain hardware platform... whatever that definition might be!
If you call Math.ulp(d), you get the machine precision for the double d. In other words, every double d has its own specific precision. This is more useful than the previous concept.
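A quick illustration of the difference (the printed values are approximate):

System.out.println(Math.ulp(1.0));   // 2.220446049250313E-16, the machine epsilon
System.out.println(Math.ulp(1e200)); // ~1.7E184: the spacing between adjacent doubles grows with magnitude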
You must pay particular attention when performing iterations that involve cascaded calculations, i.e., when results from previous calculations are fed into the current calculation. Errors accumulate in these situations and may end up, in certain circumstances, delivering results that are way off the true value; the accumulated error may even be greater than the true value itself. See some disastrous examples here.
In certain business domains, numerical computation errors are simply not acceptable. Depending on the business domain, its regulations, requirements and characteristics, you must take alternative approaches to the simplistic choice of employing floating point arithmetic (i.e., doubles or floats).
In the case of finance, for example, never ever use floating point arithmetic. Never use doubles or floats when you are dealing with money. Never. Period. You can employ BigDecimal or fixed point arithmetic, depending on circumstances.
In the specific case of processing stock prices, you know that prices always have 5 digits of precision; in this case, fixed point arithmetic is plenty, and it also delivers the maximum performance you can possibly obtain, which is a strong and common requirement in this business domain.
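As an illustration, here is a minimal fixed-point sketch along those lines (the Price class and its method names are hypothetical; the scale of 10^5 follows the 5-digit assumption above):

// A price stored as a long count of 1e-5 units.
// Arithmetic and equality on longs are exact, so no epsilon is needed.
public final class Price {
    private final long units; // price * 100_000

    private Price(long units) { this.units = units; }

    public static Price of(String decimal) {
        // BigDecimal is used here only to parse the decimal literal exactly
        return new Price(new java.math.BigDecimal(decimal).movePointRight(5).longValueExact());
    }

    public Price plus(Price other) { return new Price(this.units + other.units); }

    @Override public boolean equals(Object o) {
        return o instanceof Price && ((Price) o).units == this.units;
    }

    @Override public int hashCode() { return Long.hashCode(units); }

    @Override public String toString() {
        return java.math.BigDecimal.valueOf(units, 5).toPlainString();
    }
}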
If the business domain really requires numerical computation, you must make sure you keep error propagation under strict and careful control. This is a long subject, there are a number of techniques, and developers very frequently overlook the problem, simply believing there is a single magic method call that does all the hard work for them. There isn't. You have to do your research, do your homework, and do the hard work necessary to keep errors under control. You need to understand exactly what is going on in the numerical algorithms you've implemented.
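One classical example of such a technique is compensated (Kahan) summation, sketched below; it is one tool among many, not a magic call:

// Kahan (compensated) summation: carries along the low-order bits
// that plain summation would discard at each addition.
public static double kahanSum(double[] values) {
    double sum = 0.0;
    double c = 0.0; // running compensation for lost low-order bits
    for (double v : values) {
        double y = v - c;   // apply the correction to the incoming term
        double t = sum + y; // low-order bits of y are lost here...
        c = (t - sum) - y;  // ...and recovered here, to be applied next round
        sum = t;
    }
    return sum;
}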
You should definitely read https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ first.
It discusses various ways of comparing floating point numbers: absolute tolerance, relative tolerance, and ulp distance. It makes a fairly good argument that ulp checking is the way to go. The case hinges on the argument that if you want to check whether two floating point numbers are "the same", you have to take into account the distance between representable floats. In other words, you should check whether the two numbers are within some small number of representable floats of each other.
The algorithms are given in C, but can be translated to Java using java.lang.Double#doubleToLongBits and java.lang.Float#floatToIntBits to reinterpret the floating point bits as integer types. Also, since Java 1.5 there are the methods Math.ulp(double) and Math.ulp(float), and since Java 1.6 also Math.nextUp(double), Math.nextUp(float), Math.nextAfter(double, double) and Math.nextAfter(float, float), all of which are useful for quantifying the difference between two floating point numbers.
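For doubles, the ulp-distance comparison might be sketched as below (adapted from the pattern in the article above; it assumes finite inputs, so NaN and infinity handling is omitted):

// Compare two doubles by the number of representable values between them.
public static boolean withinUlps(double a, double b, long maxUlps) {
    long bitsA = Double.doubleToLongBits(a);
    long bitsB = Double.doubleToLongBits(b);
    // Remap negative values so the bit patterns are ordered like the doubles they represent
    if (bitsA < 0) bitsA = Long.MIN_VALUE - bitsA;
    if (bitsB < 0) bitsB = Long.MIN_VALUE - bitsB;
    // Note: the subtraction can overflow for values of wildly different sign and
    // magnitude; production code should guard against that.
    return Math.abs(bitsA - bitsB) <= maxUlps;
}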