Most efficient way to compare doubles stored in a byte array?
Imagine I have two byte[] arrays, b1 and b2, and they have the bytes corresponding to two doubles. One option would be something like...
double thisValue = readDouble(b1, s1);
double thatValue = readDouble(b2, s2);
return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
which uses...
/** Parse an integer from a byte array. */
public static int readInt(byte[] bytes, int start) {
return (((bytes[start ] & 0xff) << 24) +
((bytes[start+1] & 0xff) << 16) +
((bytes[start+2] & 0xff) << 8) +
((bytes[start+3] & 0xff)));
}
/** Parse a long from a byte array. */
public static long readLong(byte[] bytes, int start) {
return ((long)(readInt(bytes, start)) << 32) +
(readInt(bytes, start+4) & 0xFFFFFFFFL);
}
/** Parse a double from a byte array. */
public static double readDouble(byte[] bytes, int start) {
return Double.longBitsToDouble(readLong(bytes, start));
}
(code taken from the Apache Hadoop source, here and here).
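For context, these helpers decode a big-endian long and reinterpret its bits as a double. A quick round-trip sanity check, using a hypothetical writeDouble that is simply the inverse of the above (not part of the Hadoop source):

/** Hypothetical inverse of readDouble, for a round-trip sanity check (not from Hadoop). */
public static void writeDouble(byte[] bytes, int start, double value) {
    long bits = Double.doubleToLongBits(value);
    for (int i = 0; i < 8; i++) {
        bytes[start + i] = (byte) (bits >>> (56 - 8 * i)); // big-endian, matching readInt/readLong
    }
}

// Round trip:
byte[] buf = new byte[8];
writeDouble(buf, 0, 3.14159);
assert readDouble(buf, 0) == 3.14159;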
The thing is, you already have their byte representations, so it seems wasteful to have to actually cast them into doubles, although maybe this is so heavily optimized as to be negligible. I'm sure the Hadoop people know what they are doing; I'm just curious why it wouldn't be better/faster to compare the bits directly. Or maybe the compiler is smart enough to see this sort of thing and do just that.
Thanks
Due to the structure of the IEEE floating-point format, you can't simply check if all of the bits are identical: for example, -0 and +0 have distinct representations, but are considered equal; and NaN values, which have many different representations, are never equal to anything, including other NaN values with the same representation.
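Both cases are easy to demonstrate with plain java.lang.Double calls:

long posZero = Double.doubleToRawLongBits(0.0);   // 0x0000000000000000
long negZero = Double.doubleToRawLongBits(-0.0);  // 0x8000000000000000
System.out.println(posZero == negZero);  // false: different bit patterns...
System.out.println(0.0 == -0.0);         // true:  ...yet equal as doubles

double nan1 = Double.longBitsToDouble(0x7ff8000000000000L); // the canonical NaN
double nan2 = Double.longBitsToDouble(0x7ff8000000000001L); // a NaN with a different payload
System.out.println(nan1 == nan2); // false: NaN never compares equal
System.out.println(nan1 == nan1); // false: not even to itself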
While it is certainly possible to implement these checks yourself, it quickly becomes very complex and is not worth it: the fields you need to check (sign, exponent, mantissa) do not fall on byte boundaries, so you still have to assemble the bytes into larger values, and then you still have to check all of the different conditions.
In other words, you end up doing the same things that the above piece of code is doing, but you're spending many more lines of code, and you're very unlikely to perform any better than what's already there.
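For what it's worth, the JDK already packages this logic: Double.compare defines a total order that places -0.0 below 0.0 and sorts all NaNs (which it treats as equal to one another) above every other value. Using the question's readDouble, the whole comparison collapses to:

return Double.compare(readDouble(b1, s1), readDouble(b2, s2));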
It is conceivable that one byte array could hold a normalized encoding of a value and the other an unnormalized encoding of the same value. (For IEEE 754 binary64 specifically, every finite value other than zero has exactly one encoding, so the practical duplicates are +0 and -0.) Either way, converting and comparing as double values will succeed where comparing raw bytes would fail.
There are a lot of problems with comparing floating-point values bitwise. For instance, doubles are stored in sign-and-magnitude form, so the bit patterns of negative values order in the reverse of the values themselves: two numbers may be perfectly comparable as doubles while a raw byte comparison ranks them the wrong way around (and, as noted above, +0/-0 and NaN break plain equality checks).
I'd say the only real way you will get a "most efficient" answer is to do 15-20 minutes of experimentation. I honestly have no idea whether the Hadoop approach you detailed would be any faster (or more/less accurate) than loading your byte[] into a ByteArrayInputStream and decorating that stream with a DataInputStream (which has a readDouble() method):
byte[] myData = ........  // two big-endian doubles, 16 bytes
ByteArrayInputStream bais = new ByteArrayInputStream(myData);
DataInputStream dis = new DataInputStream(bais);
double d1 = dis.readDouble();
double d2 = dis.readDouble();
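If you want a starting point for that experiment, here is a rough micro-benchmark sketch (helpers copied from the question; naive System.nanoTime() loops ignore JIT warm-up, so treat the numbers as ballpark only):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class DecodeBench {
    public static void main(String[] args) throws IOException {
        // Encode one double big-endian by hand so both decoders see identical bytes.
        byte[] data = new byte[8];
        long bits = Double.doubleToLongBits(3.14159);
        for (int i = 0; i < 8; i++) data[i] = (byte) (bits >>> (56 - 8 * i));

        final int N = 5_000_000;
        double sink = 0; // accumulate so the JIT can't discard the work

        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) sink += readDouble(data, 0); // bit-twiddling path
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
            sink += dis.readDouble(); // stream-decoration path
        }
        long t2 = System.nanoTime();

        System.out.printf("manual: %d ms, stream: %d ms (sink=%.1f)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }

    // Helpers copied from the question:
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) + ((b[s+1] & 0xff) << 16)
             + ((b[s+2] & 0xff) << 8) + (b[s+3] & 0xff);
    }
    static long readLong(byte[] b, int s) {
        return ((long) readInt(b, s) << 32) + (readInt(b, s+4) & 0xFFFFFFFFL);
    }
    static double readDouble(byte[] b, int s) {
        return Double.longBitsToDouble(readLong(b, s));
    }
}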
Let us know what your benchmarks are!