Most efficient way to compare doubles stored in a byte array?
Imagine I have two byte[] arrays, b1 and b2, and they have the bytes corresponding to two doubles. One option would be something like...
double thisValue = readDouble(b1, s1);
double thatValue = readDouble(b2, s2);
return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
which uses...
/** Parse an integer from a byte array. */
public static int readInt(byte[] bytes, int start) {
return (((bytes[start ] & 0xff) << 24) +
((bytes[start+1] & 0xff) << 16) +
((bytes[start+2] & 0xff) << 8) +
((bytes[start+3] & 0xff)));
}
/** Parse a long from a byte array. */
public static long readLong(byte[] bytes, int start) {
return ((long)(readInt(bytes, start)) << 32) +
(readInt(bytes, start+4) & 0xFFFFFFFFL);
}
/** Parse a double from a byte array. */
public static double readDouble(byte[] bytes, int start) {
return Double.longBitsToDouble(readLong(bytes, start));
}
(code taken from the Apache Hadoop source, here and here).
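For context, these helpers decode a big-endian long and reinterpret its bits as a double. A quick round-trip sanity check, using a hypothetical writeDouble that is simply the inverse of the above (not part of the Hadoop source):

/** Hypothetical inverse of readDouble, for a round-trip sanity check (not from Hadoop). */
public static void writeDouble(byte[] bytes, int start, double value) {
    long bits = Double.doubleToLongBits(value);
    for (int i = 0; i < 8; i++) {
        bytes[start + i] = (byte) (bits >>> (56 - 8 * i)); // big-endian, matching readInt/readLong
    }
}

// Round trip:
byte[] buf = new byte[8];
writeDouble(buf, 0, 3.14159);
assert readDouble(buf, 0) == 3.14159;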
The thing is, you already have their byte representations, so it seems wasteful to have to actually cast them into doubles, although maybe this is so heavily optimized as to be negligible. I'm sure the Hadoop people know what they are doing; I'm just curious why it wouldn't be better/faster to compare the bits directly. Or maybe the compiler is smart enough to see this sort of thing and do just that.
Thanks
Due to the structure of the IEEE floating-point format, you can't simply check if all of the bits are identical: for example, -0 and +0 have distinct representations, but are considered equal; and NaN values, which have many different representations, are never equal to anything, including other NaN values with the same representation.
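Both cases are easy to demonstrate with plain java.lang.Double calls:

long posZero = Double.doubleToRawLongBits(0.0);   // 0x0000000000000000
long negZero = Double.doubleToRawLongBits(-0.0);  // 0x8000000000000000
System.out.println(posZero == negZero);  // false: different bit patterns...
System.out.println(0.0 == -0.0);         // true:  ...yet equal as doubles

double nan1 = Double.longBitsToDouble(0x7ff8000000000000L); // the canonical NaN
double nan2 = Double.longBitsToDouble(0x7ff8000000000001L); // a NaN with a different payload
System.out.println(nan1 == nan2); // false: NaN never compares equal
System.out.println(nan1 == nan1); // false: not even to itself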
While it is certainly possible to implement these checks yourself, it quickly becomes very complex and is not worth it: the fields you need to check (sign, exponent, mantissa) do not fall on byte boundaries, so you still have to assemble the bytes into larger values, and then you still have to check all of the different conditions.
In other words, you end up doing the same things that the above piece of code is doing, but you're spending many more lines of code, and you're very unlikely to perform any better than what's already there.
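For what it's worth, the JDK already packages this logic: Double.compare defines a total order that places -0.0 below 0.0 and sorts all NaNs (which it treats as equal to one another) above every other value. Using the question's readDouble, the whole comparison collapses to:

return Double.compare(readDouble(b1, s1), readDouble(b2, s2));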
It is conceivable that one byte array could hold a normalized encoding of a value and the other an unnormalized encoding of the same value. (For IEEE 754 binary64 specifically, every finite value other than zero has exactly one encoding, so the practical duplicates are +0 and -0.) Either way, converting and comparing as double values will succeed where comparing raw bytes would fail.
There are a lot of problems with comparing floating-point values bitwise. For instance, doubles are stored in sign-and-magnitude form, so the bit patterns of negative values order in the reverse of the values themselves: two numbers may be perfectly comparable as doubles while a raw byte comparison ranks them the wrong way around (and, as noted above, +0/-0 and NaN break plain equality checks).
I'd say the only real way you will get a "most efficient" answer is to do 15-20 minutes of experimentation. I honestly have no idea whether the Hadoop approach you detailed would be any faster (or more/less accurate) than loading your byte[] into a ByteArrayInputStream and decorating that stream with a DataInputStream (which has a readDouble() method):
byte[] myData = ........  // two big-endian doubles, 16 bytes
ByteArrayInputStream bais = new ByteArrayInputStream(myData);
DataInputStream dis = new DataInputStream(bais);
double d1 = dis.readDouble();
double d2 = dis.readDouble();
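If you want a starting point for that experiment, here is a rough micro-benchmark sketch (helpers copied from the question; naive System.nanoTime() loops ignore JIT warm-up, so treat the numbers as ballpark only):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class DecodeBench {
    public static void main(String[] args) throws IOException {
        // Encode one double big-endian by hand so both decoders see identical bytes.
        byte[] data = new byte[8];
        long bits = Double.doubleToLongBits(3.14159);
        for (int i = 0; i < 8; i++) data[i] = (byte) (bits >>> (56 - 8 * i));

        final int N = 5_000_000;
        double sink = 0; // accumulate so the JIT can't discard the work

        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) sink += readDouble(data, 0); // bit-twiddling path
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
            sink += dis.readDouble(); // stream-decoration path
        }
        long t2 = System.nanoTime();

        System.out.printf("manual: %d ms, stream: %d ms (sink=%.1f)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }

    // Helpers copied from the question:
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) + ((b[s+1] & 0xff) << 16)
             + ((b[s+2] & 0xff) << 8) + (b[s+3] & 0xff);
    }
    static long readLong(byte[] b, int s) {
        return ((long) readInt(b, s) << 32) + (readInt(b, s+4) & 0xFFFFFFFFL);
    }
    static double readDouble(byte[] b, int s) {
        return Double.longBitsToDouble(readLong(b, s));
    }
}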
Let us know what your benchmarks are!