Question regarding IEEE 754, 64 bits double?

Posted: 2023-02-13 03:23

Please take a look at the following content:

I understand how to convert a double to its binary representation based on IEEE 754, but I don't understand what the formula is used for.

Can anyone give me an example of when we would use the above formula, please?

Thanks a lot.

The formula that is highlighted in red can be used to calculate the real number that a 64-bit value represents when treated as an IEEE 754 double. It's only useful if you want to manually calculate the conversion from binary to the base-10 real number that it represents, such as when verifying the correctness of a C library's implementation of printf.

For example, using the formula on 0x3fd5555555555555, x is found to be exactly 0.333333333333333314829616256247390992939472198486328125. That is the real number that 0x3fd5555555555555 represents.

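```
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  /* Type-pun through a union: store the raw bits, read them back as a double. */
  union {
    double d;
    unsigned long long ull;
  } u;

  u.ull = 0x3fd5555555555555ULL;
  /* 55 fractional digits are enough to print this value exactly. */
  printf("%.55f\n", u.d);

  return EXIT_SUCCESS;
}
```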

http://codepad.org/kSithgZQ
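The highlighted formula is not reproduced in this text. Assuming it is the usual one for normal binary64 values, x = (-1)^s × (1 + f×2^-52) × 2^(e-1023), where s is the sign bit, e the 11-bit biased exponent and f the 52-bit fraction, here is a minimal sketch of applying it by hand instead of letting printf do the work. The names `pow2` and `decode_binary64` are made up for this illustration, and zero, subnormals, infinities and NaNs (e equal to 0 or 2047) are deliberately not handled.

```c
#include <stdint.h>

/* 2^n as a double, built by repeated doubling or halving.
   Every intermediate value is a power of two, so no rounding occurs
   for the exponents used below (-1022 .. 1023). */
static double pow2(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r *= 0.5;
    return r;
}

/* Hypothetical helper: apply the formula to a raw 64-bit pattern.
   Assumes a normal number, i.e. 0 < e < 2047. */
double decode_binary64(uint64_t bits)
{
    int      s = (int)(bits >> 63);             /* 1 sign bit       */
    int      e = (int)((bits >> 52) & 0x7FF);   /* 11 exponent bits */
    uint64_t f = bits & 0xFFFFFFFFFFFFFULL;     /* 52 fraction bits */

    double significand = 1.0 + (double)f * pow2(-52);   /* 1.f in binary */
    double x = significand * pow2(e - 1023);
    return s ? -x : x;
}
```

Calling `decode_binary64(0x3fd5555555555555ULL)` should give the same double that the union trick above produces, which is the closest double to 1/3.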

EDIT: As Olof commented, an IEEE 754 double exactly represents the value x in the equation, but not all real numbers are exactly representable. In fact, only a finite number of reals such as 0.5, 0.125, and 0.333333333333333314829616256247390992939472198486328125 are exactly representable, while the vast majority (uncountably many) including 1/3, 0.1, 0.4, and π are not.

The key to knowing whether a real number is exactly representable as an IEEE 754 double is to calculate its binary representation and write it in scientific notation (e.g. b1.001×2^-1 for 0.5625). If the number of binary digits to the right of the binary point, excluding trailing zeroes, is less than or equal to 52 and the exponent is between -1022 and +1023, inclusive, then the number is exactly representable.
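For numbers given as a fraction of two integers, the rule can be sketched in C without any floating-point arithmetic. This is only an illustration under stated assumptions: the function name `fraction_is_exact_double` is hypothetical, the inputs are non-negative 64-bit integers, and the test "finite binary expansion" is expressed as "the reduced denominator is a power of two".

```c
#include <stdbool.h>
#include <stdint.h>

/* Greatest common divisor, used to reduce the fraction to lowest terms. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: is num/den exactly representable as a double? */
bool fraction_is_exact_double(uint64_t num, uint64_t den)
{
    if (den == 0) return false;   /* not a number at all   */
    if (num == 0) return true;    /* zero is representable */

    uint64_t g = gcd_u64(num, den);
    num /= g;
    den /= g;

    /* A finite binary expansion requires a power-of-two denominator. */
    if ((den & (den - 1)) != 0) return false;

    /* Drop trailing zero bits, then count the significant bits left. */
    while ((num & 1) == 0) num >>= 1;
    int significant_bits = 0;
    while (num != 0) {
        significant_bits++;
        num >>= 1;
    }

    /* One leading 1 plus at most 52 digits after the binary point.
       The exponent of a ratio of two 64-bit integers always lies in
       [-63, 63], so the exponent range check cannot fail here. */
    return significant_bits <= 53;
}
```

With this sketch, `fraction_is_exact_double(9, 16)` (that is, 0.5625) returns true, while `fraction_is_exact_double(1, 3)` returns false because 3 is not a power of two.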

Let's go through a couple of examples. Note that it helps to have an arbitrary-precision calculator on hand. I will use ARIBAS.

The number 1/64 is 0.015625 in decimal. To calculate its binary representation, we can use ARIBAS' decode_float function:

```
==> set_floatprec(double_float).
-: 64

==> 1/64.
-: 0.0156250000000000000

==> set_printbase(2).
-: 0y10

==> decode_float(1/64).
-: (0y10000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000, 
-0y1000101)

==> set_printbase(10).
-: 10

==> -0y1000101.
-: -69
```

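The result is a 64-bit integer mantissa together with a power-of-two exponent: 1/64 = 2^63 × 2^-69. Thus 1/64 = b0.000001, or b1.0×2^-6 in scientific notation.

1/64 is exactly representable: there are no nonzero digits to the right of the binary point, and the exponent -6 is well within range.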

The number 1/10 is 0.1 in decimal. To calculate its binary representation:
```
==> set_printbase(2).
-: 0y10

==> decode_float(1/10).
-: (0y11001100_11001100_11001100_11001100_11001100_11001100_11001100_11001100, 
-0y1000011)

==> set_printbase(10).
-: 10

==> -0y1000011.
-: -67
```
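So 1/10 = 0.1 = b0.000(1100), where the parenthesized digits repeat forever, or b1.100(1100)×2^-4 in scientific notation.

1/10 is not exactly representable: its binary expansion never terminates, so it needs more than 52 digits to the right of the binary point.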
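The same (mantissa, exponent) view can be approximated in C for values already stored in a double. The sketch below is not ARIBAS's algorithm; `decode_double` is a hypothetical helper that assumes a normal (not zero, subnormal, infinite or NaN) value and simply left-aligns the 53 significant bits of the double in a 64-bit integer, ignoring the sign.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counterpart of decode_float for a normal double:
   writes a 64-bit integer mantissa (top bit set) and an exponent
   such that |d| == mantissa * 2^exponent. */
void decode_double(double d, uint64_t *mantissa, int *exponent)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                   /* reinterpret the 8 bytes */

    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL;    /* low 52 bits       */
    int      biased   = (int)((bits >> 52) & 0x7FF);  /* 11 exponent bits  */

    /* Restore the implicit leading 1, then left-align the 53 bits in 64. */
    *mantissa = ((1ULL << 52) | fraction) << 11;
    *exponent = (biased - 1023) - 63;
}
```

For 1.0/64.0 this yields mantissa 0x8000000000000000 and exponent -69, matching the session above. For 0.1 it yields exponent -67 and mantissa 0xCCCCCCCCCCCCD000: the repeating 1100 pattern is cut off and rounded after 53 bits, which is exactly why the stored double is not equal to 1/10.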

The formula converts the binary representation into the number it stands for.

You only need it if you are implementing a floating-point unit.
