
problems in floating point comparison [duplicate]

This question already has answers here: strange output in comparison of float with float literal (8 answers). Closed 6 years ago.
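#include <stdio.h>

int main(void)
{
    float f = 0.98;
    if (f <= 0.98)
        printf("hi\n");
    else
        printf("hello\n");
    getchar();   /* standard replacement for the non-standard getch() from <conio.h> */
    return 0;
}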

I am running into a problem here: with different floating-point values of f, I get different results. Why is this happening?


f has float precision, but the literal 0.98 has type double by default, so the comparison f <= 0.98 is performed in double precision.

f is therefore converted to double for the comparison. The conversion itself is exact, but the value stored in f was already rounded to float precision, and that rounded value may be slightly larger than the double 0.98.

Use

if(f <= 0.98f)

or use a double for f instead.


In detail, assuming float is IEEE 754 single precision and double is IEEE 754 double precision:

Floating-point numbers like these are stored in base 2. In base 2, 0.98 is a repeating fraction, so representing it exactly would require infinite precision:

0.98 = 0.1111101011100001010001111010111000010100011110101110000101000...

A float stores only 24 significant bits. The bits after the cut (101...) are more than half of the last kept bit, so the result rounds up:

       0.111110101110000101000111_101...
                                 ^ round off here
   =   0.111110101110000101001000

   =   16441672 / 2^24

   =   0.98000001907...

A double stores 53 significant bits. Here the bits after the cut (00101...) are less than half of the last kept bit, so the result rounds down:

       0.11111010111000010100011110101110000101000111101011100_00101000...
                                                              ^ round off here
   =   0.11111010111000010100011110101110000101000111101011100

   =   8827055269646172 / 2^53

   =   0.97999999999999998224...

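So 0.98 becomes slightly larger than 0.98 when stored as a float and slightly smaller when stored as a double. Converting f to double keeps its float value 0.98000001907..., which is greater than the double 0.97999999999999998224..., so f <= 0.98 is false and the program prints "hello".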


It's because floating-point values are often not exact representations of the numbers they stand for. Decimal numbers have to be represented on the computer in base 2, and many decimal fractions, such as 0.98 or 0.1, have no finite base-2 form, so precision is lost in this conversion.

Read more about this at http://en.wikipedia.org/wiki/Floating_point


An example (from encountering this problem in my VB6 days)

To convert the number 1.1 to a single precision floating point number we need to convert it to binary. There are 32 bits that need to be created.

Bit 1 is the sign bit (1 if negative, 0 if positive).
Bits 2-9 are for the exponent value.
Bits 10-32 are for the mantissa (a.k.a. significand, basically the coefficient in scientific notation).

So for 1.1 the single-precision value is stored as follows. This is the truncated value: a compiler using IEEE round-to-nearest actually rounds the least significant bit up, so the mantissa ends in ...01101. I simply truncate it, which is slightly less accurate but doesn't change the conclusion of this example:

s --exp--- -------mantissa--------
0 01111111 00011001100110011001100

Notice the repeating pattern 0011 in the mantissa: 1/10 in binary is like 1/3 in decimal; it goes on forever. To get the value back out of the 32-bit single-precision number, we first convert the exponent and mantissa fields to decimal integers:

sign = 0 = a positive number

exponent: 01111111 = 127

mantissa: 00011001100110011001100 = 838860

Next, the mantissa must be turned into a fraction. There is an implied leading 1 in front of the stored bits (i.e. 1.00011001100110011001100), because the mantissa represents a normalized value in scientific notation: 1.0001100110011... * 2^(x-127), where x is the stored exponent.

To get the fractional value out of 838860, we divide by 2^23, since there are 23 bits in the mantissa. This gives 0.099999904632568359375. Adding the implied 1 gives 1.099999904632568359375. The stored exponent is 127, and the formula calls for 2^(x-127), so the scale factor is 2^0 = 1.

So here is the math:

(1 + 0.099999904632568359375) * 2^(127-127)

1.099999904632568359375 * 1 = 1.099999904632568359375

As you can see, 1.1 is not actually stored as 1.1 in a single-precision floating-point value.
