How to distinguish between 1 and zero floating-point values?

I know it might be a noob question, but since it's not covered in the e-book I'm studying, I'm going to ask it. The IEEE standard binary floating-point format represents numbers in a kind of scientific notation, and I know that an integer value of one is always assumed to be added to the fractional part encoded by the significand field, rather than being stored in the binary itself. So what confuses me is how to distinguish between the floating-point values 1 and zero, because I assume both have a completely zero significand. I guess the differentiation must be done by the exponent part, but I don't know how!


For the zeroes (there are two, a positive and a negative zero that differ in the sign bit but must be considered equal), the significand and the exponent are all 0-bits, whereas for non-zero values at least one of them has a 1-bit (for a value of 1, the exponent is all 1-bits except for the most significant one).
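A quick way to check these bit patterns yourself is to reinterpret a float's bytes as an integer; here is a minimal sketch in Python using the standard struct module (the helper name bits is my own):

```python
import struct

def bits(x: float) -> int:
    """Return the raw 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

# The two zeroes: every exponent and significand bit is 0;
# only the sign bit differs.
assert bits(0.0) == 0x00000000
assert bits(-0.0) == 0x80000000

# 1.0: significand bits all 0, but the exponent field is 0x7f,
# i.e. all 1-bits except the most significant one.
assert bits(1.0) == 0x3F800000
assert (bits(1.0) >> 23) & 0xFF == 0b01111111
```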

The Wikipedia article on the IEEE 754 standard lists the exact bit patterns.


This is a good question. I always wondered about this, too. And your supposition is correct: The classification is done by the exponent value.

In the IEEE-754 floating-point formats there are, depending on how you count them, up to five or six different "kinds" or "types" or "classes" of values. And the exponent field is simultaneously the biased exponent, and a sort of selector to help determine which kind of value you have.

Basically, the exponent value can be 0, or it can be its maximum value, or it can be somewhere in between. And the significand (also known as "mantissa") can either be zero or nonzero. So that's 3 × 2 = 6 different possibilities. And if we arrange them in a table, everything becomes reasonably clear:

                          significand == 0          significand != 0
    exponent == 0         zero                      subnormal numbers
    0 < exponent < max    ordinary FP numbers       ordinary FP numbers
    exponent == max       infinity                  NaN ("not a number")

When both the significand and the exponent are 0, we have a true zero value. When the exponent has (almost) any other value, other than its maximum, we have an ordinary floating-point number, with an implicit 1 bit. When the exponent is 0 but the significand is nonzero, we have a "subnormal" number, without the implicit 1 bit. (The story of subnormal numbers is reasonably fascinating in its own right, but I'm not going to go into it here.) And when the exponent has its maximum value — 255 for single-precision, or 2047 for double-precision — this indicates we don't have a finite, numerical floating-point value at all, but rather a special 'infinity' or 'not-a-number' (NaN) marker.
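The cases the table walks through can be sketched as a small classifier; a minimal Python version for single precision (the function name classify is my own):

```python
def classify(bits32: int) -> str:
    """Classify a 32-bit IEEE-754 single-precision bit pattern."""
    exponent = (bits32 >> 23) & 0xFF
    significand = bits32 & 0x007FFFFF
    if exponent == 0:
        return "zero" if significand == 0 else "subnormal"
    if exponent == 0xFF:  # maximum exponent: 255 in single precision
        return "infinity" if significand == 0 else "NaN"
    return "ordinary"

assert classify(0x00000000) == "zero"       # +0.0
assert classify(0x3F800000) == "ordinary"   # 1.0
assert classify(0x00000001) == "subnormal"  # smallest positive subnormal
assert classify(0x7F800000) == "infinity"   # +inf
assert classify(0x7FC00000) == "NaN"
```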

I like this table a lot, but it's worth noting that it is misleading in one respect. It makes it look like the six "classes" are of roughly equal size, but of course they're not. There are two "special" exponent values, but a lot more (either 254 or 2046) that are ordinary. And there's only one significand value that is ever special, but a whole lot more (either 2^23 − 1 or 2^52 − 1) that are ordinary. So that middle cell, labeled "ordinary floating-point numbers", actually accounts for at least 99% of the values.

(But it is true, and perhaps surprising, that there is not just one NaN value, but quite a few of them: 8388607 (that is, 2^23 − 1) of them in single precision, and a whopping 2^52 − 1 of them in double precision.)
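Those counts are easy to verify, counting for one value of the sign bit as above: a NaN is any pattern with a maximum exponent and a nonzero significand.

```python
# Any nonzero significand value marks a NaN when the exponent is all 1s,
# so the count is (number of significand patterns) - 1, per sign bit.
single_nans = 2**23 - 1   # 23 significand bits in single precision
double_nans = 2**52 - 1   # 52 significand bits in double precision

assert single_nans == 8388607
assert double_nans == 4503599627370495
```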

Here is another way of arranging the table:

    exponent == max, significand != 0    NaN
    exponent == max, significand == 0    ±infinity
    0 < exponent < max                   ordinary floating-point numbers
    exponent == 0,   significand != 0    subnormal numbers
    exponent == 0,   significand == 0    ±zero

This shows how the values increase, bottom to top, from zero to subnormal to ordinary, with infinity and the NaNs at the top.

One final point. Within the "ordinary" floating-point values, the ones where the exponent is neither at its minimum nor its maximum, is there any significance to whether the significand is zero or nonzero? Only this: if the stored significand is 0, the only 1 bit the number has is the implicit one. In other words, any floating-point value with an "intermediate" exponent and a zero significand is going to be a perfect power of two: 1, 2, 4, 8, …, or ½, ¼, ⅛, … .
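That power-of-two observation is easy to confirm; a quick Python check using the standard struct module (the helper name is my own):

```python
import struct

def stored_significand(x: float) -> int:
    """Return just the 23 stored significand bits of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0] & 0x007FFFFF

# Perfect powers of two carry only the implicit 1 bit,
# so their stored significand is all zeroes.
for p in (1.0, 2.0, 4.0, 8.0, 0.5, 0.25, 0.125):
    assert stored_significand(p) == 0

# Anything else leaves at least one explicit significand bit set.
assert stored_significand(3.0) != 0
```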


I wrote an answer mentioning (among other things) the implicit bit (which is what I assume you're wondering about) here https://stackoverflow.com/questions/327020/why-are-floating-point-values-so-prolific/4164252#4164252

I'll expand on it further here. I'll use the character sequences "<=>" and "=>" to mean "equivalent to" and "giving".

If you look at the IEEE-754 single-precision floating-point (SPFP) number in 32-bit unsigned integer format, this is how to extract the individual parts:

  • Sign: AND with 0x80000000 (1 bit) and shift right 31 places
  • Exponent: AND with 0x7f800000 (8 bits) and shift right 23 places
  • Significand (mantissa): AND with 0x007fffff (23 bits). If the exponent field is non-zero (i.e. the number is normal) you OR in the "implicit" bit with 0x00800000 (=> 24 bits in significand); zeroes and subnormals have no implicit bit.
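The three bullet points translate directly to code; here is a Python sketch using the same masks (the helper name decompose is my own, and note the implicit bit is ORed in only when the exponent field is non-zero):

```python
import struct

def decompose(x: float):
    """Split a single-precision float into its sign, exponent and significand fields."""
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = (u & 0x80000000) >> 31
    exponent = (u & 0x7F800000) >> 23
    significand = u & 0x007FFFFF
    if exponent != 0:                 # normal numbers get the implicit bit
        significand |= 0x00800000     # => 24 bits in the significand
    return sign, exponent, significand

assert decompose(1.0) == (0, 0x7F, 0x00800000)
assert decompose(-1.0) == (1, 0x7F, 0x00800000)
assert decompose(0.0) == (0, 0, 0)
```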

There are two variants of zero: 0.0 and -0.0 (0x00000000 and 0x80000000). Exponent = 0 and significand = 0 define a zero. In the same manner there are also two variants of one: 1.0 and -1.0 (0x3f800000 and 0xbf800000). As you can see there is no confusing 0.0 and 1.0. I'll try to explain why.

Any normal non-zero number will have an exponent in the range 0x01 to 0xfe. Somewhat over-simplified, the exponent 0x00 with a non-zero significand is used for the underflow case (subnormal numbers), and the exponent 0xff for the overflow and invalid-operation cases (infinities and NaNs, i.e. the SPFP exceptions). The exponent corresponding to 1.0 is 0x7f, which corresponds to 0 (see the next paragraph), giving 2^0 = 1. The next exponent just below, 0x7e, corresponds to -1, giving 2^-1 = 0.5, and so on. For the exponent 0x7f the significand represents all numbers in the range 1.0 <= x < 2.0, which is to say that the exponent defines the lower end of the numbers you want to represent, which can go up to but not including the next higher power of two.
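You can see these exponent values in practice with a short Python sketch (the helper name is my own):

```python
import struct

def raw_exponent(x: float) -> int:
    """Return the 8 raw (biased) exponent bits of a single-precision float."""
    return (struct.unpack("<I", struct.pack("<f", x))[0] >> 23) & 0xFF

assert raw_exponent(1.0) == 0x7F    # biased 0  => 2^0
assert raw_exponent(0.5) == 0x7E    # biased -1 => 2^-1
assert raw_exponent(2.0) == 0x80    # biased 1  => 2^1
# Every value in [1.0, 2.0) shares the exponent 0x7f:
assert raw_exponent(1.999) == 0x7F
```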

If you find the exponent difficult to understand and want it to appear "more normal" (being a base-10 person), you can subtract 0x7f (127) from it and you will get the range -126 to 127 for normal numbers. 128 is then the overflow exponent and -127 the underflow.

Just so you don't think I've forgotten: if you have the sign bit set the exponent 0x7f will attempt to represent all numbers in the range -1.0 >= x > -2.0.

Now to the implicit bit. The implicit bit can be called bit "22.5", since it sits right in between the highest explicit bit of the significand and the lowest explicit bit of the exponent. It implies a 1 at the position given by the exponent. So for exponent 0x7f (<=> 0 => 2^0) it implies that 1.0 is a component of the real number being represented. The first explicit bit to the right of it (bit 22 of the significand) signals whether the component corresponding to the next smaller exponent (0x7f - 0x01 = 0x7e <=> -1 => 2^-1), i.e. 0.5, is part of the real number, and so on. The smallest component of a single-precision floating-point value with an exponent of 0x7f is therefore 0x7f - 23 (bits in the significand) = 0x68 (<=> -23 => 2^-23).

To put it all together: the real number corresponding to the SPFP value 0x42b80000 has exponent 0x85 - 0x7f = 6, giving 2^6 = 64.0 for the implicit bit:

  • 2^6 * 1 (implicit bit always 1) +
  • 2^5 * 0 (bit 22 of significand is reset) +
  • 2^4 * 1 (bit 21 is set) +
  • 2^3 * 1 (bit 20 is set) +
  • 2^2 * 1 (bit 19 is set) +
  • (bits 18 to 0 are reset and their corresponding components (2^1 to 2^-17) are therefore not used)

2^6+2^4+2^3+2^2 => 64+16+8+4 => 92.0 which is the real number represented by 0x42b80000.
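The same decomposition can be replayed in code; a Python sketch of the 0x42b80000 walk-through:

```python
import struct

u = 0x42B80000
exponent = ((u >> 23) & 0xFF) - 0x7F      # 0x85 - 0x7f = 6
significand = u & 0x007FFFFF              # bits 21, 20 and 19 are set

value = 2.0 ** exponent                   # implicit bit: 2^6 = 64.0
for n in range(23):                       # explicit bit n adds 2^(exponent - (23 - n))
    if significand & (1 << n):
        value += 2.0 ** (exponent - (23 - n))

assert exponent == 6
assert value == 92.0
# Cross-check against the hardware interpretation of the same bits:
assert struct.unpack("<f", struct.pack("<I", u))[0] == 92.0
```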

In this example you can see how the significand is left-adjusted (normalized), which allows the implicit bit "22.5" of the SPFP format to act as an explicit bit 23 (always set) of the significand, thereby adding an additional bit of precision to the SPFP format. The DPFP (double-precision) format is similar, but with a larger exponent range and a longer significand.

I recommend you do some experimenting on the format. My personal guess is that 99% of all programmers never have.
