Floats, doubles and half floats
I was wondering how bits are organized in floats (4 bytes), doubles (8 bytes) and half floats (2 bytes, used in OpenGL implementations).
Further, how could I convert from one to another?
In essence, for each of these formats you have:
- 1 sign bit
- x exponent bits yielding a whole number E
- y mantissa (or "significand") bits yielding a fractional number M
If the sign bit is 1, the number is negative, else it is positive.
To get the magnitude of a normal number, you take (1 + M) * 2^(E - k), where k (called the "exponent bias") depends on the format.
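Here is a minimal Python sketch of that formula, assuming the standard IEEE 754 field widths and biases: x = 5, y = 10, k = 15 for half; x = 8, y = 23, k = 127 for single; x = 11, y = 52, k = 1023 for double. It only handles normal numbers. It cross-checks its result against Python's `struct` module, whose `e`, `f` and `d` codes pack half, single and double precision (`e` needs Python 3.6+). The names `FORMATS` and `decode_normal` are made up for illustration.

```python
import struct

# (exponent bits x, mantissa bits y, exponent bias k) for each IEEE 754 format
FORMATS = {
    "half":   (5, 10, 15),
    "single": (8, 23, 127),
    "double": (11, 52, 1023),
}

def decode_normal(bits: int, fmt: str) -> float:
    """Apply (-1)^sign * (1 + M) * 2^(E - k) to a raw bit pattern.

    Only valid for normal numbers, i.e. when the exponent field is
    neither all zeros nor all ones (those encode the special values).
    """
    x, y, k = FORMATS[fmt]
    sign = bits >> (x + y)                    # top bit
    E = (bits >> y) & ((1 << x) - 1)          # whole-number exponent field
    M = (bits & ((1 << y) - 1)) / (1 << y)    # mantissa as a fraction in [0, 1)
    if E == 0 or E == (1 << x) - 1:
        raise ValueError("not a normal number")
    return (-1) ** sign * (1 + M) * 2.0 ** (E - k)

# Encode -6.5 in each format with struct, then decode it by hand.
for fmt, code, uint in (("half", "e", "H"), ("single", "f", "I"), ("double", "d", "Q")):
    bits = struct.unpack("<" + uint, struct.pack("<" + code, -6.5))[0]
    print(fmt, hex(bits), decode_normal(bits, fmt))
```

For example, -6.5 is -1.625 * 2^2, so in single precision the sign bit is 1, E = 2 + 127 = 129 and M = 0.625.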
It's worth noting that certain combinations of sign, exponent, and mantissa are "special" values, like 0, -inf, +inf, and NaN.
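As a sketch of where those special values live, the hypothetical `classify_single` below inspects the bit fields of a single-precision encoding. An all-ones exponent means an infinity (mantissa zero) or NaN (mantissa nonzero). An all-zeros exponent means a signed zero, or, with a nonzero mantissa, a subnormal number, which is a case not in the list above. It uses only Python's standard `struct` and `math` modules.

```python
import math
import struct

def classify_single(value: float) -> str:
    """Classify a value by the bit fields of its IEEE 754 single-precision encoding."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8 exponent bits
    mantissa = bits & 0x7FFFFF          # 23 mantissa bits
    if exponent == 0xFF:                # all ones: infinity or NaN
        if mantissa == 0:
            return "-inf" if sign else "+inf"
        return "NaN"
    if exponent == 0:                   # all zeros: zero or subnormal
        if mantissa == 0:
            return "-0" if sign else "+0"
        return "subnormal"
    return "normal"

for v in (0.0, -0.0, math.inf, -math.inf, math.nan, 1e-40, 1.5):
    print(v, classify_single(v))
```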
For the specifics (values of x, y, and k), see Wikipedia for single precision (4 bytes), double precision (8 bytes), and half precision (2 bytes).
Note that these are all specified by IEEE 754, so googling that might give you helpful results. :)
Half, Single, Double
Handy-dandy diagrams on those pages. The library should provide means for converting between the various formats.
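In Python, for instance, the standard `struct` module can do these conversions by packing a value into one format and unpacking it again. The helper names below (`to_half`, `to_single`, `half_bits`) are hypothetical, and this is only one possible approach. Packing with `e` needs Python 3.6+ and raises `OverflowError` for values too large for half precision.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float (a double) to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def half_bits(x: float) -> int:
    """Return the raw 16-bit pattern, e.g. to fill a GL_HALF_FLOAT buffer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

print(to_half(0.1), to_single(0.1), hex(half_bits(1.0)))
```

Converting to a narrower format rounds, so 0.1 comes back as 0.0999755859375 from half precision. Widening (half to single to double) is always exact.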