How to add and subtract 16 bit floating point half precision numbers?
How do I add and subtract 16 bit floating point half precision numbers?
Say I need to add or subtract:
1 10000 0000000000
1 01111 1111100000
2’s complement form.
The OpenEXR library defines a half-precision floating point class. It's C++, but the code for casting between native IEEE 754 float and half should be easy to adapt. See Half/half.h as a start.
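The same cast-operate-cast idea can be sketched without OpenEXR. This is not OpenEXR's code; it is a minimal Python sketch that assumes Python 3.6+, where the `struct` module's `e` format packs and unpacks IEEE 754 binary16. The helper names (`half_bits_to_float`, `float_to_half_bits`, `add_half`, `sub_half`) are made up for illustration.

```python
import struct

def half_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit integer as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_half_bits(value: float) -> int:
    """Round a Python float to half precision and return its 16-bit pattern.

    struct raises OverflowError if the value is too large for half precision.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]

def add_half(a_bits: int, b_bits: int) -> int:
    # Widen both operands, add in native floating point, narrow the result.
    return float_to_half_bits(half_bits_to_float(a_bits) + half_bits_to_float(b_bits))

def sub_half(a_bits: int, b_bits: int) -> int:
    return float_to_half_bits(half_bits_to_float(a_bits) - half_bits_to_float(b_bits))

a = 0b1_10000_0000000000  # the first operand from the question
b = 0b1_01111_1111100000  # the second operand from the question
print(f"{add_half(a, b):016b}")
print(f"{sub_half(a, b):016b}")
```

The operands are passed around as raw 16-bit integers so the bit patterns from the question can be typed in directly.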
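Assuming your format follows the IEEE 754 conventions used by single and double precision (normalized values with an implicit leading 1, plus denormals when E == 0), just compute the sign = (-1)^S, the mantissa as 1.M if E != 0 and 0.M if E == 0, and the exponent = E - (2^(n-1) - 1), where n is the number of exponent bits. For the 5-bit half-precision exponent that is E - 15; for E == 0 the exponent is fixed at 1 - 15 = -14. Then operate on these natural representations and convert back to the 16-bit format.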
sign1 = -1 mantissa1 = 1.0 exponent1 = 1
sign2 = -1 mantissa2 = 1.11111 exponent2 = 0
sum: sign = -1 mantissa = 1.111111 exponent = 1
Representation: 1 10000 1111110000
Naturally, this assumes excess-15 (biased) encoding of the exponent.
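The decode, operate, re-encode steps can also be written out by hand. The following is only a sketch under stated assumptions: the exponent bias is 15, results are rounded to nearest with ties to even, and infinities and NaNs (E == 31) are not handled. The names `decode`, `encode` and `show` are hypothetical helpers, and exact rational arithmetic (`fractions.Fraction`) is used so that the only rounding happens when converting back.

```python
from fractions import Fraction

EXP_BITS, MAN_BITS = 5, 10
BIAS = (1 << (EXP_BITS - 1)) - 1  # 15 for half precision

def decode(bits: int) -> Fraction:
    """Split a 16-bit pattern into sign, exponent, mantissa and return the exact value."""
    sign = -1 if (bits >> 15) & 1 else 1
    e = (bits >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    m = bits & ((1 << MAN_BITS) - 1)
    if e == (1 << EXP_BITS) - 1:
        raise ValueError("infinity/NaN not handled in this sketch")
    if e == 0:  # denormal: mantissa is 0.M, exponent is 1 - BIAS
        return sign * Fraction(m, 1 << MAN_BITS) * Fraction(2) ** (1 - BIAS)
    return sign * (1 + Fraction(m, 1 << MAN_BITS)) * Fraction(2) ** (e - BIAS)

def encode(value: Fraction) -> int:
    """Convert an exact value back to a 16-bit pattern (round to nearest, ties to even)."""
    sign = 1 if value < 0 else 0
    mag = abs(value)
    if mag == 0:
        return sign << 15
    # Find exp with 2**exp <= mag < 2**(exp + 1), then clamp to the denormal exponent.
    exp = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** exp > mag:
        exp -= 1
    exp = max(exp, 1 - BIAS)
    # Significand in units of 2**-10, including the implicit leading 1 for normal values.
    q = round(mag / Fraction(2) ** exp * (1 << MAN_BITS))  # round() on a Fraction is half-to-even
    # Adding q lets the implicit 1 (or a rounding carry) bump the exponent field.
    bits = ((exp + BIAS - 1) << MAN_BITS) + q
    if bits >= ((1 << EXP_BITS) - 1) << MAN_BITS:
        raise OverflowError("result does not fit in half precision")
    return (sign << 15) | bits

def show(bits: int) -> str:
    return f"{bits >> 15:01b} {(bits >> MAN_BITS) & 0x1F:05b} {bits & 0x3FF:010b}"

a = 0b1_10000_0000000000
b = 0b1_01111_1111100000
print(show(encode(decode(a) + decode(b))))  # 1 10000 1111110000
print(show(encode(decode(a) - decode(b))))  # 1 01010 0000000000
```

The sum matches the representation worked out above; the difference is -2.0 - (-1.96875) = -0.03125 = -1.0 × 2^-5.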