
Biggest integer that can be stored in a double

What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision?

In other words, what would the following code fragment return:

UInt64 i = 0;
Double d = 0;

while (i == d)
{
        i += 1; 
        d += 1;
}
Console.WriteLine("Largest Integer: {0}", i-1);


The biggest/largest integer that can be stored in a double without losing precision is the same as the largest possible value of a double. That is, DBL_MAX or approximately 1.8 × 10^308 (if your double is an IEEE 754 64-bit double). It's an integer. It's represented exactly. What more do you want?

Go on, ask me what the largest integer is, such that it and all smaller integers can be stored in IEEE 64-bit doubles without losing precision. An IEEE 64-bit double has 52 bits of mantissa, so I think it's 2^53:

  • 2^53 + 1 cannot be stored, because the 1 at the start and the 1 at the end have too many zeros in between.
  • Anything less than 2^53 can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one.
  • 2^53 obviously can be stored, since it's a small power of 2.

Or another way of looking at it: once the bias has been taken off the exponent, and ignoring the sign bit as irrelevant to the question, the value stored by a double is a power of 2, plus a 52-bit integer multiplied by 2^(exponent − 52). So with exponent 52 you can store all values from 2^52 through to 2^53 − 1. Then with exponent 53, the next number you can store after 2^53 is 2^53 + 1 × 2^(53 − 52). So loss of precision first occurs with 2^53 + 1.
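
To see the cutoff concretely, here is a minimal C check (a sketch, assuming the default round-to-nearest-even mode; 9007199254740992 is 2^53):

#include <stdio.h>

int main(void) {
  double p53 = 9007199254740992.0; /* 2^53 */
  printf("%.0f\n", p53);           /* 9007199254740992 */
  printf("%.0f\n", p53 + 1);       /* still 9007199254740992: 2^53 + 1 rounds back down */
  printf("%.0f\n", p53 + 2);       /* 9007199254740994: the next representable integer */
  return 0;
}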


9007199254740992 (that's 9,007,199,254,740,992 or 2^53) with no guarantees :)

Program

#include <math.h>
#include <stdio.h>

int main(void) {
  double dbl = 9007199254000000; /* starting a little below 2^53 keeps the loop short; from 0 it would take ~9e15 iterations */
  while (dbl + 1 != dbl) dbl++;
  printf("%.0f\n", dbl - 1);
  printf("%.0f\n", dbl);
  printf("%.0f\n", dbl + 1);
  return 0;
}

Result

9007199254740991
9007199254740992
9007199254740992


The largest integer that can be represented in IEEE 754 double (64-bit) is the same as the largest value that the type can represent, since that value is itself an integer.

This is represented as 0x7FEFFFFFFFFFFFFF, which is made up of:

  • The sign bit 0 (positive) rather than 1 (negative)
  • The maximum exponent 0x7FE (2046 which represents 1023 after the bias is subtracted) rather than 0x7FF (2047 which indicates a NaN or infinity).
  • The maximum mantissa 0xFFFFFFFFFFFFF which is 52 bits all 1.

In binary, the value is the implicit 1 followed by another 52 ones from the mantissa, then 971 zeros (1023 - 52 = 971) from the exponent.

The exact decimal value is:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

This is approximately 1.8 x 10^308.
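
To reproduce that exact value yourself, a one-line sketch (assuming a libc, such as glibc, that performs exact decimal conversion in printf):

#include <float.h>
#include <stdio.h>

int main(void) {
  printf("%.0f\n", DBL_MAX); /* prints the full 309-digit integer value above */
  return 0;
}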


Wikipedia has this to say in the same context with a link to IEEE 754:

On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit.

2^53 is just over 9 * 10^15.


You need to look at the size of the mantissa. An IEEE 754 64-bit floating-point number (which has 52 bits, plus 1 implied) can exactly represent integers with an absolute value of less than or equal to 2^53.


1.7976931348623157 × 10^308

http://en.wikipedia.org/wiki/Double_precision_floating-point_format


It is true that, for 64-bit IEEE754 double, all integers up to 9007199254740992 == 2^53 can be exactly represented.

However, it is also worth mentioning that all representable numbers beyond 4503599627370496 == 2^52 are integers. Beyond 2^52 it becomes meaningless to test whether or not they are integers, because they are all implicitly rounded to a nearby representable value.

In the range 2^51 to 2^52, the only non-integer values are the midpoints ending with ".5", meaning any integer test after a calculation must be expected to yield at least 50% false answers.

Below 2^51 we also have ".25" and ".75", so comparing a number with its rounded counterpart in order to determine if it may be integer or not starts making some sense.

TLDR: If you want to test whether a calculated result may be integer, avoid numbers larger than 2251799813685248 == 2^51
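
A small C sketch of how the gap between consecutive doubles (the ULP) grows across these ranges, using nextafter() from math.h (link with -lm):

#include <math.h>
#include <stdio.h>

int main(void) {
  double p51 = 2251799813685248.0; /* 2^51 */
  double p52 = 4503599627370496.0; /* 2^52 */
  double p53 = 9007199254740992.0; /* 2^53 */
  printf("%g\n", nextafter(p51, INFINITY) - p51); /* 0.5: only ".5" midpoints survive */
  printf("%g\n", nextafter(p52, INFINITY) - p52); /* 1: integers only                 */
  printf("%g\n", nextafter(p53, INFINITY) - p53); /* 2: even integers only            */
  return 0;
}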


UPDATE 1:

I just realized 5^1074 is NOT the true upper limit of what you can get for free out of IEEE 754 double-precision floating point, because I only counted denormalized exponents and forgot the fact that the mantissa itself can fit another 22 powers of 5 (5^22 is still below 2^53), so to the best of my understanding, the largest power of 5 one can get for free out of the double-precision format is:

largest power of 5 :

  • 5 ^ 1096

largest odd number :

  • 5 ^ 1074 x 9007199254740991

  • 5 ^ 1074 x ( 2 ^ 53 - 1 )

mawk 'BEGIN { OFS = "\f\r\t";

 CONVFMT = "IEEE754 :: 4-byte word :: %.16lX"; 
   
 print "", 
 sprintf("%.*g", __=(_+=_+=_^=_<_)^++_+_*(_+_),
                ___=_=((_+_)/_)^-__),   (_ ""),
                        \
 sprintf("%.*g",__,_=_*((_+=(_^=!_)+(_+=_))*_\
                           )^(_+=_++)), (_ ""),
                           \
 sprintf("%.*g",__,_=___*=  \
        (_+=_+=_^=_<_)^--_^_/--_-+--_), (_ "") }'
  • 4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625e-324

      — IEEE754 :: 4-byte word :: 0000000000000001
    
    494065645841246544176568792......682506419718265533447265625 } 751 dgts :      
      5^1,074    
    
  • 1.1779442926436580280698985883431944188238616052015418158187524855152976686244219586021896275559329804892458073984282439492384355315111632261247033977765604928166883306272301781841416768261169960586755720044541328685833215865788678015827760393916926318959465387821953663477851727634395732669139543975751084522891987808004020022041120326339133484493650064495265010111570347355174765803347028811562651566216206901711944564705815590623254860079132843479610128658074120767908637153514231969910697784644086106916351461663273587631725676246505444808791274797874748064938487833137213363849587926231550453981511635715193075144590522172925785791614297511667878003519179715722536405560955202126362715257889359212587458533154881546706053453699158950485070818103849887847900390625e-308

      — IEEE754 :: 4-byte word :: 000878678326EAC9
    
    117794429264365802806989858......070818103849887847900390625 } 767 dgts :
      5^1,096
    
  • 4.4501477170144022721148195934182639518696390927032912960468522194496444440421538910330590478162701758282983178260792422137401728773891892910553144148156412434867599762821265346585071045737627442980259622449029037796981144446145705102663115100318287949527959668236039986479250965780342141637013812613333119898765515451440315261253813266652951306000184917766328660755595837392240989947807556594098101021612198814605258742579179000071675999344145086087205681577915435923018910334964869420614052182892431445797605163650903606514140377217442262561590244668525767372446430075513332450079650686719491377688478005309963967709758965844137894433796621993967316936280457084866613206797017728916080020698679408551343728867675409720757232455434770912461317493580281734466552734375e-308

      — IEEE754 :: 4-byte word :: 001FFFFFFFFFFFFF
    
    445014771701440227211481959......317493580281734466552734375 } 767 dgts :
          5^1,074
          6361
          69431
          20394401 
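
All three values above can be reproduced in a few lines of C (a sketch: it assumes C99 hex float literals and a libc, like glibc, that does exact decimal conversion in printf):

#include <stdio.h>

int main(void) {
  /* 2^-1074 = 5^1074 / 10^1074: the exact decimal digits of the smallest
     subnormal are the 751 digits of 5^1074, shifted by the decimal point */
  printf("%.750e\n", 0x1p-1074);
  /* 5^22 * 2^-1074 = 5^1096 / 10^1074: a subnormal whose 767 exact decimal
     digits are those of 5^1096 (5^22 = 2384185791015625 still fits the mantissa) */
  printf("%.766e\n", 2384185791015625.0 * 0x1p-1074);
  /* (2^53 - 1) * 2^-1074: the normal double 0x001FFFFFFFFFFFFF, whose 767
     exact decimal digits are those of 5^1074 * (2^53 - 1) */
  printf("%.766e\n", 0x1.fffffffffffffp-1022);
  return 0;
}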
    

and here's a quick awk code snippet to print out every positive power of 2 up to 1023, every positive power of 5 up to 1096, and their common power of zero, optimized for both with and without a bigint library :

{m,g,n}awk' BEGIN {

 CONVFMT = "%." ((_+=_+=_^=_<_)*_+--_*_++)(!++_) "g"
    OFMT = "%." (_*_) "g"

 if (((_+=_+_)^_%(_+_))==(_)) {
    print __=_=\
            int((___=_+=_+=_*=++_)^!_)
     OFS = ORS
    while (--___) {
        print int(__+=__), int(_+=_+(_+=_))
    }
    __=((_+=_+=_^=!(__=_))^--_+_*_) substr("",_=__)
    do {
        print _+=_+(_+=_) } while (--__)
    exit
 } else { _=_<_ }

    __=((___=_+=_+=++_)^++_+_*(_+_--))
      _=_^(-(_^_--))*--_^(_++^_^--_-__)
  _____=-log(_<_)
    __^=_<_
   ___=-___+--___^___

 while (--___) {
     print ____(_*(__+=__+(__+=__))) }
 do {
     print ____(_) } while ((_+=_)<_____)
 }

 function ____(__,_) {
     return (_^=_<_)<=+__ \
     ?              sprintf( "%.f", __) \
     : substr("", _=sprintf("%.*g", (_+=++_)^_*(_+_),__),
         gsub("^[+-]*[0][.][0]*|[.]|[Ee][+-]?[[:digit:]]+$","",_))_
 }'

=============================

Depends on how flexible you are with the definition of "represented" and "representable" -

Despite what typical literature says, the integer that's actually "largest" in IEEE 754 double precision, without any bigint library or external function call, with a completely full mantissa, that is computable, storable, and printable is actually :

9,007,199,254,740,991 * 5 ^ 1074 (~2546.750773909... bits)

  4450147717014402272114819593418263951869639092703291
  2960468522194496444440421538910330590478162701758282
  9831782607924221374017287738918929105531441481564124
  3486759976282126534658507104573762744298025962244902
  9037796981144446145705102663115100318287949527959668
  2360399864792509657803421416370138126133331198987655
  1545144031526125381326665295130600018491776632866075
  5595837392240989947807556594098101021612198814605258
  7425791790000716759993441450860872056815779154359230
  1891033496486942061405218289243144579760516365090360
  6514140377217442262561590244668525767372446430075513
  3324500796506867194913776884780053099639677097589658
  4413789443379662199396731693628045708486661320679701
  7728916080020698679408551343728867675409720757232455
  434770912461317493580281734466552734375

I used xxhash to compare this with gnu-bc and confirmed it's indeed identical, with no precision lost. There's nothing "denormalized" about this number at all, despite the exponent range being labeled as such.

Try it on your own system if you don't believe me. (I got this printout via off-the-shelf mawk) - and you can get to it fairly easily too :

  1. one(1) exponentiation/power (^ aka **) op,
  2. one(1) multiplication (*) op,
  3. one (1) sprintf() call, and
  4. either one(1) of — substr() or regex-gsub() to perform the cleanup necessary
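
In C, that recipe looks something like the following sketch (ldexp() stands in for the power op; the product (2^53 - 1) * 2^-1074 is itself exactly representable, so the multiply loses nothing):

#include <math.h>
#include <stdio.h>

int main(void) {
  /* one power op, one multiply, one printf */
  double v = 9007199254740991.0 * ldexp(1.0, -1074);
  printf("%.766e\n", v); /* the 767 digits of 5^1074 * (2^53 - 1), scaled by 10^-1074 */
  return 0;
}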

Just like the 1.79…E308 number frequently mentioned,

  • both are mantissa limited
  • both are exponent limited
  • both have ridiculously large ULPs (unit in last place)
  • and both are exactly 1 step from "overwhelming" the floating point unit with either an overflow or an underflow, beyond which it can no longer give you back a usable answer

Negate the binary exponents of the workflow, and you can have the ops done entirely in this space, then just invert once more at the tail end of the workflow to get back to the side we typically consider "larger", but keep in mind that in the inverted exponent realm, there's no "gradual overflow".

— The 4Chan Teller


As others have noted, I will assume that the OP asked for the largest floating-point value such that all whole numbers less than it are precisely representable.

You can use FLT_MANT_DIG and DBL_MANT_DIG defined in float.h to avoid relying on the explicit values (e.g., 53):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("%d, %.1f\n", FLT_MANT_DIG, (float)(1L << FLT_MANT_DIG));
    printf("%d, %.1lf\n", DBL_MANT_DIG, (double)(1L << DBL_MANT_DIG));
}

outputs:

24, 16777216.0
53, 9007199254740992.0


Doubles, the "Simple" Explanation

The largest "double" number (double precision floating point number) is typically a 64-bit or 8-byte number expressed as:

1.79E308
or
1.79 x 10 (to the power of) 308

As you can guess, 10 to the power of 308 is a GIGANTIC NUMBER, like 170000000000000000000000000000000000000000000 and even larger!

On the other end of the scale, 64-bit double precision floating point numbers support tiny, tiny decimal fractions using the "dot" notation, the smallest being:

4.94E-324
or
4.94 x 10 (to the power of) -324

Anything multiplied by 10 raised to a negative power is a tiny, tiny decimal, like 0.0000000000000000000000000000000000494 and even smaller.

But what confuses people is they will hear computer nerds and math people say, "but doubles are only accurate to about 15 significant digits". It turns out that the values described above are the all-time MAXIMUM and MINIMUM values the computer can store and present from memory. But doubles lose accuracy, and the ability to represent every number, LONG BEFORE they get that big. So most programmers AVOID the maximum double number possible and try to stick within a known, much smaller range.

But why? And what is the best maximum double number to use? I could not find the answer reading dozens of bad explanations on math sites online. So the SIMPLE explanation below may help you. It helped me!!

DOUBLE NUMBER FACTS and FLAWS

JavaScript uses 64-bit double precision floating point numbers for storing all numerical values. It thus uses the same MAX and MIN ranges shown above. But most languages use a typed numerical system with ranges to avoid accuracy problems. The double and float storage systems, however, share the same flaw of losing numerical precision as values get larger and smaller. I will explain why, as it affects the idea of "maximum" values...

To address this, JavaScript has what is called a Number.MAX_SAFE_INTEGER value, which is 9007199254740991. This is the largest Integer it can represent safely, but it is NOT the largest number that can be stored. It is safe because any integer equal to or less than that value can be viewed, calculated, stored, etc. exactly. Beyond that range, there are "missing" integers. The reason is that double precision numbers AFTER 9007199254740991 use an additional number to multiply them to larger and larger values, up to the true max number of 1.79E308. That extra number is called an exponent.

THE EVIL EXPONENT

It happens that this max value of 9007199254740991 is also the largest number you can store in the 53 effective bits of mantissa (52 stored, plus 1 implied) used in the 64-bit storage system. This number, stored as 53 one-bits, is the largest value that can be held directly in the mantissa section of a typical double precision floating point number used by JavaScript.

9007199254740991, by the way, is in the format we call Base10 or decimal, the number system Humans use. But it is also stored in computer memory as these 53 bits...

11111111111111111111111111111111111111111111111111111

This is the maximum number of bits computers can actually use to store the integer part of double precision numbers in the 64-bit memory system.

To get to the even LARGER max number possible (1.79E308), JavaScript has to use an extra trick called the exponent to multiply the mantissa to larger and larger values. There is an 11-bit exponent field next to the mantissa in computer memory that allows the number to grow much larger and much smaller, creating the final range of numbers doubles are expected to represent. (Also, there is a single bit for positive and negative numbers, as well.)

After the computer reaches this limit of max Integer value (around ~9 quadrillion), filling up the mantissa section of memory with 53 bits, JavaScript uses the separate 11-bit storage area for the exponent, which allows much larger integers to grow (up to 10 to the power of 308!) and much smaller decimals to shrink (down to 10 to the power of -324!). Thus, this exponent number allows the floating radix or decimal point to move up and down the number, creating the complex fractional or decimal values you expect to see. Again, this exponent is another number stored in 11 bits, which itself has a max value of 2047.

You will notice 9007199254740991 is a max integer, but it does not explain the larger MAX value possible in storage, or the MINIMUM decimal number, or even how decimal fractions get created and stored. How does this computer bit value create all that?

The answer is again, through the exponent!

It turns out that the exponent 11-bit value is itself split into a positive and negative range so that it can create large integers but also small decimal numbers.

To do so, a bias of 1023 is subtracted from its stored value (1 to 2046 for normal numbers, with 0 and 2047 reserved) to get a working range of exponents from -1022 to +1023. To then get the FINAL DOUBLE NUMBER, the mantissa (up to 9007199254740991) is scaled by the exponent (with the single sign bit applied) to get the final value! This allows the exponent to scale the mantissa value to even larger integer ranges beyond 9 quadrillion, but also to go the opposite way with the decimal point to very tiny fractions.

However, the +-1023 number stored in the exponent is not multiplied with the mantissa directly to get the double; it is used to raise the number 2 to the power of the exponent. The exponent is a decimal number, but it is not applied as a decimal exponent like 10 to the power of 1023. It is applied in Base2 again, creating a value of 2 to the power of (the exponent number).

That generated value is then multiplied with the mantissa to get the MAX and MIN numbers allowed to be stored in JavaScript, as well as all the larger and smaller values within the range. It uses "2" rather than 10 for precision purposes, so each increase in the exponent value only doubles the mantissa value, which reduces the loss of numbers. But this exponent multiplier also means doubles skip an increasing range of numbers as they grow, to the point where, as you reach the MAX stored exponent and mantissa possible, very large swaths of numbers disappear from the final calculated number, and so certain numbers become impossible in math calculations!
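
To see those three fields directly, here is a small C sketch (memcpy is the well-defined way to view a double's bit pattern; the values shown are for JavaScript's Number.MAX_SAFE_INTEGER, 2^53 - 1):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
  double d = 9007199254740991.0; /* 2^53 - 1 */
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  printf("sign     : %llu\n", (unsigned long long)(bits >> 63));   /* 0 */
  printf("exponent : %llu (biased; subtract 1023 to get 52)\n",
         (unsigned long long)((bits >> 52) & 0x7FF));              /* 1075 */
  printf("mantissa : 0x%013llX (52 stored ones)\n",
         (unsigned long long)(bits & 0xFFFFFFFFFFFFFULL));
  return 0;
}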

That is why most use the SAFE max integer ranges (9007199254740991 or less), as most know very large and small numbers in JavaScript are highly inaccurate! Also note that 2 raised to negative exponents (down to 2 to the power of -1074 for the tiniest denormalized value) gets the MIN number or small decimal fractions you associate with a typical "float". The exponent is thus used to translate the mantissa integer to very large and small numbers, up to the Maximum and Minimum ranges it can store.

Notice that 2 to the power of 1023 translates to a decimal exponent of about 10 to the power of 308 for max values. That allows you to see the number in Human values, the Base10 numerical format of the binary calculation. Often math experts do not explain that all these values are the same number, just in different bases or formats.

THE TRUE MAX FOR DOUBLES IS INFINITY

Finally, what happens when integers reach the MAX number possible, or the smallest decimal fraction possible?

It turns out, double precision floating point numbers have reserved sets of bit values in the exponent and mantissa fields to store four other special values:

  1. +Infinity
  2. -Infinity
  3. +0
  4. -0

For example, +0 in double numbers stored in 64-bit memory is a large row of empty bits in computer memory. Below is what happens after you go beyond the smallest decimal possible (4.94E-324) using a Double precision floating point number: it becomes +0 after it runs out of memory! The computer will return +0, but stores 0 bits in memory. Below is the FULL 64-bit storage design in bits for a double in computer memory. The first bit controls +(0) or -(1) for positive or negative numbers, the 11-bit exponent is next (an all-zero exponent field is one of the reserved patterns, used for zero and the denormalized numbers), and then the large block of 52 stored bits for the mantissa or significand, which here represents 0. So +0 is represented by all zeroes!

0 00000000000 0000000000000000000000000000000000000000000000000000

If the double reaches its positive max or min, or its negative max or min, many languages will always return one of those values in some form. However, some return NaN, overflow, exceptions, etc. How that is handled is a different discussion. But often these four values are your TRUE min and max values for double. By returning these special values, you at least have a representation of the max and min in doubles that explains the last forms of the double type that cannot be stored or explained rationally.
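
A quick C illustration of those edge behaviors (a sketch; the exact %g output for infinity and signed zero can vary slightly by libc):

#include <float.h>
#include <stdio.h>

int main(void) {
  printf("%g\n", DBL_MAX * 2);  /* inf: overflow past 1.79E308                */
  printf("%g\n", 4.9e-324 / 2); /* 0: underflow past the smallest denormal    */
  printf("%g %g\n", 0.0, -0.0); /* 0 -0: the two signed zeros                 */
  printf("%d\n", 0.0 == -0.0);  /* 1: they still compare equal                */
  return 0;
}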

SUMMARY

So the MAXIMUM and MINIMUM ranges for positive and negative Doubles are as follows:

MAXIMUM TO MINIMUM POSITIVE VALUE RANGE
1.79E308 to 4.94E-324 (+Infinity to +0 for out of range)

MAXIMUM TO MINIMUM NEGATIVE VALUE RANGE
-4.94E-324 to -1.79E308 (-0 to -Infinity for out of range)

But the SAFE and ACCURATE MAX and MIN range is really:
9007199254740991 (max) to -9007199254740991 (min)

So you can see with +-Infinity and +-0 added, Doubles have extra max and min ranges to help you when you exceed the max and mins.

As mentioned above, when you go from the largest positive value down to the smallest positive decimal value or fraction, the bits zero out and you get 0. Past 4.94E-324 the double cannot store any smaller decimal fraction value, so it collapses to +0 in the bit registry. The same event happens for tiny negative decimals, which collapse past their value to -0. As you know, -0 == +0, so though they are not the same values stored in memory, in applications they often are coerced to 0. But be aware many applications do deliver signed zeros!

The opposite happens to the large values: past 1.79E308 they turn into +Infinity, and -Infinity for the negative version. This is what creates all the weird number ranges in languages like JavaScript. Double precision numbers have weird returns!

Note that the MINIMUM SAFE RANGE for decimals/fractions is not shown above, as it varies based on the precision needed in the fraction. When you combine the integer with the fractional part, the decimal place accuracy drops away quickly as the fraction gets smaller. There are many discussions and debates about this online, and no one ever has a single answer. The list below might help. You might need to change these ranges to much smaller values if you want guaranteed precision. As you can see, if you want to support up to 9-decimal-place accuracy in doubles, you will need to limit MAX values in the mantissa to these values. Precision means how many decimal places you need with accuracy. Unsafe means past these values, the number will lose precision and have missing numbers:

            Precision   Unsafe
            1           562,949,953,421,312
            2           70,368,744,177,664
            3           8,796,093,022,208
            4           549,755,813,888
            5           68,719,476,736
            6           8,589,934,592
            7           536,870,912
            8           67,108,864
            9           8,388,608
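
The table appears to follow the pattern 2^(53 − ⌈p × log2(10)⌉): reserve enough mantissa bits for p decimal places and leave the rest for the integer part. A small C sketch that reproduces it (this formula is my reading of the table, not something stated in the original answer):

#include <math.h>
#include <stdio.h>

int main(void) {
  for (int p = 1; p <= 9; p++) {
    /* bits consumed by p decimal places: ceil(p * log2(10)) */
    double limit = ldexp(1.0, 53 - (int)ceil(p * log2(10.0)));
    printf("%d decimal place(s): up to %.0f\n", p, limit);
  }
  return 0;
}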

It took me a while to understand the TRUE limits of Double precision floating point numbers and computers. I created this simple explanation above after reading so much MASS CONFUSION from math experts online, who are great at creating numbers but terrible at explaining anything! I hope I helped you on your coding journey - Peace :)
