Pitfalls of number values in Python, "How deep?"
I'm a fairly green programmer, and I'm learning Python right now. I'm up to chapter 17 in "Learn to Think Like a Computer Scientist" (Classes and Methods), and I just wrote my first doctest that failed in a way I truly do not fully understand:
class Point(object):
    '''
    represents a point object.
    attributes: x, y
    '''
    def __init__(self, x = 0, y = 0):
        '''
        >>> point = Point()
        >>> point.y
        0
        >>> point = Point(4.7, 8.2)
        >>> point.x
        4.7
        '''
        self.x = x
        self.y = y
The second doctest for __init__ fails, and returns 4.7000000000000002 instead of 4.7. However, if I rewrite the doctest with a "print" statement like so:
>>> point = Point(4.7, 8.2)
>>> print point.x
4.7
It runs correctly.
So I read up on how Python stores floats, and I now understand that, due to binary representation of decimal numbers, the reason for the discrepancy is that Python stores 4.7 as a string of 1s and 0s that almost but don't quite equal 4.7.
But what I don't understand is why a call to "point.x" returns 4.7000000000000002 and a call to "print point.x" returns 4.7. Under what other circumstances will Python choose to round like it does with "print"? How does this rounding work? Can these trailing significant figures lead to errors in programming (aside from, obviously, failed doctests)? Can a failure to pay attention to rounding create dangerous ambiguity?
Since this has to do with binary representation of decimal numbers, I'm sure that this is in fact a general CS issue and not one specific to Python, but what I really need to know right now is what I can do, specifically as a Python programmer, to avoid any related issues and/or bug infestations.
Also, for bonus points, is there some other way that Python can store floating point numbers aside from the default activated by a line like "a = 4.7"? I know there's the Decimal package, but I'm not totally sure how it works. Honestly, all of this dynamic typing stuff confuses me sometimes.
Edit: I should specify that I'm using Python 2.6 (at some point I want to use NumPy and Biopython)
>>> point.x
calls the repr function, which gives a string representation holding more technical detail than the str function, which is what gets called when
>>> print point.x
runs.
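The difference can be reproduced on any Python version with explicit format strings. This is a minimal sketch, assuming that Python 2.6's repr formats floats with 17 significant digits and its str with 12; newer versions print the shortest string that round-trips, so plain repr(4.7) gives '4.7' there. The names repr_like and str_like are only illustrative.

```python
x = 4.7

# 17 significant digits: enough to identify any double uniquely
# (assumed to match what repr does in Python 2.6).
repr_like = '%.17g' % x

# 12 significant digits: the stored error is rounded away
# (assumed to match what str and print do in Python 2.6).
str_like = '%.12g' % x

print(repr_like)  # 4.7000000000000002
print(str_like)   # 4.7
```

Both strings describe the same stored value; only the number of digits shown differs.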
This has to do with how computers store floating point numbers. A detailed description of this is here. However, for your case, the quick solution is to check not the printed representation of point.x but whether point.x is equal to 4.7. So...
>>> point = Point(4.7, 8.2)
>>> point.x == 4.7
True
Or better:
>>> point = Point(4.7, 8.2)
>>> eps = 2**-53 #get epsilon for standard double precision number
>>> -eps <= point.x - 4.7 <= eps
True
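Where eps is the maximum relative rounding error of a single double-precision floating-point operation. Because that bound is relative, it works as an absolute tolerance only for values close to 1; for larger or smaller values, scale it by their magnitude. For details on epsilon, see here.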
EDIT: -eps <= point.x - 4.7 <= eps is equivalent to abs(point.x - 4.7) <= eps. I only add this because not everyone is familiar with Python's chaining of comparison operators.
EDIT 2: Since you mentioned numpy, numpy has a way to get the eps without calculating it yourself. Use eps = numpy.finfo(float).eps instead of 2**-53 if you're using numpy. Note that the numpy epsilon is equal to 2**-52 rather than 2**-53. It is defined as the gap between 1.0 and the next larger representable float, which is twice the maximum relative rounding error.
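The two values can be compared without numpy. This is a small sketch that uses the standard library's sys.float_info.epsilon, assumed here to match numpy.finfo(float).eps on a platform with IEEE 754 doubles; the variable names are only illustrative.

```python
import sys

# Gap between 1.0 and the next representable double (2**-52).
machine_eps = sys.float_info.epsilon

# Half of that gap (2**-53): the largest relative error that
# rounding to the nearest double can introduce.
unit_roundoff = machine_eps / 2

# Adding the full gap reaches the next float, so the sum changes.
gap_is_visible = (1.0 + machine_eps != 1.0)

# Adding half the gap lands exactly between two floats and is
# rounded back to 1.0.
half_gap_is_lost = (1.0 + unit_roundoff == 1.0)

print(gap_is_visible)    # True
print(half_gap_is_lost)  # True
```

So 2**-52 measures the spacing of floats near 1.0, and 2**-53 measures how far a correctly rounded result can be from the true value, relative to its size.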
When working with floating point numbers, the common approach goes like this:
a == b if abs(a-b) <= eps, where eps is the required precision.
In programming contests, eps is given along with the problem to solve. My advice is to decide what accuracy your application needs, and use that as eps.
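A minimal sketch of that comparison as a helper function. The name approx_equal and the default eps of 1e-9 are hypothetical choices for illustration, not a standard API; pick eps from the accuracy your own problem needs.

```python
def approx_equal(a, b, eps=1e-9):
    """Return True if a and b differ by at most eps (absolute tolerance)."""
    return abs(a - b) <= eps


total = 0.1 + 0.2

# Exact comparison fails because the sum carries a tiny rounding error.
exact_match = (total == 0.3)

# Comparison within a tolerance treats the two values as equal.
close_enough = approx_equal(total, 0.3)

print(exact_match)   # False
print(close_enough)  # True
```

An absolute tolerance like this suits values of roughly known size. If the values can span many orders of magnitude, compare the difference against eps times the larger magnitude instead.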
You get different behavior because print shows fewer digits: it uses str, which in Python 2 rounds to 12 significant digits:
In [1]: 1.23456789012334
Out[1]: 1.23456789012334
In [2]: print 1.23456789012334
1.23456789012
Note, at the precision used in Python's floats:
In [3]: 4.7 == 4.7000000000000002
Out[3]: True
This is because floats have a limited (relative) precision because they use a finite number of (binary) digits to represent real numbers. Thus, as above, different decimal representations of a given number can actually be equal for Python, after being approximated by the closest float. This is a general property of floating point numbers.
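As for storing 4.7 some other way, the standard decimal module is one option. The sketch below assumes that building Decimal values from strings is acceptable for your use case; it also shows that the two literals above become the same float. The variable names are only illustrative.

```python
from decimal import Decimal

# Both decimal strings are approximated by the same nearest double.
a = float('4.7')
b = float('4.7000000000000002')
same_float = (a == b)

# A Decimal built from a string stores the decimal digits exactly,
# so arithmetic on it behaves like arithmetic on paper.
exact = Decimal('4.7')
total = exact + Decimal('0.1')

print(same_float)  # True
print(total)       # 4.8
```

Decimal arithmetic is slower than float arithmetic and still has finite precision (28 significant digits by default), so it fits money-like data better than heavy numerical work.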
This comprehensive guide explains everything.
Here are Python-specific explanations.