convert string to float without silent NaN/Inf conversion
I'd like to convert strings to floats using Python 2.6 and later, but without silently converting things like 'NaN' and 'Inf' to float objects. I do want them to be silently ignored, as with any text that isn't valid as a float representation.
Before 2.6, float("NaN") would raise a ValueError on Windows. Now it returns a float for which math.isnan() returns True, which is not useful behaviour for my application. (As was pointed out, this has always been a platform-dependent behaviour, but consider it an undesirable behaviour for my purposes, wherever it happens.)
Here's what I've got at the moment:
import math

def get_floats(source):
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                raise ValueError
            yield val
        except ValueError:
            pass
This is a generator, which I can supply with strings containing whitespace-separated sequences representing real numbers. I'd like it to yield only those fields which are purely numeric representations of floats, as in "1.23" or "-34e6", but not for example "NaN" or "-Inf". Things that aren't floats at all, e.g. "hello", should be ignored as well.
Test case:
assert list(get_floats('1.23 foo -34e6 NaN -Inf')) == [1.23, -34000000.0]
Please suggest alternatives you consider more elegant, even if they involve "look before you leap" (which is normally considered a lesser approach in Python).
Edited to clarify that non-float text such as "hello" should just be ignored quietly as well. The purpose is to pull out only those things that are real numbers and ignore everything else.
I'd write it like this. I think it combines conciseness with readability.
import math

def is_finite(x):
    return not math.isnan(x) and not math.isinf(x)

def get_floats(source):
    for x in source.split():
        try:
            yield float(x)
        except ValueError:
            pass

def get_finite_floats(source):
    return (x for x in get_floats(source) if is_finite(x))
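On Python 3.2 and later, the standard library's math.isfinite performs the same check as the is_finite helper, so the filter can be written without it. A minimal sketch, assuming Python 3.2+ (math.isfinite does not exist on 2.6, which the question targets):

```python
import math

def get_floats(source):
    """Yield every whitespace-separated field that float() accepts."""
    for x in source.split():
        try:
            yield float(x)
        except ValueError:
            pass  # not a float at all, e.g. "foo"

def get_finite_floats(source):
    """Drop NaN and +/-Inf; math.isfinite requires Python 3.2+."""
    return (x for x in get_floats(source) if math.isfinite(x))

print(list(get_finite_floats('1.23 foo -34e6 NaN -Inf')))
# prints [1.23, -34000000.0]
```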
This is a very minor suggestion, but continue is a little faster than raising an exception:
def get_floats(source):
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val): continue
            yield val
        except ValueError:
            pass
Using raise ValueError:
% python -mtimeit -s'import test' "list(test.get_floats('1.23 -34e6 NaN -Inf Hello'))"
10000 loops, best of 3: 22.3 usec per loop
Using continue:
% python -mtimeit -s'import test' "list(test.get_floats_continue('1.23 -34e6 NaN -Inf Hello'))"
100000 loops, best of 3: 17.2 usec per loop
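The same comparison can be reproduced from a single script with the timeit module instead of the command line. This is only a sketch: the function names get_floats_raise and get_floats_continue are made up here to keep the two variants side by side, and the numbers will differ by machine and Python version.

```python
import math
import timeit

def get_floats_raise(source):
    """Variant that rejects NaN/Inf by raising ValueError."""
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                raise ValueError
            yield val
        except ValueError:
            pass

def get_floats_continue(source):
    """Variant that rejects NaN/Inf with continue."""
    for text in source.split():
        try:
            val = float(text)
            if math.isnan(val) or math.isinf(val):
                continue
            yield val
        except ValueError:
            pass

SAMPLE = '1.23 -34e6 NaN -Inf Hello'
LOOPS = 10000  # arbitrary loop count chosen for this sketch

for func in (get_floats_raise, get_floats_continue):
    # best of 3 runs, reported per call in microseconds
    best = min(timeit.repeat(lambda: list(func(SAMPLE)), number=LOOPS, repeat=3))
    print('%s: %.2f usec per loop' % (func.__name__, best / LOOPS * 1e6))
```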
I voted up Paul Hankin's answer for readability, though if I don't want to split the code up as much, here's a variation of my original that's less clunky.
def get_only_numbers(source):
    '''yield all space-separated real numbers in source string'''
    for text in source.split():
        try:
            val = float(text)
        except ValueError:
            pass  # ignore non-numbers
        else:
            # "NaN", "Inf" get converted: explicit test to ignore them
            if not math.isnan(val) and not math.isinf(val):
                yield val
Still nothing far off what I originally had.
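How about a look-before-you-leap check? This assumes tf is an open file of comma-separated fields. Note that it only accepts unsigned plain decimals such as "1.23"; signed or exponent forms such as "-34e6" are skipped, and on Python 2 isdecimal() exists only on unicode strings.
for line in tf.readlines():
    data = []
    for x in line.strip().split(','):
        # drop at most one '.', then require the rest to be digits
        if x.replace('.', '', 1).isdecimal():
            data.append(float(x))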
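If signed numbers and exponents need to pass a look-before-you-leap check as well, a regular expression is one option. The sketch below is not from the thread: the pattern and the name get_floats_lbyl are assumptions made for illustration, and the pattern is deliberately stricter than float() itself.

```python
import re

# Hypothetical pattern: optional sign, digits with an optional fraction
# (or a fraction with a leading dot), then an optional exponent.
# Words such as "NaN" and "Inf" can never match it.
_FLOAT_RE = re.compile(r'[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?\Z')

def get_floats_lbyl(source):
    """Yield whitespace-separated fields that look like plain float literals."""
    for text in source.split():
        if _FLOAT_RE.match(text):
            yield float(text)

print(list(get_floats_lbyl('1.23 foo -34e6 NaN -Inf')))
# prints [1.23, -34000000.0]
```

One caveat with this approach: a field like "1e999" matches the pattern but overflows to inf when converted, so a finiteness test is still needed if such input is possible.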