Convert decimal mark when reading numbers as input
I have a CSV file with data readings that I want to read into Python. I get lists that contain strings like "2,5". Now doing float("2,5") does not work, because it has the wrong decimal mark.
How do I read this into Python as 2.5?
You may do it the locale-aware way:
import locale
# Set to the user's preferred locale:
locale.setlocale(locale.LC_ALL, '')
# Or a specific locale:
locale.setlocale(locale.LC_NUMERIC, "en_DK.UTF-8")
print(locale.atof("3,14"))
Read this section before using this method.
float("2,5".replace(',', '.')) will do in most cases.
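As a minimal sketch of how this fits the CSV case from the question, the snippet below reads rows with the standard csv module and converts each cell. The sample data and the semicolon delimiter are assumptions made for illustration; files that use a decimal comma often use ";" as the field separator, but yours may differ.

```python
import csv
import io

# Hypothetical sample data; the ";" delimiter is an assumption.
sample = io.StringIO("time;reading\n1;2,5\n2;3,75\n")

reader = csv.reader(sample, delimiter=";")
header = next(reader)  # skip the header row

# Replace the decimal comma in every cell, then convert to float.
rows = [[float(cell.replace(",", ".")) for cell in row] for row in reader]
print(rows)
```

This only works when no thousands separators are present in the data.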
If value is a large number and . has been used as the thousands separator, you can:
Replace all commas with points: value = value.replace(",", ".")
Then remove all but the last point: value = value.replace(".", "", value.count(".") - 1)
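The two steps above can be combined into one small helper. This is only a sketch: the function name comma_decimal_to_float is made up for illustration, and it assumes the comma is always the decimal mark and the point is only ever a thousands separator.

```python
def comma_decimal_to_float(value):
    """Convert strings such as '1.234.567,89' or '2,5' to float.

    Assumes ',' is the decimal mark and '.' is a thousands separator.
    """
    # Step 1: turn the decimal comma into a point.
    value = value.replace(",", ".")
    # Step 2: drop every point except the last one.
    extra_points = max(value.count(".") - 1, 0)
    value = value.replace(".", "", extra_points)
    return float(value)


print(comma_decimal_to_float("1.234.567,89"))
print(comma_decimal_to_float("2,5"))
```

Note that a value such as "1.234" with no comma is read as 1.234 here, not 1234, because the single point is kept as the decimal mark.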
Pandas supports this out of the box:
df = pd.read_csv(r'data.csv', decimal=',')
See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Using a regex will be more reliable when the numbers are embedded in other text:
import re
decmark_reg = re.compile(r'(?<=\d),(?=\d)')
ss = 'abc , 2,5 def ,5,88 or (2,5, 8,12, 8945,3 )'
print(ss)
print(decmark_reg.sub('.', ss))
Result:
abc , 2,5 def ,5,88 or (2,5, 8,12, 8945,3 )
abc , 2.5 def ,5.88 or (2.5, 8.12, 8945.3 )
If you want to handle more complex cases (numbers with no digit before the decimal mark, for example), the regex I crafted to detect all types of numbers in the following thread may be of interest to you:
stackoverflow.com/questions/5917082/regular-expression-to-match-numbers-with-or-without-commas-and-decimals-in-text/5929469
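For the simple case, a rough sketch of going from the substituted text to actual floats might look like this. The helper name extract_floats and the second pattern number_reg are assumptions added for illustration; they are not part of the answer above.

```python
import re

# Comma with a digit on both sides: treated as a decimal mark.
decmark_reg = re.compile(r'(?<=\d),(?=\d)')
# After substitution: an integer part with an optional fractional part.
number_reg = re.compile(r'\d+(?:\.\d+)?')


def extract_floats(text):
    """Replace decimal commas with points, then pull out every number."""
    converted = decmark_reg.sub('.', text)
    return [float(tok) for tok in number_reg.findall(converted)]


print(extract_floats('abc , 2,5 def ,5,88 or (2,5, 8,12, 8945,3 )'))
```

One caveat: comma-separated integers written without spaces, such as "1,2,3", would be misread, since every comma between two digits is treated as a decimal mark.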
Try replacing all the decimal commas with decimal dots:
floatAsStr = "2,5"
floatAsStr = floatAsStr.replace(",", ".")
myFloat = float(floatAsStr)
The replace function, of course, works on any substring, as Python does not differentiate between char and string.
First you must determine which locale was used to produce the number. If you fail to do this, random problems will surely occur.
import locale
saved = locale.setlocale(locale.LC_NUMERIC)  # query and save the current numeric locale
# use the locale that provided the number;
# example if the German locale was used:
locale.setlocale(locale.LC_NUMERIC, 'de_DE')
pythonnumber = locale.atof(value)
locale.setlocale(locale.LC_NUMERIC, saved)  # restore the saved locale
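One way to make the save-and-restore pattern harder to get wrong is to wrap it in a context manager, so the previous locale is restored even if parsing raises. This is a sketch only: numeric_locale and parse_with_locale are hypothetical names, and the locale you pass (for example "de_DE.UTF-8") must actually be installed on the system, otherwise setlocale raises locale.Error. The demo call uses the "C" locale because it is always available.

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_locale(name):
    """Temporarily switch LC_NUMERIC to `name`, then restore it."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def parse_with_locale(text, name):
    """Parse `text` as a float using the conventions of locale `name`."""
    with numeric_locale(name):
        return locale.atof(text)


# The "C" locale uses "." as the decimal mark.
print(parse_with_locale("2.5", "C"))
```

Keep in mind that setlocale changes process-wide state, so this approach is not safe to use from several threads at once.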
If dots are used as thousands separators, you could swap commas and dots by using a third symbol as a temporary placeholder, like so:
value.replace('.', '#').replace(',', '.').replace('#', ',')
But seeing as you want to convert from string to float, you could just remove any dots and then replace any commas with dots:
float(value.replace('.', '').replace(',', '.'))
IMO this is the most readable solution.
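If you really do need the swapped string rather than a float, the placeholder trick can also be written as a single pass with str.translate, which avoids having to pick a symbol that never appears in the data. A small sketch follows; the names SWAP and swap_separators are made up for illustration.

```python
# Translation table mapping "." to "," and "," to "." in one pass.
SWAP = str.maketrans(".,", ",.")


def swap_separators(value):
    """Turn '1.234,56' into '1,234.56' (and vice versa)."""
    return value.translate(SWAP)


swapped = swap_separators("1.234,56")
print(swapped)
# After swapping, the commas are thousands separators and can be dropped.
print(float(swapped.replace(",", "")))
```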
I have an application (not under my control) in which the incoming monetary value can be in either of the two formats, at least until we convince the customer to change this. There is ambiguity when a single separator is provided: 1,039 can mean 1.039 or 1039. In practice, since the value represents money, more than 2 digits after the separator are assumed not to be decimals.
The code is below:
def tolerant_monetary_float(x, max_decimals=2):
    num_dot = x.count('.')
    num_com = x.count(',')
    if not num_dot:
        # no dot
        if not num_com:
            # no dot, no comma
            return float(x)
        if num_com > 1:
            # more than one comma: commas are thousand separators
            return float(x.replace(',', ''))
        # 1 comma: it's ambiguous: 1,000 can mean 1000 or 1.0
        if len(x) - x.find(',') - 1 <= max_decimals:
            # assume the comma is decimal separator
            return float(x.replace(',', '.'))
        # assume comma is thousand separator
        return float(x.replace(',', ''))
    if not num_com:
        # no comma, at least one dot (the no-dot case returned above)
        if num_dot > 1:
            # more than one dot: dots are thousand separators
            return float(x.replace('.', ''))
        # 1 dot: it's ambiguous: 1.000 can mean 1000 or 1.0
        if len(x) - x.find('.') - 1 <= max_decimals:
            # assume the dot is decimal separator
            return float(x)
        # assume dot is thousand separator
        return float(x.replace('.', ''))
    # mix of dots and commas
    if num_dot > 1 and num_com > 1:
        # raise (not return) so the caller actually sees the error
        raise ValueError(f'decimal number cannot have a mix of "," and ".": {x}')
    ix_dot = x.find('.')
    ix_com = x.find(',')
    if ix_dot < ix_com:
        # dot is before comma: 1.000,35
        return float(x.replace('.', '').replace(',', '.'))
    # comma is before dot: 1,000.35
    return float(x.replace(',', ''))
if __name__ == "__main__":
    assert tolerant_monetary_float('1') == 1.0
    assert tolerant_monetary_float('1.2345') == 12345.0
    assert tolerant_monetary_float('1.234') == 1234.0
    assert tolerant_monetary_float('1.23') == 1.23
    assert tolerant_monetary_float('1.2') == 1.2
    assert tolerant_monetary_float('1,2345') == 12345.0
    assert tolerant_monetary_float('1,234') == 1234.0
    assert tolerant_monetary_float('1,23') == 1.23
    assert tolerant_monetary_float('1,2') == 1.2
    assert tolerant_monetary_float('1234,5') == 1234.5
    assert tolerant_monetary_float('1.234,5') == 1234.5
    assert tolerant_monetary_float('1,234.5') == 1234.5
    assert tolerant_monetary_float('1,234,567.85') == 1234567.85
    assert tolerant_monetary_float('1.234.567,85') == 1234567.85