In Python, removing thousands commas from numbers in a list where the numbers are separated by commas
I have a list of data similar to that below:
a = ['"105', '424"', '"102', '629"', '"104', '307"']
I want this data to be in a form similar to that below:
a = ['105424', '102629', '104307']
I am unsure how to proceed. I thought of removing all the commas, then inserting commas only where they belong, and then removing the quotation marks, but I am finding this quite challenging.
I'm assuming this data was originally in a CSV file, where fields that contain commas are quoted ("105,424","102,629","104,307"), and that you are then splitting on the comma:
>>> '"105,424","102,629","104,307"'.split(',')
['"105', '424"', '"102', '629"', '"104', '307"']
Instead, you should let the csv module do the work, as it will handle the double quotes:
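import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('u:\\foobar.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # the reader has already removed the quotes; drop the thousands commas
        print([x.replace(',', '') for x in row])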
This prints: ['105424', '102629', '104307']
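If you want to try this without a file on disk, here is a minimal sketch that feeds the same text to csv.reader through io.StringIO and converts each field to an int. The names `sample` and `parse_numbers` are made up for illustration, and the sample string simply stands in for the file's contents.

```python
import csv
import io

# Hypothetical in-memory stand-in for the contents of the CSV file.
sample = '"105,424","102,629","104,307"\n'

def parse_numbers(text):
    """Parse quoted, comma-grouped numbers from CSV text into lists of ints."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        # Each field arrives unquoted, e.g. '105,424'; drop the commas and convert.
        rows.append([int(field.replace(',', '')) for field in row])
    return rows

print(parse_numbers(sample))  # [[105424, 102629, 104307]]
```

Converting to int is optional; leave the fields as strings if that is what the rest of your code expects.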
Does your data look something like:
"123", "123,456", "123,456,789"
If so, try this:
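import re

# named text rather than input so the builtin input() is not shadowed
text = '"123", "123,456", "123,456,789"'
reg = re.compile(r'"(\d{1,3}(,\d{3})*)"')
# findall returns one (whole match, last comma group) tuple per number
stringValues = [wholematch.replace(',', '') for wholematch, _endmatch
                in reg.findall(text)]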
A variant of this regex should also work on thousands with decimal places. Note that it has three groups, so the tuple unpacking above would need a third name:
re.compile(r'"(\d{1,3}(,\d{3})*(\.\d*)?)"')
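As a rough sketch of the decimal variant, the inner groups can be made non-capturing so that findall returns plain strings and no tuple unpacking is needed. This is a small variation on the pattern above, not the original answer's exact regex: it requires at least one digit after the decimal point, and the names `NUMBER` and `extract_numbers` are made up for illustration.

```python
import re

# (?:...) groups do not capture, so findall yields only the one outer group.
NUMBER = re.compile(r'"(\d{1,3}(?:,\d{3})*(?:\.\d+)?)"')

def extract_numbers(text):
    """Return every quoted number in text with its thousands commas removed."""
    return [match.replace(',', '') for match in NUMBER.findall(text)]

print(extract_numbers('"123", "123,456", "1,234.56"'))
# ['123', '123456', '1234.56']
```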
If the source data is CSV, you should use @steven's answer.
Regardless, here's how you could process what you pasted.
As @troutwine stated, this will only work if the number parts are always in pairs.
a = ['"105', '424"', '"102', '629"', '"104', '307"']
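def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    it = iter(iterable)
    # zipping an iterator with itself yields consecutive, non-overlapping pairs
    # (in Python 3 the builtin zip is lazy, so itertools.izip is not needed)
    return zip(it, it)

result = []
for x, y in pairwise(a):
    result.append(''.join([x, y]).strip('"'))
print(result)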
Gives:
['105424', '102629', '104307']
Pairwise snippet from here: Iterating over every two elements in a list
If you'll never have an unmatched pair, loop over a range half the size of the input list: join the element at the current index with the next one, do a string substitution to remove the quotes, and skip ahead to the current index plus two.
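A minimal sketch of that loop, assuming the list always has an even length and each number was split into exactly two parts. It steps through the indices two at a time instead of tracking a separate counter, and the function name `join_pairs` is made up for illustration.

```python
def join_pairs(parts):
    """Join consecutive pairs of strings and remove the double quotes."""
    result = []
    # Visit indices 0, 2, 4, ...; assumes len(parts) is even.
    for i in range(0, len(parts) - 1, 2):
        joined = parts[i] + parts[i + 1]
        result.append(joined.replace('"', ''))
    return result

a = ['"105', '424"', '"102', '629"', '"104', '307"']
print(join_pairs(a))  # ['105424', '102629', '104307']
```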
Reduce to the rescue:
from functools import reduce  # reduce is not a builtin in Python 3

l = ['"105', '424"', '"102', '629"', '"104', '307"', '"123', '456', '789"', '"123"']
# Concatenate everything, split on the double quote, and keep the non-empty pieces
l2 = [num for num in reduce(lambda x, y: x + y, l).split('"') if num != '']
print(l2)
# Output:
# ['105424', '102629', '104307', '123456789', '123']
A few caveats, though: this code can handle numbers beyond the thousands (e.g., 1,457,664), but it also assumes that every whole number was double-quoted.
As others have said though, you should revisit your data retrieval as there are most likely ways to get the values correctly without dealing with the double-quotes. This was a fun little challenge nonetheless.