How to sum all the numbers of one column in Python?
I need to know how to sum all the numbers of a column in a CSV file.
For example, my data looks like this:
column count min max sum mean
80 29573061 2 40 855179253 28.92
81 28861459 2 40 802912711 27.82
82 28165830 2 40 778234605 27.63
83 27479902 2 40 754170015 27.44
84 26800815 2 40 729443846 27.22
85 26127825 2 40 701704155 26.86
86 25473985 2 40 641663075 25.19
87 24827383 2 40 621981569 25.05
88 24189811 2 40 602566423 24.91
89 23566656 2 40 579432094 24.59
90 22975910 2 40 553092863 24.07
91 22412345 2 40 492993262 22
92 21864206 2 40 475135290 21.73
93 21377772 2 40 461532152 21.59
94 20968958 2 40 443921856 21.17
95 20593463 2 40 424887468 20.63
96 20329969 2 40 364319592 17.92
97 20157643 2 40 354989240 17.61
98 20104046 2 40 349594631 17.39
99 20103866 2 40 342152213 17.02
100 20103866 2 40 335379448 16.6
(In the actual file the columns are separated by tabs.)
The code I've written so far is:
import sys
import csv

def ErrorCalculator(file):
    reader = csv.reader(open(file), dialect='excel-tab')
    for row in reader:
        PxCount = 10**(-float(row[5])/10)*float(row[1])

if __name__ == '__main__':
    ErrorCalculator(sys.argv[1])
For this particular code, I need to sum all the PxCount values and divide by the sum of all the numbers in row[1].
I'd be grateful if you could tell me how to sum the numbers of a column, or help me with this code.
A tip on how to skip the header would also be welcome.
You can call next(reader) (reader.next() in Python 2) right after instantiating the reader to discard the first line.
To sum PxCount, just set total = 0 before your loop and do total += PxCount after you calculate it for each row. (Avoid calling the variable sum, since that shadows the built-in function.)
PS You might find the csv.DictReader helpful too.
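Put together, the two suggestions above might look like the sketch below. The function name error_ratio and the in-memory sample are only illustrative assumptions; the column positions (row[1] for count, row[5] for mean) and the PxCount formula are taken from the question.

```python
import csv
import io


def error_ratio(lines):
    """Return sum(PxCount) / sum(count) for a tab-separated table with a header row."""
    reader = csv.reader(lines, dialect='excel-tab')
    next(reader)  # discard the header row
    px_total = 0.0
    count_total = 0.0
    for row in reader:
        count = float(row[1])
        mean = float(row[5])
        px_total += 10 ** (-mean / 10) * count  # PxCount formula from the question
        count_total += count
    return px_total / count_total


# Two rows copied from the question, joined with tabs (stands in for the real file)
sample = io.StringIO(
    "column\tcount\tmin\tmax\tsum\tmean\n"
    "80\t29573061\t2\t40\t855179253\t28.92\n"
    "81\t28861459\t2\t40\t802912711\t27.82\n"
)
print(error_ratio(sample))
```

With a real file you would pass the object returned by open(path, newline='') instead of the StringIO.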
You could keep a running total using an "augmented assignment" +=:
total = 0
for row in reader:
    PxCount = 10**(-float(row[5])/10)*float(row[1])
    total += PxCount
To skip the first line (header) in the csv file:
with open(file) as f:
    next(f)  # read and throw away the first line in f
    reader = csv.reader(f, dialect='excel-tab')
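For the simpler case in the title, summing one column as-is, the built-in sum() with a generator expression is an alternative to a hand-written running total. This is a minimal sketch; column_sum and the sample data are hypothetical names used only for illustration.

```python
import csv
import io


def column_sum(lines, index):
    """Sum the values in column `index` of a tab-separated table, skipping the header."""
    reader = csv.reader(lines, dialect='excel-tab')
    next(reader)  # skip the header row
    return sum(float(row[index]) for row in reader)


# Last two rows of the question's data, joined with tabs
sample = io.StringIO(
    "column\tcount\tmin\tmax\tsum\tmean\n"
    "99\t20103866\t2\t40\t342152213\t17.02\n"
    "100\t20103866\t2\t40\t335379448\t16.6\n"
)
print(column_sum(sample, 1))  # total of the "count" column
```

The generator reads one row at a time, so the whole file is never held in memory.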
Using a DictReader will result in far clearer code, and Decimal will give you better precision. Also, try to follow Python naming conventions and use lowercase names for functions and variables.
import csv
import decimal

def calculate(file):
    with open(file) as f:
        reader = csv.DictReader(f, dialect='excel-tab')
        total_px = 0
        total_count = 0
        for row in reader:
            r_count = decimal.Decimal(row['count'])
            r_mean = decimal.Decimal(row['mean'])
            # PxCount formula taken from the question; adjust it if it is not what you want
            total_px += 10 ** (-r_mean / 10) * r_count
            total_count += r_count
    # sum of PxCount divided by the sum of the count column, as the question asks
    return total_px / total_count
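To see what the precision remark means in practice, here is a small comparison using made-up values rather than the question's data. Binary floats cannot represent 0.1 exactly, so repeated addition drifts, while Decimal values built from strings keep the decimal digits as written.

```python
from decimal import Decimal

# Ten additions of 0.1 as binary floats accumulate rounding error
float_total = sum([0.1] * 10)

# The same additions with Decimal, constructed from a string, stay exact
decimal_total = sum([Decimal('0.1')] * 10)

print(float_total == 1.0)             # False
print(decimal_total == Decimal('1'))  # True
```

Note that Decimal is still rounded to the context precision (28 significant digits by default), so an operation such as 10 ** x is precise but not exact.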