Help with an if else loop in Python
Hi, here is my problem. I have a program that calculates the averages of data in columns. For example, given:
Bob
1
2
3
the output is
Bob
2
Some of the data has 'NA's. So for Joe:
Joe
NA
NA
NA
I want this output to be NA, so I wrote an if/else statement.
The problem is that it doesn't execute the second branch of the if/else and just prints out one NA. Any suggestions?
Here is my program:
with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] ==0 :
            print 'NA'
        else:
            print>>ouf, columns[i], sums[i] / numRowsPerColumn[i] # this is the average calculator
The file looks like so:
Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1 NA
and the final output should be the names and the averages:
Joe Bob Sam
1.5 1.5 NA
Ok I tried Roger's suggestion and now I have this error:
Traceback (most recent call last):
  File "C:/avy14.py", line 5, in <module>
    for line in f:
ValueError: I/O operation on closed file
Here is this new code:
with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    sums = [0] * len(columns)
    rows = 0
for line in f:
    line = line.strip()
    if not line:
        continue
    rows += 1
    for col, v in enumerate(line.split()):
        if sums[col] is not None:
            if v == "NA":
                sums[col] = None
            else:
                sums[col] += int(v)
with open("c:/chipdone.txt", "w") as out:
    for name, sum in zip(columns, sums):
        print >>out, name,
        if sum is None:
            print >>out, "NA"
        else:
            print >>out, sum / rows
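The traceback in the question is the error Python raises when a file object is used after its with block has ended, which is what happens if the for loop sits outside the block. A minimal Python 3 sketch of that behavior follows; the temporary file and its contents are assumptions for illustration, not the asker's data.

```python
import os
import tempfile

# Write a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Joe Bob Sam\n1 2 NA\n")

with open(path) as f:
    header = f.readline().split()

# The with block has ended, so f is closed by the time we get here.
try:
    for line in f:
        pass
    error = None
except ValueError as exc:
    error = str(exc)

os.remove(path)
print(error)
```

Moving the loop back inside the with block (indenting it one level) keeps the file open while it is being read.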
with open("c:/achip.txt", "rU") as f:
    columns = f.readline().strip().split()
    sums = [0.0] * len(columns)
    row_counts = [0] * len(columns)
    for line in f:
        line = line.strip()
        if not line:
            continue
        for col, v in enumerate(line.split()):
            if v != "NA":
                sums[col] += int(v)
                row_counts[col] += 1
with open("c:/chipdone.txt", "w") as out:
    for name, sum, rows in zip(columns, sums, row_counts):
        print >>out, name,
        if rows == 0:
            print >>out, "NA"
        else:
            print >>out, sum / rows
I'd also use the no-parameter version of split when getting the column names (it allows you to have multiple space separators).
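A small sketch of the difference between the two forms of split, using a made-up row with a doubled space (the row is an assumption for illustration):

```python
# Hypothetical data row with two spaces before the last value.
row = "1 1  NA\n"

# An explicit single-space separator yields an empty string for the
# extra space and leaves the trailing newline attached to the last item.
with_sep = row.split(" ")
print(with_sep)

# With no argument, any run of whitespace counts as one separator and
# leading/trailing whitespace (including the newline) is dropped.
no_arg = row.split()
print(no_arg)
```

With the explicit separator, the empty string shifts later values into the wrong column index, which is why the no-argument form is safer here.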
Regarding your edit to include input/output sample, I kept your original format and my output would be:
Joe 1.75
Bob 2.33333333333
Sam NA
This format is 3 rows of (ColumnName, Avg) columns, but you can change the output if you want, of course. :)
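If the layout from the question is preferred (all names on one row, all averages on the next), one possible way to build it is sketched below in Python 3. The helper name format_rows is hypothetical, and the sums and counts are the totals for the sample file in the question.

```python
def format_rows(columns, sums, row_counts):
    """Return two lines: the column names, then the averages (or NA)."""
    averages = []
    for total, count in zip(sums, row_counts):
        if count == 0:
            # No numeric values were seen in this column.
            averages.append("NA")
        else:
            averages.append(str(total / count))
    return " ".join(columns) + "\n" + " ".join(averages) + "\n"

# Per-column totals and counts of numeric values for the sample file.
text = format_rows(["Joe", "Bob", "Sam"], [7.0, 7.0, 0.0], [4, 3, 0])
print(text, end="")
```

The returned string can be written to the output file with a single write call.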
Using numpy:
import numpy as np
with open('achip.txt') as f:
    names=f.readline().split()
    arr=np.genfromtxt(f)
print(arr)
# [[ 1. 2. NaN]
# [ 2. 4. NaN]
# [ 3. NaN NaN]
# [ 1. 1. NaN]]
print(names)
# ['Joe', 'Bob', 'Sam']
print(np.ma.mean(np.ma.masked_invalid(arr),axis=0))
# [1.75 2.33333333333 --]
Using your original code, I would add one loop and edit the print statement:
with open(r'C:\achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        ### This removes any '' elements caused by having two spaces like
        ### in the last line of your example chip file above.
        ### A new list is built because popping while iterating skips elements.
        values = [v for v in values if v != '']
        ### (End of Addition)
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] ==0 :
            print>>ouf, columns[i], 'NA' #Just add the extra parts
        else:
            print>>ouf, columns[i], sums[i] / numRowsPerColumn[i]
This solution also gives the same result in Roger's format, not your intended format.
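The code in this thread is Python 2 (xrange, print >>). For readers on Python 3, a rough sketch of the same approach might look like the following. The function name column_averages is hypothetical, and an in-memory io.StringIO stands in for the input file so the example runs on its own.

```python
import io

def column_averages(lines):
    """Return (names, averages); an average is None if a column has no numbers."""
    lines = iter(lines)
    names = next(lines).split()
    sums = [0.0] * len(names)
    counts = [0] * len(names)
    for line in lines:
        # split() with no argument copes with repeated spaces and blank lines.
        for i, value in enumerate(line.split()):
            try:
                sums[i] += float(value)
                counts[i] += 1
            except ValueError:
                # Non-numeric cells such as "NA" are skipped.
                continue
    averages = [s / c if c else None for s, c in zip(sums, counts)]
    return names, averages

# Sample data from the question, with a doubled space in the last row.
sample = io.StringIO("Joe Bob Sam\n1 2 NA\n2 4 NA\n3 NA NA\n1 1  NA\n")
names, averages = column_averages(sample)
for name, avg in zip(names, averages):
    print(name, "NA" if avg is None else avg)
```

This sketch assumes no data row has more values than there are column names; a longer row would raise an IndexError.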
The solution below is cleaner and has fewer lines of code:
import pandas as pd
# read the file into a DataFrame using read_csv
# (raw string so that \s is not treated as a string escape)
df = pd.read_csv('C://achip.txt', sep=r"\s+")
# compute the average of each column
avg = df.mean()
# save computed average to output file
avg.to_csv("c:/chipdone.txt")
The key to the simplicity of this solution is the way the input text file is read into a DataFrame. Pandas read_csv allows you to use regular expressions for specifying the sep/delimiter argument. In this case, we used the "\s+" regex pattern to take care of having one or more spaces between columns.
Once the data is in a DataFrame, computing the average and saving to a file can all be done with straightforward pandas functions.
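To see what the \s+ pattern matches on its own, independent of pandas, here is a minimal sketch using the standard re module. The sample rows are made up for illustration, and this only demonstrates the pattern, not how read_csv works internally.

```python
import re

# Hypothetical rows: single spaces, a doubled space, and a tab.
rows = ["Joe Bob Sam", "1 1  NA", "3\tNA   NA"]

# \s+ matches one or more whitespace characters, so any run of
# spaces or tabs is treated as a single column separator.
parsed = [re.split(r"\s+", row.strip()) for row in rows]
for fields in parsed:
    print(fields)
```

Writing the pattern as a raw string (r"\s+") keeps Python from interpreting the backslash as a string escape.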