Replacing empty csv column values with a zero
So I'm dealing with a csv file that has missing values. What I want my script to do is:
#!/usr/bin/python
import csv
import sys
#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for x in row[:]:
        if len(x)< 1:
            x = 0
        print x
print row
Here is an example of the data I'm trying it on. Ideally it should work on any number of columns.
Before:
actnum,col2,col4
xxxxx , ,
xxxxx , 845 ,
xxxxx , ,545
After
actnum,col2,col4
xxxxx , 0 , 0
xxxxx , 845, 0
xxxxx , 0 ,545
Any guidance would be appreciated
Update: Here is what I have now (thanks):
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x)< 1:
            x = row[i] = 0
print row
However, it only seems to output one record. I will be piping the output to a new file on the command line.
Update 3: OK, now I have the opposite problem: I'm outputting duplicates of each record. Why is that happening?
After
actnum,col2,col4
actnum,col2,col4
xxxxx , 0 , 0
xxxxx , 0 , 0
xxxxx , 845, 0
xxxxx , 845, 0
xxxxx , 0 ,545
xxxxx , 0 ,545
OK, I fixed it (below). Thank you guys for your help.
#!/usr/bin/python
import csv
import sys
#1. Place each record of a file in a list.
#2. Iterate thru each element of the list and get its length.
#3. If the length is less than one replace with value x.
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x)< 1:
            x = row[i] = 0
    print ','.join(str(x) for x in row)
Change your code:
for row in reader:
    for x in row[:]:
        if len(x)< 1:
            x = 0
        print x
into:
for row in reader:
    for i, x in enumerate(row):
        if len(x)< 1:
            x = row[i] = 0
        print x
Not sure what you think you're accomplishing by the print, but the key issue is that you need to modify row, and for that purpose you need an index into it, which enumerate gives you.
Note also that all other values, except the empty ones which you're changing into the number 0, will remain strings. If you want to turn them into ints, you have to do that explicitly.
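A minimal sketch of both points, written for Python 3 (the code in the question is Python 2). The helper name fill_empty and the sample row are made up for illustration:

```python
def fill_empty(row, fill=0):
    """Replace zero-length fields of a row in place and return the row."""
    for i, x in enumerate(row):
        if len(x) < 1:
            # Assigning to x alone would only rebind the loop variable;
            # assigning through the index changes the list itself.
            row[i] = fill
    return row


row = ["xxxxx", "", "545"]
fill_empty(row)
print(row)  # ['xxxxx', 0, '545']: the 0 is an int, the rest are still strings

# Turning the numeric-looking fields into ints has to be done explicitly.
as_ints = [int(x) if str(x).isdigit() else x for x in row]
print(as_ints)  # ['xxxxx', 0, 545]
```

If the mixed types are a nuisance, passing fill="0" keeps every field a string.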
You are very nearly there!
There are just a couple of small bugs.
len(x)< 1 will not work for the second column in the second row of your data, because x will contain ' ' (so its length is at least 1, not less than 1). You'll need to strip your strings.
print row comes after the loop has finished iterating, so it will not show each corrected record; at best it prints the last row. You can probably just remove this line.
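One way the strip-based check could look, as a rough Python 3 sketch rather than a drop-in fix for the Python 2 script. The function name zero_fill is hypothetical, and io.StringIO stands in for the real input and output files:

```python
import csv
import io


def zero_fill(in_file, out_file, fill="0"):
    """Copy CSV rows from in_file to out_file, replacing blank fields."""
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(in_file):
        # strip() removes surrounding spaces, so ' ' counts as empty too.
        writer.writerow([x.strip() or fill for x in row])


# Stand-in for the real input file, using the sample data from the question.
sample = io.StringIO("actnum,col2,col4\nxxxxx , ,\nxxxxx , 845 ,\nxxxxx , ,545\n")
result = io.StringIO()
zero_fill(sample, result)
print(result.getvalue())
```

Note that this also trims the spaces around non-empty fields. To pipe the output on the command line, the same function should work with open(sys.argv[1], newline="") as the input and sys.stdout as the output.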
Also: Are you trying to modify the file or just output the corrections to pipe to some other file or process?