How to delete columns in a CSV file?
I have been able to create a CSV with Python using the input from several users on this site and I wish to express my gratitude for your posts. I am now stumped and will post my first question.
My input.csv looks like this:
day,month,year,lat,long
01,04,2001,45.00,120.00
02,04,2003,44.00,118.00
I am trying to delete the "year" column and all of its entries. In total there are 40+ entries, with years ranging from 1960 to 2010.
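import csv

# Python 3: open CSV files in text mode with newline="" (Python 2 used "rb"/"wb").
with open("source", newline="") as source:
    rdr = csv.reader(source)
    with open("result", "w", newline="") as result:
        wtr = csv.writer(result)
        for r in rdr:
            # Keep every column except index 2 ("year").
            wtr.writerow((r[0], r[1], r[3], r[4]))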
BTW, the for loop can be removed, but not really simplified.
in_iter= ( (r[0], r[1], r[3], r[4]) for r in rdr )
wtr.writerows( in_iter )
Also, you can follow the requirement hyper-literally and delete the column in place. I find this to be a bad policy in general because it doesn't extend to removing more than one column: when you try to remove the second, you discover that the positions have all shifted and the resulting row isn't obvious. But for one column only, this works.
del r[2]
wtr.writerow( r )
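One way around the shifting-positions problem is to resolve the positions once from the header row and then copy only the columns you keep. This is a minimal sketch rather than part of the answer above: the function name drop_columns and the in-memory sample are assumptions for illustration, and it assumes the first row holds the column names.

```python
import csv
import io


def drop_columns(source, result, names_to_drop):
    """Copy CSV rows from source to result, omitting the named columns."""
    rdr = csv.reader(source)
    wtr = csv.writer(result, lineterminator="\n")
    header = next(rdr)  # assumption: the first row is the header
    drop = set(names_to_drop)
    # Positions are resolved once from the header, so nothing shifts later.
    keep = [i for i, name in enumerate(header) if name not in drop]
    wtr.writerow([header[i] for i in keep])
    for r in rdr:
        wtr.writerow([r[i] for i in keep])


# Hypothetical in-memory input mirroring the question's sample rows.
sample = io.StringIO(
    "day,month,year,lat,long\n"
    "01,04,2001,45.00,120.00\n"
    "02,04,2003,44.00,118.00\n"
)
out = io.StringIO()
drop_columns(sample, out, ["year", "lat"])
print(out.getvalue())
```

With real files you would pass objects opened with newline="" instead of the StringIO buffers.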
Using the pandas module will be much easier.
import pandas as pd
f=pd.read_csv("test.csv")
keep_col = ['day','month','lat','long']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)
And here is a short explanation:
>>> f = pd.read_csv("test.csv")
>>> f
   day  month  year  lat  long
0    1      4  2001   45   120
1    2      4  2003   44   118
>>> keep_col = ['day','month','lat','long']
>>> f[keep_col]
   day  month  lat  long
0    1      4   45   120
1    2      4   44   118
>>>
Using a dict to grab headings then looping through gets you what you need cleanly.
import csv

ct = 0
# Maps each wanted heading to its column index (-1 until found in the header).
cols_i_want = {'cost': -1, 'date': -1}
with open("file1.csv", newline="") as source:
    rdr = csv.reader(source)
    with open("result", "w", newline="") as result:
        wtr = csv.writer(result)
        for row in rdr:
            if ct == 0:
                # First row is the header: record where each wanted column sits.
                cc = 0
                for col in row:
                    for ciw in cols_i_want:
                        if col == ciw:
                            cols_i_want[ciw] = cc
                    cc += 1
            wtr.writerow((row[cols_i_want['cost']], row[cols_i_want['date']]))
            ct += 1
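The same select-by-heading idea can probably be written more compactly with csv.DictReader and csv.DictWriter from the standard library, which do the header bookkeeping for you. This is only a sketch: the function name select_columns and the sample rows are made up for illustration, and the 'cost' and 'date' headings are taken from the answer above.

```python
import csv
import io


def select_columns(source, result, wanted):
    """Write only the wanted columns, in the order given by wanted."""
    rdr = csv.DictReader(source)  # uses the first row as field names
    wtr = csv.DictWriter(
        result,
        fieldnames=wanted,
        extrasaction="ignore",  # silently skip columns not listed in wanted
        lineterminator="\n",
    )
    wtr.writeheader()
    for row in rdr:
        wtr.writerow(row)


# Hypothetical sample data; only the 'cost' and 'date' headings come from the answer.
sample = io.StringIO(
    "date,item,cost\n"
    "2001-04-01,apple,1.50\n"
    "2003-04-02,pear,2.25\n"
)
out = io.StringIO()
select_columns(sample, out, ["cost", "date"])
print(out.getvalue())
```

A wanted heading that is missing from the input is written as an empty field, because DictWriter falls back to its restval default.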
If the CSV is loaded into a pandas DataFrame, you can directly delete the column with just
del variable_name['year']
I would use pandas with column numbers:
import pandas as pd
f = pd.read_csv("test.csv", usecols=[0,1,3,4])
f.to_csv("test.csv", index=False)
You can use the csv package to iterate over your CSV file and output the columns that you want to another CSV file.
The example below is not tested and should illustrate a solution:
import csv

# Raw strings keep the backslashes literal ('\n' in a normal string is a newline).
file_name = r'C:\Temp\my_file.csv'
output_file = r'C:\Temp\new_file.csv'
## note that the index of the year column is excluded
column_indices = [0,1,3,4]
with open(file_name, 'r', newline='') as csv_file, open(output_file, 'w', newline='') as fh:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        tmp_row = []
        for col_inx in column_indices:
            tmp_row.append(row[col_inx])
        # End each output row with a newline so rows do not run together.
        fh.write(','.join(tmp_row) + '\n')
Off the top of my head, this will do it without any sort of error checking nor ability to configure anything. That is "left to the reader".
outFile = open( 'newFile', 'w' )
for line in open( 'oldFile' ):
    items = line.split( ',' )
    outFile.write( ','.join( items[:2] + items[ 3: ] ) )
outFile.close()
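One piece of that missing error checking is worth spelling out: a plain split(',') miscounts columns as soon as a field contains a quoted comma, whereas csv.reader keeps the field in one piece. The short comparison below is only an illustration; the sample line with a place name is invented and does not come from the question's data.

```python
import csv
import io

# Hypothetical row with a quoted field that contains a comma.
line = '01,04,2001,"Portland, OR",45.00\n'

# Naive split: the comma inside the quotes is treated as a separator.
naive = line.rstrip("\n").split(",")

# csv.reader: the quoted field stays in one piece.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))               # 6
print(len(parsed))              # 5
print(parsed[:2] + parsed[3:])  # ['01', '04', 'Portland, OR', '45.00']
```

For the question's input, which has no quoted fields, both approaches give the same result.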
I will add yet another answer to this question. Since the OP did not say they needed to do it with Python, the fastest way to delete the column (especially when the input file has hundreds of thousands of lines) is by using awk.
This is the type of problem where awk shines:
$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$4,$5}' input.csv
(feel free to append > output.csv to the command above if you need the output to be saved to a file)
Credit goes 100% to @eric-wilson who provided this awesome answer, as a comment on the original question, 10 years ago, almost without any credit.
Try:
result = data.drop('year', axis=1)
result.head(5)
It depends on how you store the parsed CSV, but generally you want the del operator.
If you have an array of dicts:
input = [ {'day':1, 'month':4, 'year':2001, ...}, ... ]
for E in input: del E['year']
If you have an array of arrays:
input = [ [1, 4, 2001, ...],
          [...],
          ...
        ]
for E in input: del E[2]
Try Python with pandas and exclude the column you don't want:
import pandas as pd
# the ',' is the default separator, but if your file has another one, you have to define it with sep= parameter
df = pd.read_csv("input.csv", sep=',')
exclude_column = "year"
new_df = df.loc[:, df.columns != exclude_column]
# you can even save the result to the same file
new_df.to_csv("input.csv", index=False, sep=',')
My take, using pandas's drop in Python:
import pandas as pd
df = pd.read_csv("old.csv")
new_df = df.drop("year", axis=1)
new_df.to_csv("new.csv", index=False)