Using the csv module to extract specific lines of text from a larger file
So I'm extracting the lines that I want from this larger file using this program:
import csv
name = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
data = csv.reader(open('C:\\bigfile.csv'))
with open('C:\\smalldataset.xcl','w') as outf:
    csv.writer(outf).writerows(l for l in data if l[0] in name)
The program runs. However, I am only getting the line of data from NAMETHEFIRST, and no data from NAMETHEOTHERNAME is written to my small dataset file. The output for NAMETHEFIRST is exactly what I want: all the relevant fields from its line in the large data set. Nothing for the second name is written to the smaller file. Why isn't this working?
This is a list with one string:
['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
This is a list with two strings:
['NAMETHEFIRST', 'NAMEANOTHERNAME ']
Note the placement of the comma.
Also note that your second string has a space at the end.
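A quick way to see the difference is to check the length of each list. This is only a minimal sketch; the variable names one_string and two_strings are made up for illustration:

```python
# Adjacent string literals with no comma between them are joined into one string.
one_string = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
# A comma outside the quotes separates two list elements.
two_strings = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']

print(len(one_string), one_string)
print(len(two_strings), two_strings)

print('NAMETHEFIRST' in one_string)   # False: the only element is the joined string
print('NAMETHEFIRST' in two_strings)  # True
```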
This line of code
name = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
is equivalent to
name = ['NAMETHEFIRST,NAMEANOTHERNAME ']
because Python follows C in concatenating adjacent string constants at compile time.
You say """I am only getting the line of data from NAMETHEFIRST and I get no data from NAMETHEOTHERNAME written to my small dataset file""" -- however the code that you show will NOT produce that result; it will select only rows whose first field is exactly
"NAMETHEFIRST,NAMEANOTHERNAME "
You will get the stated result only if that line is actually:
name = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']
and that is presumably because the second name in the file doesn't have a trailing space as above.
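The effect of the trailing space can be checked directly. The sketch below assumes the name appears in the file without a trailing space, which is a guess about the data; first_field and wanted are illustrative names:

```python
names = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']  # second entry ends with a space

# Assumed value of the first field as read from the file (no trailing space).
first_field = 'NAMEANOTHERNAME'
print(first_field in names)  # False: 'NAMEANOTHERNAME' != 'NAMEANOTHERNAME '

# Stripping whitespace on both sides makes the comparison tolerant of stray spaces.
wanted = {n.strip() for n in names}
print(first_field.strip() in wanted)  # True
```

Whether stripping is appropriate depends on the data; if spaces are significant in the file, fix the list instead.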
Other problems:
csv.writer(outf).writerows(l for l in data if l[0] in name)
is trying to be a bit too clever. If you break it down into bite-size chunks, you can much more easily use a debugger or just print statements to show you what is actually happening.
Try this:
print len(name), name # Python 2 print statement; shows how many strings the list really holds
data = csv.reader(open('C:\\bigfile.csv', 'rb')) # ALWAYS open csv files in BINARY mode (Python 2)
with open('C:\\smalldataset.xcl','wb') as outf: # ALWAYS open csv files in BINARY mode (Python 2)
    writer = csv.writer(outf)
    for row_index, row in enumerate(data): # don't use 'l' as a variable name
        print row_index + 1, row
        if row[0] in name:
            writer.writerow(row)
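The snippet above uses Python 2 syntax (print statements, binary file modes). On Python 3 the same loop might look roughly like the sketch below. The function name copy_matching_rows and its parameters are hypothetical, and stripping whitespace before comparing is an added assumption, not part of the original code:

```python
import csv

def copy_matching_rows(src_path, dst_path, names):
    """Copy rows whose first field is in `names`; return the number of rows written."""
    # Assumption: stray leading/trailing spaces should not prevent a match.
    wanted = {n.strip() for n in names}
    written = 0
    # In Python 3, csv files are opened in text mode with newline=''
    # rather than in binary mode.
    with open(src_path, newline='') as inf, open(dst_path, 'w', newline='') as outf:
        writer = csv.writer(outf)
        for row in csv.reader(inf):
            # csv.reader yields an empty list for a blank line, so guard row[0].
            if row and row[0].strip() in wanted:
                writer.writerow(row)
                written += 1
    return written

# Example call, using the paths from the question:
# copy_matching_rows('C:\\bigfile.csv', 'C:\\smalldataset.xcl',
#                    ['NAMETHEFIRST', 'NAMEANOTHERNAME'])
```

Returning the count of written rows gives a cheap check that both names were actually found.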