How do I open all files of a certain type in Python and process them?

I'm trying to figure out how to make Python go through a directory full of CSV files, process each file, and spit out a text file with a trimmed list of values.

In this example, I'm iterating through a CSV with lots of different columns, but all I really want are the first name, last name, and keyword. I have a folder full of these CSVs with varying columns (though they all contain first name, last name, and keyword somewhere). What's the best way to open that folder, go through each CSV file, and then spit it all out, either as its own CSV file or as a plain text list like in the example below?

import csv

reader = csv.reader(open("keywords.csv"))
F = open('compiled.txt', 'w')
for rownum, row in enumerate(reader):
    if rownum == 0:
        # Header row: record the index of each column we care about.
        for headnum, col in enumerate(row):
            if col == 'Keyword':
                keywordnum = headnum
            elif col == 'First Name':
                firstnamenum = headnum
            elif col == 'Last Name':
                lastnamenum = headnum
    else:
        print(row[keywordnum] + '\n' + row[firstnamenum] + '\n' + row[lastnamenum])
        F.write(row[keywordnum] + '\n')
F.close()


The best way is probably to use the shell's globbing ability or, alternatively, Python's glob module.

Shell (Linux, Unix)

Shell:

python myapp.py folder/*.csv

myapp.py:

import sys

for filename in sys.argv[1:]:
    with open(filename) as f:
        pass  # do something with f

Windows (or no shell available)

import glob

for filename in glob.glob("folder/*.csv"):
    with open(filename) as f:
        pass  # do something with f

Note: on Python 2.5, the with statement needs from __future__ import with_statement


The "get all the CSV files" part of the question has been answered several times (including by the OP), but the "get the right named columns" hasn't yet: csv.DictReader makes it trivial -- the "process one CSV file" loop becomes just:

reader = csv.DictReader(open(thecsvfilename))
for row in reader:
    # join takes a single iterable, so wrap the three fields in a tuple
    print('\n'.join((row['Keyword'], row['First Name'], row['Last Name'])))
    F.write(row['Keyword'] + '\n')
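
Putting the pieces together, here's a minimal sketch of the whole job (assuming a folder/ directory of CSVs and the same 'Keyword', 'First Name', and 'Last Name' headers as above):

import csv
import glob

# Pull the keyword out of every CSV in the folder into one text file.
with open('compiled.txt', 'w') as out:
    for filename in glob.glob('folder/*.csv'):
        with open(filename) as f:
            for row in csv.DictReader(f):
                out.write(row['Keyword'] + '\n')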


A few suggestions:

  • You could keep the header indices for Keyword, First Name, and Last Name in a dict instead of separate variables. That would make the script easier to modify later on.

  • You could use the list index() method instead of looping over the headers, e.g.:

    if rownum == 0:
        header_index = {}
        for header in ('Keyword', 'First Name', 'Last Name'):
            header_index[header] = row.index(header)

  • You could use the glob module to grab the filenames, but gs is probably right that shell globbing is a better way to do it.

  • It might be better to use the csv module for writing the output file as well; it handles quoting and escaping, so it would be more robust. A sketch follows this list.
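
For instance, a minimal sketch of writing the trimmed rows with csv.writer (the output filename and column order here are assumptions, not from the question):

import csv

# csv.writer quotes any field containing commas, quotes, or newlines,
# so the output survives a round trip through another CSV reader.
with open('compiled.csv', 'w') as out:
    writer = csv.writer(out)
    writer.writerow(['First Name', 'Last Name', 'Keyword'])    # header row
    writer.writerow(['Ada', 'Lovelace', 'analytical engine'])  # example row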


I think the best way to process a bunch of files in a directory is with os.walk (documented in the Python os module docs).

Here is an answer I wrote to another Python question, which includes working, tested Python code that uses os.walk to open a bunch of files. That version visits all subdirectories too, but it would be easy to modify it to stay in a single directory:

Replace strings in files by Python
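
As a rough illustration (a sketch, not the code from the linked answer), walking a directory tree and picking out just the CSV files might look like this:

import os

# os.walk yields (dirpath, dirnames, filenames) for every directory
# in the tree rooted at 'folder', subdirectories included.
for dirpath, dirnames, filenames in os.walk('folder'):
    for name in filenames:
        if name.endswith('.csv'):
            with open(os.path.join(dirpath, name)) as f:
                pass  # do something with f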


And I've answered my own question again... I imported the os and glob modules to nab a path.
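
Something along these lines, presumably (a guess at the approach, since the exact code isn't shown; the folder path and pattern are assumptions):

import glob
import os

# Build an absolute search pattern from a folder path, then glob for CSVs.
folder = os.path.abspath('folder')
for filename in glob.glob(os.path.join(folder, '*.csv')):
    print(filename)  # full path of each matching CSV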
