开发者

Add file name as last column of CSV file

I have a Python script which modifies a CSV file to add the filename as the last column:

import sys
import glob

for filenam开发者_JAVA技巧e in glob.glob(sys.argv[1]):
    file = open(filename)
    data = [line.rstrip() + "," + filename for line in file]
    file.close()

    file = open(filename, "w")
    file.write("\n".join(data))
    file.close()

Unfortunately, it also adds the filename to the header (first) row of the file. I would like the string "ID" added to the header instead. Can anybody suggest how I could do this?


Have a look at the official csv module.


Here are a few minor notes on your current code:

  • It's a bad idea to use file as a variable name, since that shadows the built-in type.
  • You can close the file objects automatically by using the with syntax.
  • Don't you want to add an extra column in the header line, called something like Filename, rather than just omitting a column in the first row?
  • If your filenames have commas (or, less probably, newlines) in them, you'll need to make sure that the filename is quoted - just appending it won't do.

That last consideration would incline me to use the csv module instead, which will deal with the quoting and unquoting for you. For example, you could try something like the following code:

import glob
import csv
import sys

for filename in glob.glob(sys.argv[1]):
    data = []
    with open(filename) as finput:
        for i, row in enumerate(csv.reader(finput)):
            to_append = "Filename" if i == 0 else filename
            data.append(row+[to_append])
    with open(filename,'wb') as foutput:
        writer = csv.writer(foutput)
        for row in data:
            writer.writerow(row)

That may quote the data slightly differently from your input file, so you might want to play with the quoting options for csv.reader and csv.writer described in the documentation for the csv module.

As a further point, you might have good reasons for taking a glob as a parameter rather than just the files on the command line, but it's a bit surprising - you'll have to call your script as ./whatever.py '*.csv' rather than just ./whatever.py *.csv. Instead, you could just do:

for filename in sys.argv[1:]:

... and let the shell expand your glob before the script knows anything about it.

One last thing - the current approach you're taking is slightly dangerous, in that if anything fails when writing back to the same filename, you'll lose data. The standard way of avoiding this is to instead write to a temporary file, and, if that was successful, rename the temporary file over the original. So, you might rewrite the whole thing as:

import csv
import sys
import tempfile
import shutil

for filename in sys.argv[1:]:
    tmp = tempfile.NamedTemporaryFile(delete=False)
    with open(filename) as finput:
        with open(tmp.name,'wb') as ftmp:
            writer = csv.writer(ftmp)
            for i, row in enumerate(csv.reader(finput)):
                to_append = "Filename" if i == 0 else filename
                writer.writerow(row+[to_append])
    shutil.move(tmp.name,filename)


You can try:

data = [file.readline().rstrip() + ",id"]
data += [line.rstrip() + "," + filename for line in file]


You can try changing your code, but using the csv module is recommended. This should give you the result you want:

import sys
import glob
import csv

filename = glob.glob(sys.argv[1])[0]
yourfile = csv.reader(open(filename, 'rw'))

csv_output=[]

for row in yourfile:
    if len(csv_output) != 0:     # skip the header
        row.append(filename)
    csv_output.append(row)

yourfile = csv.writer(open(filename,'w'),delimiter=',')
yourfile.writerows(csv_output)


Use the CSV module that comes with Python.

import csv
import sys

def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    header = reader.next()
    header.append('ID')
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename)
        writer.writerow(row)

    # Close the file and we're done.
    f.close()

# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
map(process_file, sys.argv[1:])

You can run this as follows:

blair@blair-eeepc:~$ python csv_add_filename.py file1.csv file2.csv


you can use fileinput to do in place editing

import sys
import glob
import fileinput

for filename in glob.glob(sys.argv[1]):
    for line in fileinput.FileInput(filename,inplace=1) :
       if fileinput.lineno()==1:
          print line.rstrip() + " ID"
       else
          print line.rstrip() + "," + filename
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜