How two merge several .csv files horizontally with python?
I've several .csv files (~10) and need to merge them together into a single file horizontally. Each file has the same number of rows (~300) and 4 header lines which are not necessarily identical, but should not be merged (only take the header lines from the first .csv file). The tokens in the lines are comma separated with no spaces in between.
As a python noob I've not come up with a 开发者_如何学编程solution, though I'm sure there's a simple solution to this problem. Any help is welcome.
You can load the CSV files using the csv
module in Python. Please refer to the documentation of this module for the loading code, I cannot remember it but it is really easy. Something like:
import csv
reader = csv.reader(open("some.csv", "rb"))
csvContent = list(reader)
After that, when you have the CSV files loaded in such form (a list of tuples):
[ ("header1", "header2", "header3", "header4"),
("value01", "value12", "value13", "value14"),
("value11", "value12", "value13", "value14"),
...
]
You can merge two such lists line-by-line:
result = [a+b for (a,b) in zip(csvList1, csvList2)]
To save such a result, you can use:
writer = csv.writer(open("some.csv", "wb"))
writer.writerows(result)
The csv module is your friend.
If you don't necessarily have to use Python, you can use shell tools like paste/gawk
etc
$ paste file1 file2 file3 file4 .. | awk 'NR>4'
The above will put them horizontally without the headers. If you want the headers, just get them from file1
$ ( head -4 file ; paste file[1-4] | awk 'NR>4' ) > output
You dont need to use csv module for this. You can just use
file1 = open(file1)
After opening all your files you can do this
from itertools import izip_longest
foo=[]
for new_line in izip_longest(file1,fil2,file3....,fillvalue=''):
foo.append(new_line)
This will give you this structure (which kon has already told you)..It will also work if you have different number of lines in each file
[ ("line10", "line20", "line30", "line40"),
("line11", "line21", "line31", "line41"),
...
]
After this you can just write it to a new file taking 1 list at a time
for listx in foo:
new_file.write(','.join(j for j in listx))
PS: more about izip_longest here
You learn by doing (and trying, even). So, I'll just give you a few hints. Use the following functions:
- To open a file:
open()
- To read all the lines in a file:
IOBase.readlines()
- To split a string according to a series of splitting tokents:
str.split()
If you really don't know what to do, I recommend you read the tutorial and Dive Into Python 3. (Depending on how much Python you know, you'll either have to read through the first few chapters or cut straight to the file IO chapters.)
Purely for learning purposes
A simple approach that does not take advantage of csv module:
# open file to write
file_to_write = open(filename, 'w')
# your list of csv files
csv_files = [file1, file2, ...]
headers = True
# iterate through your list
for filex in csv_files:
# mark the lines that are header lines
header_count = 0
# open the csv file and read line by line
filex_f = open(filex, 'r')
for line in filex_f:
# write header only once
if headers:
file_to_write.write(line+"\n")
if header_count > 3: headers = False
# Write all other lines to the file
if header_count > 3:
file_to_write.write(line+"\n")
# count lines
header_count = header_count + 1
# close file
filex_f.close()
file_to_write.close()
精彩评论