Python File Concatenation
I have a data folder, with subfolders for each subject that ran through a program. So, for example, in the data folder, there are folders for Bob, Fred, and Tom. Each one of those folders contains a variety of files and subfolders. However, I am only interested in the 'summary.log' file contained in each subject's folder.
I want to concatenate the 'summary.log' file from Bob, Fred, and Tom into a single log file in the data folder. In addition, I want to add a column to each log file that will list the subject number.
Is this possible to do in Python? Or is there an easier way to do it? I have tried a number of different batches of code, but none of them get the job done. For example,
#!/usr/bin/python
import sys, string, glob, os
fls = glob.glob(r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*');
outfile = open('summary.log','w');
for x in fls:
    file=open(x,'r');
    data=file.read();
    file.close();
    outfile.write(data);
outfile.close();
This gives me the error:
Traceback (most recent call last):
  File "fileconcat.py", line 8, in <module>
    file=open(x,'r');
IOError: [Errno 21] Is a directory
I think this has to do with the fact that the data folder contains subfolders, but I don't know how to work around it. I also tried this, but to no avail:
from glob import iglob
import shutil
import os
PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*'
destination = open('summary.log', 'wb')
for filename in iglob(os.path.join(PATH, '*.log'))
    shutil.copyfileobj(open(filename, 'rb'), destination)
destination.close()
This gives me an "invalid syntax" error at the "for filename" line, but I'm not sure what to change.
The syntax error is not related to the use of glob. You forgot the colon (:) at the end of the for statement:
for filename in iglob(os.path.join(PATH, '*.log')):  # <-- this colon was missing
Alternatively, you can put both wildcards directly into the pattern:
from glob import iglob
import shutil
PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*/*.log'
destination = open('summary.log', 'wb')
for filename in iglob(PATH):
    shutil.copyfileobj(open(filename, 'rb'), destination)
destination.close()
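To see exactly which files a pattern like data/*/*.log picks up, here is a small self-contained sketch (Python 3) that builds a throwaway folder tree and collects the matches. The folder and file names (Bob, Fred, notes.txt, session1/extra.log) are made up for illustration. Each * matches within a single path component, so the pattern only reaches .log files exactly one folder below data.

```python
import glob
import os
import tempfile

# Hypothetical folder layout used only for this demonstration.
LAYOUT = {
    "Bob/summary.log": "bob line\n",
    "Fred/summary.log": "fred line\n",
    "Fred/notes.txt": "not a log\n",
    "Fred/session1/extra.log": "nested log\n",
}

def make_tree(root):
    """Create the files in LAYOUT under root, including their folders."""
    for rel, text in LAYOUT.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(text)

def matching_logs(root):
    """Return the .log files exactly one folder below root, relative to root."""
    # First '*' = subject folder, second '*' = file name; neither crosses a separator.
    pattern = os.path.join(root, "*", "*.log")
    return sorted(os.path.relpath(p, root) for p in glob.glob(pattern))

with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    matches = matching_logs(root)

print(matches)
```

Only the two summary.log files are matched: notes.txt fails the *.log part, and the nested extra.log sits one level too deep.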
The colon (:) is missing in the for line.
Besides that, you should use the with statement, because it closes the file for you even if an exception is raised (your code is not exception-safe):
from glob import iglob
import shutil
import os
PATH = r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*'
with open('summary.log', 'wb') as destination:
    for filename in iglob(os.path.join(PATH, '*.log')):
        with open(filename, 'rb') as in_:
            shutil.copyfileobj(in_, destination)
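None of the snippets in this thread add the subject column the question asks for. One way to do it is sketched below (Python 3), under some assumptions: the subject is the name of the folder that contains summary.log, the logs are plain text, and the new column is prepended to every line with a tab separator. The helper name concat_with_subject and the tab delimiter are illustrative choices, not anything from the original posts; adjust the delimiter to match your log format.

```python
import glob
import os
import tempfile

def concat_with_subject(data_dir, out_path, name="summary.log"):
    """Hypothetical helper: concatenate data_dir/<subject>/<name> into out_path,
    prefixing every line with the subject folder name and a tab."""
    pattern = os.path.join(data_dir, "*", name)
    with open(out_path, "w") as out:
        for path in sorted(glob.glob(pattern)):
            subject = os.path.basename(os.path.dirname(path))
            with open(path) as src:
                for line in src:
                    # Strip the newline and re-add it, so a file without a
                    # trailing newline does not run into the next subject's lines.
                    out.write(subject + "\t" + line.rstrip("\n") + "\n")

# Usage example on a throwaway tree with two hypothetical subjects.
with tempfile.TemporaryDirectory() as data:
    for subject, text in [("Bob", "a 1\nb 2\n"), ("Tom", "c 3")]:
        os.makedirs(os.path.join(data, subject))
        with open(os.path.join(data, subject, "summary.log"), "w") as f:
            f.write(text)
    out_file = os.path.join(data, "all_summary.log")
    concat_with_subject(data, out_file)
    with open(out_file) as f:
        combined = f.read()

print(combined)
```

Writing the combined file directly into the data folder is safe here because the pattern only looks one folder deeper. If each summary.log starts with a header row, this sketch repeats it once per subject; you would need to skip it for all but the first file.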
In your first example:
import sys, string, glob, os
you are not using sys, string or os, so there is no need to import them.
fls = glob.glob(r'/Users/slevclab/Desktop/Acceptability Judgement Task/data/*');
Here you are selecting the subject folders themselves, and open() fails when it is given a folder. Since you are interested in the summary.log files within these folders, change the pattern as follows:
fls = glob.glob('/Users/slevclab/Desktop/Acceptability Judgement Task/data/*/summary.log')
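If you wanted to keep the broad data/* pattern from the question instead, you would have to skip the entries that are folders, because opening a folder is what raises the "Is a directory" error. A minimal sketch (Python 3) on a throwaway tree; files_only is a hypothetical helper name:

```python
import glob
import os
import tempfile

def files_only(pattern):
    """Return the glob matches that are regular files, skipping folders."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))

with tempfile.TemporaryDirectory() as data:
    # Hypothetical layout: one subject folder plus one loose file.
    os.makedirs(os.path.join(data, "Bob"))
    with open(os.path.join(data, "readme.txt"), "w") as f:
        f.write("top-level file\n")

    pattern = os.path.join(data, "*")
    all_names = sorted(os.path.basename(p) for p in glob.glob(pattern))
    file_names = [os.path.basename(p) for p in files_only(pattern)]

    # Opening the folder reproduces the question's error; the exact
    # OSError subclass varies by platform, so catch the base class.
    try:
        open(os.path.join(data, "Bob")).close()
        opening_folder_failed = False
    except OSError:
        opening_folder_failed = True

print(all_names)   # both the folder and the file
print(file_names)  # only the file
```

Filtering avoids the crash, but it still would not find summary.log, which lives inside the subject folders; that is why changing the pattern is the better fix.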
In Python, there is no need to end lines with semicolons.
outfile = open('summary.log','w')
for x in fls:
    file = open(x, 'r')
    data = file.read()
    file.close()
    outfile.write(data)
outfile.close()
As VGE's answer shows, your second solution works once you've fixed the syntax error. Note, though, that a more general solution is to use os.walk:
>>> import os
>>> for i in os.walk('foo'):
...     print(i)
...
('foo', ['bar', 'baz'], ['oof.txt'])
('foo/bar', [], ['rab.txt'])
('foo/baz', [], ['zab.txt'])
This walks every directory in the tree rooted at the start directory (the start directory itself and everything below it), yielding a (dirpath, dirnames, filenames) tuple for each one, which keeps directories and files nicely separated.
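Applied to this question, os.walk can find summary.log at any depth under the data folder, not just one level down. A minimal sketch (Python 3), assuming the subject is the first folder under data on the way to the file; find_summaries is a hypothetical name and the folder layout is made up:

```python
import os
import tempfile

def find_summaries(root, name="summary.log"):
    """Yield (subject, path) for every file called `name` anywhere below root.

    Assumption: the subject is the first folder under root on the way to the file.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # top-down walk: sorting in place fixes the visiting order
        if name in filenames:
            subject = os.path.relpath(dirpath, root).split(os.sep)[0]
            yield subject, os.path.join(dirpath, name)

with tempfile.TemporaryDirectory() as data:
    # Hypothetical layout: Fred's log sits one level deeper, Tom has no summary.log.
    for rel in ["Bob/summary.log", "Fred/run2/summary.log", "Tom/other.log"]:
        path = os.path.join(data, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = [subject for subject, _ in find_summaries(data)]

print(found)
```

Each yielded path can be fed into the same copy loop used with glob above, and the subject string is available if you want to write it out as an extra column.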