Quickly remove first n lines from many text files
I need to create an output text file by deleting the first two lines of the input file.
At the moment I'm using sed "1,2d" input.txt > output.txt
I need to do this for thousands of files, so I am using Python:
import os
for filename in somelist:
    os.system('sed "1,2d" %s-in.txt > %s-out.txt' % (filename, filename))
but this is quite slow.
I need to keep the original file, so I can't sed in place.
Is there a way to do this faster? Using something other than sed? Perhaps using some other scripting language than Python? Is it worth writing a short C program, or is the file-writing disk access likely to be the bottleneck?
Use tail. Doubt anything could be significantly faster:
tail -n +3 input.txt > output.txt
Wrap it in your loop of choice. But I really doubt sed is a whole ton slower - as you say, disk i/o is usually the ultimate bottleneck.
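If the loop stays in Python, one option is to call tail directly rather than through os.system, which starts an extra shell for every file. This is only a rough sketch: it assumes tail is on the PATH and reuses the question's %s-in.txt / %s-out.txt naming, and the helper name strip_with_tail is made up for illustration.

```python
import subprocess

def strip_with_tail(names, skip=2):
    """For each name, write <name>-out.txt from <name>-in.txt minus the first `skip` lines."""
    for name in names:
        with open(name + "-out.txt", "wb") as out:
            # tail -n +K starts output at line K, so skipping 2 lines means +3.
            subprocess.run(
                ["tail", "-n", "+%d" % (skip + 1), name + "-in.txt"],
                stdout=out,
                check=True,  # raise if tail reports an error
            )
```

It still starts one process per file, so any gain over os.system would come only from skipping the intermediate shell.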
I think this will be faster than launching sed:
import os
import shutil
path = '/some/path/to/files/'
for filename in os.listdir(path):
    basename, ext = os.path.splitext(filename)
    fullname = os.path.join(path, filename)
    newname = os.path.join(path, basename + '-out' + ext)
    with open(fullname) as read:
        # skip the first two lines
        for n in range(2):
            read.readline()
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)
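The same skip-then-copy idea can be wrapped in a small function. The sketch below is a variation, not a measured improvement: it opens both files in binary mode so no text decoding happens, and takes the number of lines as a parameter. The name drop_first_lines and its arguments are hypothetical.

```python
import shutil

def drop_first_lines(src, dst, n=2):
    """Copy src to dst without its first n lines; src is left untouched."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for _ in range(n):
            # readline() returns b"" at end of file, so short files are safe
            fin.readline()
        # copy everything after the skipped lines in large chunks
        shutil.copyfileobj(fin, fout)
```

For the files in the question this would be called as drop_first_lines(name + '-in.txt', name + '-out.txt') inside the loop.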
for file in *.ext
do
    sed -i.bak -n '3,$p' "$file"
done
or just
sed -i.bak -n '3,$p' *.ext