Quickly remove first n lines from many text files

2023-01-12 01:07

I need to create an output text file by deleting the first two lines of the input file.

At the moment I'm using sed "1,2d" input.txt > output.txt

I need to do this for thousands of files, so am using python:

import os
for filename in somelist:
  os.system('sed "1,2d" %s-in.txt > %s-out.txt'%(filename,filename))

but this is quite slow.

I need to keep the original file, so I can't sed in place.

Is there a way to do this faster? Using something other than sed? Perhaps using some other scripting language than python? Is it worth writing a short C program, or is the file-writing disk-access likely to be the bottleneck?

Use tail. Doubt anything could be significantly faster:

tail -n +3 input.txt > output.txt

Wrap it in your loop of choice. But I really doubt sed is a whole ton slower - as you say, disk i/o is usually the ultimate bottleneck.
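
As a rough sketch of "wrap it in your loop of choice" from Python: the version below runs tail through subprocess with an argument list instead of os.system, so no extra shell is started for each file. The helper names strip_with_tail and strip_all are made up for illustration, the -in.txt / -out.txt naming follows the question, and it assumes a POSIX tail is on the PATH. Whether it is noticeably faster will depend on the disk, so it is worth timing on a sample of the real files.

```python
import subprocess

def strip_with_tail(src, dst, n=2):
    """Write src minus its first n lines to dst using tail (hypothetical helper)."""
    with open(dst, "wb") as out:
        # tail -n +K starts output at line K, so skipping n lines means K = n + 1.
        # "--" stops option parsing in case a file name starts with "-".
        subprocess.run(["tail", "-n", "+%d" % (n + 1), "--", src],
                       stdout=out, check=True)

def strip_all(names, n=2):
    # names plays the role of `somelist` in the question
    for name in names:
        strip_with_tail("%s-in.txt" % name, "%s-out.txt" % name, n)
```

Calling strip_all(somelist) would then replace the os.system loop from the question; the input files are only read, never modified.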

I think this will be faster than launching sed:

import os
import shutil

path = '/some/path/to/files/'
for filename in os.listdir(path):
    basename, ext = os.path.splitext(filename)
    fullname = os.path.join(path, filename)
    newname = os.path.join(path, basename + '-out' + ext)
    with open(fullname) as read:
        # skip first two lines
        for _ in range(2):  # range works on Python 2 and 3; xrange is Python 2 only
            read.readline()
        # hand the rest to shutil.copyfileobj
        with open(newname, 'w') as write:
            shutil.copyfileobj(read, write)
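
A variant of the same idea, offered as a sketch rather than a measured improvement: opening both files in binary mode avoids text decoding, and putting the logic in a function makes it easy to call from any loop. The name strip_head is hypothetical. Lines are split on newline bytes, and a file with fewer than n lines simply produces an empty output.

```python
import shutil

def strip_head(src, dst, n=2):
    """Copy src to dst without its first n lines; src is left untouched."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for _ in range(n):
            fin.readline()  # returns b"" at end of file, so short files are safe
        shutil.copyfileobj(fin, fout)  # copies the remainder in chunks
```

For the naming used in the question, a call would look like strip_head(name + '-in.txt', name + '-out.txt') inside the loop over somelist.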

for file in *.ext
do
    sed -i.bak -n '3,$p' "$file"
done

or just

sed -i.bak -n '3,$p' *.ext

Tags: file-io performance python sed
