
Linux, big text file, strip out content from line A to line B

I want to strip a chunk of lines from a big text file. I know the start and end line numbers. What is the most elegant way to get the content (the lines between A and B) out to some file?

I know the head and tail commands - is there an even quicker (one-step) way?

The file is over 5 GB and contains over 81 million lines.

UPDATED: The results

time sed -n 79224100,79898190p BIGFILE.log > out4.log
real    1m9.988s

time tail -n +79224100 BIGFILE.log | head -n +`expr 79898190 - 79224100` > out1.log
real    1m11.623s

time perl fileslice.pl BIGFILE.log 79224100 79898190 > out2.log
real    1m13.302s

time python fileslice.py 79224100 79898190 < BIGFILE.log > out3.log
real    1m13.277s

The winner is sed. The fastest, the shortest. I think Chuck Norris would use it.


sed -n '<A>,<B>p' input.txt
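
As a quick sanity check of the range form (on a small throwaway file built with `seq`; the filenames here are just for illustration):

```shell
# build a 10-line sample file: one number per line
seq 1 10 > sample.txt

# print only lines 3 through 5 (-n suppresses the default output)
sed -n '3,5p' sample.txt
```

This prints `3`, `4`, and `5`, each on its own line.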


This works for me in GNU sed:

sed -n 'A,$p;Bq'

The q command quits as soon as line B has been processed, so sed never scans the rest of the file.

For example, these large numbers work:

$ yes | sed -n '200000000,${=;p};200000005q'
200000000
y
200000001
y
200000002
y
200000003
y
200000004
y
200000005
y
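
The same early-exit idea can also be written with an explicit range rather than `$`, in one expression (a sketch; the line numbers are arbitrary). At line B, the `p` fires first and then `q` stops the scan:

```shell
# print lines 3-5, then quit immediately instead of reading to EOF
seq 1 100 | sed -n '3,5p;5q'
```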


I guess big files need a bigger solution...

fileslice.py:

import sys
import itertools

# copy lines A..B (1-based, inclusive) from stdin to stdout
first, last = int(sys.argv[1]), int(sys.argv[2])
for line in itertools.islice(sys.stdin, first - 1, last):
    sys.stdout.write(line)

invocation:

python fileslice.py 79224100 79898190 < input.txt > output.txt
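
For reference, `islice` takes 0-based, half-open indices (hence the `- 1` on the start argument), and it stops consuming the stream at the stop index, so nothing past line B is ever read:

```python
from itertools import islice

# lines 3..5 of a 10-line stream, using 1-based line numbers A=3, B=5
stream = iter(range(1, 11))
print(list(islice(stream, 3 - 1, 5)))  # [3, 4, 5]
```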


Here's a Perl solution :)

fileslice.pl:

#!/usr/bin/perl

use strict;
use warnings;
use IO::File;

my $first = $ARGV[1];
my $last = $ARGV[2];
my $fd = IO::File->new($ARGV[0], 'r') or die "Unable to open file $ARGV[0]: $!\n";
my $i = 0;
while (<$fd>) {
    $i++;
    next if ($i < $first);   # skip lines before the range
    last if ($i > $last);    # stop reading once past the range
    print $_;
}

Start with

perl fileslice.pl file 79224100 79898190
