Perl script to extract 2 lines before and after the pattern matching

2023-03-04 07:38

my file is like

line 1 
line 2 
line 3
target
line 5
line 6
line 7

I can write a regex that matches the target. What I need is to grab lines 2, 3, 5, and 6. Is there any way to do it?

If you're not set on using Perl, you can easily extract the context you want with grep and its Context Line Control options:

grep -A 2 -B 2 target filename | grep -v target

Of course target will need to be replaced by a suitable regex.

Robert is on the right path. You have to make your regex multiline and match the 2 previous and the 2 next lines:

#!/usr/bin/perl -w

my $lines = <<EOF
line 1
line 2
line 3
target
line 5
line 6
line 7
EOF
;

# Match a new line, then 2 lines, then target, then 2 lines.
#                { $1       }        { $3       }
my $re = qr/^.*\n((.*?\n){2})target\n((.*?\n){2}).*$/m;

(my $res = $lines) =~ s/$re/$1$3/;
print $res;

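use strict;
use warnings;

my @lines = ('line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7');
my %answer;
my $regex = 'target';
for my $idx (0..$#lines) {
    if ($lines[$idx] =~ /$regex/) {
        for my $ii (($idx - 2)..($idx + 2)) {
            # Skip offsets that fall outside the array (a negative index
            # would otherwise wrap around to the end of @lines)
            next if $ii < 0 || $ii > $#lines;
            unless ($lines[$ii] =~ /^$regex$/) {$answer{$ii} = $lines[$ii];}
        }
    }
}
# Sort the indices numerically so that index 10 comes after index 9
foreach my $key (sort { $a <=> $b } keys %answer) { print "$answer{$key}\n" }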

Which yields...

[mpenning@Bucksnort ~]$ perl search.pl
line 2
line 3
line 5
line 6
[mpenning@Bucksnort ~]$

EDIT

Fixed for @leonbloy's comment about multiple target strings in the file

Slurp the file into a list/array, find the index of the matching line, and use this index to get the desired values (using offsets).
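A minimal sketch of that idea, assuming the whole file fits in memory. The helper name context_around and its parameters are hypothetical, chosen only for illustration. It clamps the offsets so they never run past either end of the array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the $n lines before and after every line
# matching $re, without the matching line itself.
sub context_around {
    my ($re, $n, @lines) = @_;
    my @out;
    for my $i (grep { $lines[$_] =~ $re } 0 .. $#lines) {
        # Clamp the offsets so they stay inside the array.
        my $lo = $i - $n < 0       ? 0       : $i - $n;
        my $hi = $i + $n > $#lines ? $#lines : $i + $n;
        push @out, @lines[$lo .. $i - 1], @lines[$i + 1 .. $hi];
    }
    return @out;
}

my @lines = ('line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7');
print "$_\n" for context_around(qr/target/, 2, @lines);
```

Overlapping windows are not merged in this sketch, so a line that sits near two matches is returned twice.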

Although this was asked 8 months ago, I had to revisit this question, since none of the solutions I could find met my needs. My goal was a script that examines many huge log files and extracts only the wanted lines from them, with a configurable number of lines before and after each line containing the searched pattern(s), WITHOUT any redundancy. I tried to reuse some of the code found here, but none of it was good enough for me. So I finally wrote my own, which is probably not the most beautiful, but looks useful, so I'd like to share it with you:

use strict;

my @findwhat      = ('x');
my $extraLines    = 3;
my @cache         = ('') x ($extraLines);
my @stack;
my $lncntr        = 0;
my $hit           = 0;
my $nextHitWatch  = 0;
my $shift         = 1;

open (IN, "<", "test1.log") or die "Cannot open test1.log: $!";
  while (my $line=<IN>) {
    $lncntr++;
    chomp $line;
    foreach my $what (@findwhat) {if ($line =~ m/$what/i) {$hit = 1; last}}

    if ($hit && !$nextHitWatch) {
      @stack = @cache;
      $hit = 0;
      $nextHitWatch++;
    }

    if (!$hit && $nextHitWatch && $nextHitWatch < ($extraLines * 2) + 2) {
      @stack = (@stack, $line);
      $nextHitWatch++;
    }

    if (!$hit && $nextHitWatch && $nextHitWatch == ($extraLines * 2) + 2) {
      @stack = (@stack, $line);
      for (my $i = 0; $i <= ($#stack - ($extraLines + $shift)); $i++) {
        print $stack[$i]. "\n" if $stack[$i];
      }
      $nextHitWatch = 0;
      $shift = 1;
      @stack = ();
    }

    if ($nextHitWatch >= 1 && eof) {
      foreach(@stack) {print "$_\n"}
    }

    if ($nextHitWatch >= 1 && eof) {
      if (!$hit) {
        my $upValue = 3 + $#stack - ($nextHitWatch - $extraLines + $shift);
        $upValue = ($upValue > $#stack) ? $#stack : $upValue;
        for (my $i = 0; $i <= $upValue; $i++) {
          print $stack[$i] . "\n";
        }
      } else {
        foreach (@stack) {print "$_\n"}
      }
    }

    shift(@cache);
    push(@cache, $line);
  }
close (IN);

You will probably only have to change the values of the list @findwhat and the scalar $extraLines. I hope my code will be useful. (Sorry for my poor English.)
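For comparison, here is a shorter sketch of the same goal (matching lines plus context, each line written at most once, reading the log line by line). It is not the script above but a different technique: a small buffer of not-yet-printed lines plus a countdown for the trailing context. The name extract_context and its argument order are assumptions made for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: copy matching lines plus $n lines of context from
# $in to $out, reading one line at a time and writing each line at most once.
sub extract_context {
    my ($in, $out, $n, @patterns) = @_;
    my @before;        # up to $n most recent lines that were not written yet
    my $after = 0;     # trailing context lines still owed after the last hit
    while (my $line = <$in>) {
        if (grep { $line =~ $_ } @patterns) {
            print {$out} @before, $line;    # pending leading context, then the hit
            @before = ();
            $after  = $n;
        }
        elsif ($after > 0) {
            print {$out} $line;             # trailing context
            $after--;
        }
        else {
            push @before, $line;
            shift @before if @before > $n;  # keep the buffer at $n lines
        }
    }
    return;
}

# Demo on an in-memory "file"; a real run would open the log file instead.
my $log = join '', map { "$_\n" } qw(a b c x d e f g h i x j);
open my $in, '<', \$log or die "Cannot open in-memory input: $!";
extract_context($in, \*STDOUT, 1, qr/x/i);
close $in;
```

Because a line is either written immediately or held in @before until the next hit flushes it, overlapping windows never produce duplicates, and memory use stays bounded by $n lines.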

Make the regex multiline, e.g.: /\n{3}(foo)\n{3}/m;

Edit: /\n*(foo)\n*/m works in the general case.
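Note that \n{3} matches three consecutive newline characters rather than three lines, so those patterns do not capture the surrounding lines themselves. A sketch of a pattern that does capture whole context lines, assuming the input is held in one string and the target is a line that reads exactly "target" (the helper name context_by_regex is made up for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: return up to two lines before and after the first
# line that is exactly "target"; returns nothing when no line matches.
sub context_by_regex {
    my ($text) = @_;
    # (?:.*\n){0,2} matches up to two whole lines; /m lets ^ match at the
    # start of any line, so the first group begins on a line boundary.
    if ($text =~ /^((?:.*\n){0,2})target\n((?:.*\n){0,2})/m) {
        return $1 . $2;
    }
    return;
}

my $text = join '', map { "$_\n" }
    'line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7';
print context_by_regex($text);
```

The {0,2} quantifiers let the pattern still match when the target sits within two lines of the start or end of the input.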

One-liner version (where -l = chomp and -n = while(<>){}; see perldoc perlrun for more options):

perl -lnE '$h{$.}=$_; END { 
  for ( grep { $h{$_} eq "target" } sort{ $a <=> $b } keys %h ) { 
  say for @h{$_-2..$_-1 , $_+1..$_+2} } }' data.txt

Script with explanation:

#!perl
use feature 'say';

while (<DATA>) {
  chomp;
  $hash{$.} = $_  ; # hash entry with line number as key; line contents as value
}

# find the target in the hash and sort keys or line numbers into an array
@matches = sort {$a <=> $b} grep { $hash{$_} eq 'target' } keys %hash;

for (@matches) { 
  say "before\n" ;
  say for @hash{$_-2..$_-1} ; # print the context lines as a hash slice
  say ">>>>\" $hash{$_} \"<<<< " ; # $_ is the matched line number ($. would be the last line read)
  say "after\n" ;
  say for @hash{$_+1..$_+2} ;
  say "";
}

__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target

Output:

before
line 2
line 3
>>>>" target "<<<< 
after
line 5
line 6

before
line 6
line 7
>>>>" target "<<<< 
after
line of context1
line of context2

before
line of context1
line of context2
>>>>" target "<<<< 
after

A simpler version using only arrays and with output that excludes the target as the OP question requested:

#!perl -l     
chomp( my @lines = <DATA> ) ; 
my $n = 2 ; # context range before/after

my @indexes = grep { $lines[$_] =~ m/target/ } 0..$#lines ; 
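foreach my $i (@indexes) { 
  # clamp the range so it never runs off either end of @lines
  my $lo = $i - $n < 0       ? 0       : $i - $n ;
  my $hi = $i + $n > $#lines ? $#lines : $i + $n ;
  print for @lines[$lo..$i-1], @lines[$i+1..$hi], "";
}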

__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target

This avoids constructing the hash but may be slower on very large files/arrays.

On CPAN List::MoreUtils has indexes() and there is always splice(), but I'm not sure these would make things simpler.
