Perl script to extract 2 lines before and after the pattern matching

My file looks like this:

line 1 
line 2 
line 3
target
line 5
line 6
line 7

I can write a regex that matches the target. All I need is to grab lines 2, 3, 5 and 6. Is there any way to do it?


If you're not set on using Perl, you can easily extract the context you want with grep and its Context Line Control options:

grep -A 2 -B 2 target filename | grep -v target

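Of course, target will need to be replaced by a suitable regex. Note that the second grep also drops any context line that itself matches target, and that GNU grep prints a -- separator line between non-adjacent groups of context.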


Robert is on the right path. You have to make your regex span multiple lines and match the 2 previous and the 2 next lines:

#!/usr/bin/perl -w

my $lines = <<EOF
line 1
line 2
line 3
target
line 5
line 6
line 7
EOF
;

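# Capture the 2 lines before target ($1) and the 2 lines after it ($2).
# /m lets ^ match at the start of any line; (?:...) groups without capturing.
my $re = qr/^((?:.*\n){2})target\n((?:.*\n){2})/m;

# Print only the captured context, wherever target sits in the text.
print "$1$2" if $lines =~ $re;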


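use strict;
use warnings;

my @lines = ('line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7');
my %answer;
my $regex = 'target';
for my $idx (0..$#lines) {
    next unless $lines[$idx] =~ /$regex/;
    # Look at the 2 lines on either side of the match.
    for my $ii (($idx - 2)..($idx + 2)) {
        next if $ii < 0 || $ii > $#lines;   # stay inside the array
        next if $lines[$ii] =~ /$regex/;    # skip the target lines themselves
        $answer{$ii} = $lines[$ii];
    }
}
# Sort numerically so that index 10 comes after index 9.
print "$answer{$_}\n" for sort { $a <=> $b } keys %answer;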

Which yields...

[mpenning@Bucksnort ~]$ perl search.pl
line 2
line 3
line 5
line 6
[mpenning@Bucksnort ~]$

EDIT

Fixed for @leonbloy's comment about multiple target strings in the file


Slurp the file into a list/array, find the index of the matching line, and use this index with offsets to get the desired lines.
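
The index-and-offset idea can be sketched roughly as follows. This is only an illustration: context_around_first is a hypothetical helper name, the lines are hard-coded instead of read from a file, and only the first match is handled. first, max and min come from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(first max min);

# Hypothetical helper: returns up to $n lines before and up to $n lines after
# the first line matching $re, or an empty list if nothing matches.
sub context_around_first {
    my ($lines, $re, $n) = @_;
    my $idx = first { $lines->[$_] =~ $re } 0 .. $#$lines;
    return () unless defined $idx;
    my $from = max(0, $idx - $n);          # clamp at the start of the array
    my $to   = min($#$lines, $idx + $n);   # clamp at the end of the array
    return (@{$lines}[$from .. $idx - 1], @{$lines}[$idx + 1 .. $to]);
}

# In a real script the lines would be slurped from a file handle instead,
# e.g. chomp(my @lines = <$fh>);
my @lines = ('line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7');
print "$_\n" for context_around_first(\@lines, qr/target/, 2);
```

Clamping the range matters because a negative index in Perl counts from the end of the array, so a match on the first line would otherwise pull in the last lines of the file.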


Although this was asked 8 months ago, I had to revisit this question, because none of the solutions I could find met my needs. My goal was a script that scans many huge log files and extracts only the wanted lines: each line containing the searched pattern(s), plus a configurable number of lines before and after it, WITHOUT any duplicates. I tried to reuse some of the code found here, but none of it was good enough for me. So in the end I wrote my own, which is probably not the most beautiful but looks useful, so I'd like to share it with you:

use strict;
use warnings;

my @findwhat   = ('x');   # patterns to search for (case-insensitive)
my $extraLines = 3;       # context lines to keep before and after each hit
my @cache;                # up to $extraLines most recent lines not yet printed
my $after      = 0;       # lines still to print after the latest hit

open(my $in, '<', 'test1.log') or die "Cannot open test1.log: $!";
while (my $line = <$in>) {
    chomp $line;
    my $hit = grep { $line =~ /$_/i } @findwhat;

    if ($hit) {
        # Flush the pending "before" lines, then the matching line itself.
        print "$_\n" for @cache, $line;
        @cache = ();
        $after = $extraLines;
    }
    elsif ($after > 0) {
        # Still inside the "after" window of the previous hit.
        print "$line\n";
        $after--;
    }
    else {
        # Remember the line in case a hit follows; keep only the newest ones.
        push @cache, $line;
        shift @cache if @cache > $extraLines;
    }
}
close($in);

You will probably only need to change the values of the list @findwhat and the scalar $extraLines. I hope the code is useful. (Sorry for my poor English.)


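Make the regex span several lines and capture the context around the match, e.g.: /^((?:.*\n){3})foo\n((?:.*\n){3})/m (note that \n{3} on its own would match three consecutive newlines, not three lines).

Edit: /^((?:.*\n){0,3})foo\n((?:.*\n){0,3})/m also works in the general case, when fewer than three lines surround the match.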


One-liner version (where -l = chomp and -n = while(<>){}; see perldoc perlrun for more options):

perl -lnE '$h{$.}=$_; END { 
  for ( grep { $h{$_} eq "target" } sort{ $a <=> $b } keys %h ) { 
  say for @h{$_-2..$_-1 , $_+1..$_+2} } }' data.txt

Script with explanation:

#!perl
use feature 'say';

while (<DATA>) {
  chomp;
  $hash{$.} = $_  ; # hash entry with line number as key; line contents as value
}

# find the target in the hash and sort keys or line numbers into an array
@matches = sort {$a <=> $b} grep { $hash{$_} eq 'target' } keys %hash;

for (@matches) { 
  say "before" ;
  say for @hash{$_-2..$_-1} ; # print the context lines as a hash slice
  say ">>>>\" $hash{$_} \"<<<< " ; # $_ is the matching line number ($. would be the last line read)
  say "after" ;
  say for @hash{$_+1..$_+2} ;
  say "";
}

__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target

Output:

before
line 2
line 3
>>>>" target "<<<< 
after
line 5
line 6

before
line 6
line 7
>>>>" target "<<<< 
after
line of context1
line of context2

before
line of context1
line of context2
>>>>" target "<<<< 
after

A simpler version using only arrays and with output that excludes the target as the OP question requested:

#!perl -l     
chomp( my @lines = <DATA> ) ; 
my $n = 2 ; # context range before/after

my @indexes = grep { $lines[$_] =~ m/target/ } 0..$#lines ; 
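foreach my $i (@indexes) { 
  # Clamp the range: a negative index would wrap around to the end of the array.
  my $lo = $i - $n < 0       ? 0       : $i - $n ;
  my $hi = $i + $n > $#lines ? $#lines : $i + $n ;
  print for @lines[$lo..$i-1], @lines[$i+1..$hi],"";
}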

__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target

This avoids constructing the hash but may be slower on very large files/arrays.

On CPAN List::MoreUtils has indexes() and there is always splice(), but I'm not sure these would make things simpler.
