Perl script to extract the 2 lines before and after a pattern match
My file looks like this:
line 1
line 2
line 3
target
line 5
line 6
line 7
I can write a regex that matches the target. All I need is to grab lines 2, 3, 5, and 6. Is there any way to do it?
If you're not determined to use Perl, you can easily extract the context you want with grep and its Context Line Control options:
grep -A 2 -B 2 target filename | grep -v target
Of course, target will need to be replaced by a suitable regex.
Robert is on the right path. You have to multiline your regex and match the 2 previous and next lines:
#!/usr/bin/perl -w
my $lines = <<EOF
line 1
line 2
line 3
target
line 5
line 6
line 7
EOF
;
# Match one leading line, then 2 lines ($1), then target, then 2 more
# lines ($3), then one trailing line. The substitution keeps only $1 and $3.
my $re = qr/^.*\n((.*?\n){2})target\n((.*?\n){2}).*$/m;
(my $res = $lines) =~ s/$re/$1$3/;
print $res;
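use strict;
use warnings;

my @lines = ('line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7');
my %answer;
my $regex = 'target';
for my $idx (0..$#lines) {
    if ($lines[$idx] =~ /$regex/) {
        for my $ii (($idx - 2)..($idx + 2)) {
            # stay inside the array: a negative index would wrap around to the end
            next if $ii < 0 || $ii > $#lines;
            # keep only the context, never a matching line itself
            unless ($lines[$ii] =~ /$regex/) { $answer{$ii} = $lines[$ii]; }
        }
    }
}
# sort numerically so that index 10 does not come before index 2
foreach my $key (sort { $a <=> $b } keys %answer) { print "$answer{$key}\n" }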
Which yields...
[mpenning@Bucksnort ~]$ perl search.pl
line 2
line 3
line 5
line 6
[mpenning@Bucksnort ~]$
EDIT
Fixed for @leonbloy's comment about multiple target strings in the file
Slurp the file into a list/array, find the index of the matching line, and use this index to get the desired values (using offsets).
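A minimal sketch of that idea, assuming the whole file fits in memory. The sub name context_lines and its parameters are made up for illustration; the sample array stands in for something like chomp(my @lines = <$fh>). The offsets are clamped so that a match near the start or end of the file neither wraps around nor runs past the last line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines within $n positions of any line
# matching $re, excluding the matching lines themselves.
sub context_lines {
    my ($re, $n, @lines) = @_;
    my %keep;    # index => 1, so overlapping contexts are not repeated
    for my $i (grep { $lines[$_] =~ $re } 0 .. $#lines) {
        # clamp the offsets to the bounds of the array
        my $lo = $i - $n < 0       ? 0       : $i - $n;
        my $hi = $i + $n > $#lines ? $#lines : $i + $n;
        $keep{$_} = 1 for grep { $lines[$_] !~ $re } $lo .. $hi;
    }
    return map { $lines[$_] } sort { $a <=> $b } keys %keep;
}

# Sample data mirroring the question: seven lines, the fourth is the target.
my @lines = map { "line $_" } 1 .. 7;
$lines[3] = 'target';
print "$_\n" for context_lines(qr/target/, 2, @lines);
```

With the sample data this prints line 2, line 3, line 5 and line 6, one per line.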
Although this was asked 8 months ago, I had to revisit this question, since none of the solutions I could find met my needs. My goal was a script that examines many huge log files and extracts only the wanted lines, with a configurable number of lines before and after each line containing the searched pattern(s), WITHOUT any redundancy. I tried to reuse some of the code found here, but none of it was good enough for me. So in the end I wrote my own, which is probably not the most beautiful but looks useful, so I'd like to share it with you:
use strict;
my @findwhat = ('x');
my $extraLines = 3;
my @cache = ('') x ($extraLines);
my @stack;
my $lncntr = 0;
my $hit = 0;
my $nextHitWatch = 0;
my $shift = 1;
open(IN, '<', 'test1.log') or die "Cannot open test1.log: $!";
while (my $line=<IN>) {
$lncntr++;
chomp $line;
foreach my $what (@findwhat) {if ($line =~ m/$what/i) {$hit = 1; last}}
if ($hit && !$nextHitWatch) {
@stack = @cache;
$hit = 0;
$nextHitWatch++;
}
if (!$hit && $nextHitWatch && $nextHitWatch < ($extraLines * 2) + 2) {
@stack = (@stack, $line);
$nextHitWatch++;
}
if (!$hit && $nextHitWatch && $nextHitWatch == ($extraLines * 2) + 2) {
@stack = (@stack, $line);
for (my $i = 0; $i <= ($#stack - ($extraLines + $shift)); $i++) {
print $stack[$i]. "\n" if $stack[$i];
}
$nextHitWatch = 0;
$shift = 1;
@stack = ();
}
if ($nextHitWatch >= 1 && eof) {
foreach(@stack) {print "$_\n"}
}
if ($nextHitWatch >= 1 && eof) {
if (!$hit) {
my $upValue = 3 + $#stack - ($nextHitWatch - $extraLines + $shift);
$upValue = ($upValue > $#stack) ? $#stack : $upValue;
for (my $i = 0; $i <= $upValue; $i++) {
print $stack[$i] . "\n";
}
} else {
foreach (@stack) {print "$_\n"}
}
}
shift(@cache);
push(@cache, $line);
}
close (IN);
You will probably only have to change the values of the list @findwhat and the scalar $extraLines. I hope my code will be useful. (Sorry for my poor English.)
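For comparison, here is a shorter sketch of the same goal (context around each hit, no line printed twice, no need to slurp a huge file). It is a different technique, a small sliding buffer, and not a rewrite of the script above. The sub name stream_context and its parameters are hypothetical, and the in-memory filehandle only stands in for a real log file.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $fh and return those within $n lines
# of a match, without the matching lines and without duplicates.
sub stream_context {
    my ($fh, $re, $n) = @_;
    my (@before, @out);
    my $after = 0;                  # context lines still owed after a hit
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ $re) {
            push @out, @before;     # flush the pending "before" context
            @before = ();
            $after  = $n;
        }
        elsif ($after > 0) {
            push @out, $line;       # "after" context of the previous hit
            $after--;
        }
        else {
            push @before, $line;    # remember as possible "before" context
            shift @before if @before > $n;
        }
    }
    return @out;
}

# Sample data mirroring the question, read through an in-memory filehandle.
my $text = join '', map { "$_\n" }
    'line 1', 'line 2', 'line 3', 'target', 'line 5', 'line 6', 'line 7';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n" for stream_context($fh, qr/target/, 2);
close $fh;
```

Each line is either emitted right away as "after" context or held in the buffer and emitted once when the next hit arrives, so overlapping contexts are never repeated. Only $n lines are buffered at a time; the demo collects the results in an array for simplicity, but printing inside the loop instead would keep memory use constant.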
Multiline the regex, e.g. /\n{3}(foo)\n{3}/m;
Edit: /\n*(foo)\n*/m works in the general case.
One-liner version (where -l = chomp and -n = while(<>){}. See perldoc perlrun for more options):
perl -lnE '$h{$.}=$_; END {
for ( grep { $h{$_} eq "target" } sort{ $a <=> $b } keys %h ) {
say for @h{$_-2..$_-1 , $_+1..$_+2} } }' data.txt
Script with explanation:
#!perl
use feature 'say';
while (<DATA>) {
chomp;
$hash{$.} = $_ ; # hash entry with line number as key; line contents as value
}
# find the target in the hash and sort keys or line numbers into an array
@matches = sort {$a <=> $b} grep { $hash{$_} eq 'target' } keys %hash;
for (@matches) {
say "before\n" ;
say for @hash{$_-2..$_-1} ; # print the context lines as a hash slice
say ">>>>\" $hash{$_} \"<<<< " ; # $_ is the matching line number; $. would always be the last line read
say "after\n" ;
say for @hash{$_+1..$_+2} ;
say "";
}
__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target
Output:
before
line 2
line 3
>>>>" target "<<<<
after
line 5
line 6
before
line 6
line 7
>>>>" target "<<<<
after
line of context1
line of context2
before
line of context1
line of context2
>>>>" target "<<<<
after
A simpler version that uses only arrays and excludes the target from the output, as the OP's question requested:
#!perl -l
chomp( my @lines = <DATA> ) ;
my $n = 2 ; # context range before/after
my @indexes = grep { $lines[$_] =~ m/target/ } 0..$#lines ;
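foreach my $i (@indexes) {
    # clamp the range so it never wraps around or runs past the end
    my $lo = $i - $n < 0       ? 0       : $i - $n;
    my $hi = $i + $n > $#lines ? $#lines : $i + $n;
    print for @lines[$lo..$i-1], @lines[$i+1..$hi], "";
}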
__DATA__
line 1
line 2
line 3
target
line 5
line 6
line 7
target
line of context1
line of context2
target
This avoids constructing the hash but may be slower on very large files/arrays.
On CPAN, List::MoreUtils has indexes(), and there is always splice(), but I'm not sure these would make things simpler.