How do I match the line before and after a pattern match in Perl?
I am matching a pattern and getting the line number of the match using the $. variable.
I need to print the lines before and after the line that matches the pattern, e.g.:
line1
line2
line3
line4
line5
After my pattern matches line3, I want to print line2 and line4.
How can I do this with a pattern match in Perl? Can anyone help me?
Thanks in advance
Senthil
You want what is normally called context. The easiest way to get context is to maintain it yourself with a variable:
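#!/usr/bin/perl
use strict;
use warnings;

my $old = '';    # previous line; stays empty if the match is on the first line
while (my $line = <DATA>) {
    if ($line =~ /line3/) {
        # read the following line; it is undef if the match is on the last line
        my $next = <DATA>;
        print $old, $line, $next // '';
        last;
    }
    $old = $line;
}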
__DATA__
line1
line2
line3
line4
line5
If you need more than one line of context, it is better to use an array:
#!/usr/bin/perl
use strict;
use warnings;

# lines of context on each side; // (rather than ||) lets 0 be passed explicitly
my $context = shift(@ARGV) // 3;
if ($context < 0) {
    $context = 0;
}

my @old;
while (my $line = <DATA>) {
    if ($line =~ /line6/) {
        print @old, $line;
        for (1 .. $context) {
            my $next = <DATA>;
            last unless defined $next;    # stop at end of input
            print $next;
        }
        last;
    }
    push @old, $line;
    # remove a line if we have more than we need
    if (@old > $context) {
        shift @old;
    }
}
__DATA__
line1
line2
line3
line4
line5
line6
line7
line8
line9
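If the input is small enough to hold in an array, the same idea can be sketched with index arithmetic instead of a rolling buffer. This is only a rough sketch: the helper name context_lines and its argument order are made up for illustration, and it returns context for the first match only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): return the first line
# matching $re together with up to $n lines of context on each side.
sub context_lines {
    my ($lines, $re, $n) = @_;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $re;
        my $start = $i - $n < 0         ? 0         : $i - $n;    # clamp at the first line
        my $end   = $i + $n > $#$lines ? $#$lines : $i + $n;    # clamp at the last line
        return @{$lines}[$start .. $end];
    }
    return;    # no match
}

my @lines = map { "line$_\n" } 1 .. 9;
print context_lines(\@lines, qr/line6/, 2);    # prints line4 through line8
```

Clamping the indices means a match on the first or last line simply gets less context rather than a warning.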
With the entire file in a scalar, write your pattern so it captures the lines before and after line3. The /m modifier is especially useful:
Treat string as multiple lines. That is, change ^ and $ from matching the start or end of the string to matching the start or end of any line anywhere within the string.
The patterns below use the /x modifier, which lets us add whitespace to make them look like what they're matching.
For example:
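#! /usr/bin/perl
use strict;
use warnings;

# slurp the whole file into one scalar
my $data = do { local $/; <DATA> };
my $pattern = qr/ ^(.+\n)
                  ^line3\n
                  ^(.+\n)
                /mx;
if ($data =~ /$pattern/) {
    # skip a capture group that did not take part in the match
    print grep { defined } $1, $2;
}
else {
    print "no match\n";
}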
__DATA__
line1
line2
line3
line4
line5
Output:
line2
line4
Remember that $ is an assertion: it doesn't consume any characters, so you have to match newline with a literal \n pattern.
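One way to see that $ consumes nothing is to compare where the match ends with and without the literal \n. The snippet below is just a sketch; match_end is a made-up helper that reads the end offset from Perl's built-in @+ array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: offset where the match ends, or undef on failure.
sub match_end {
    my ($str, $re) = @_;
    return $str =~ $re ? $+[0] : undef;
}

my $data = "line3\nline4\n";

# $ only asserts that a line ends here; the newline stays unconsumed
print 'with $:  ', match_end($data, qr/^line3$/m), "\n";     # 5

# a literal \n consumes the newline, so the match ends one character later
print 'with \n: ', match_end($data, qr/^line3\n/m), "\n";    # 6
```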
Also note that the pattern above lacks generality. It works fine for a line somewhere in the middle, but it will fail if you change line3 to line1 or line5.
For the line1 case, you could make the previous line optional with a ? quantifier:
my $pattern = qr/ ^(.+\n)?
^line1\n
^(.+\n)
/mx;
As expected, this produces output of
line2
But trying the same fix for the line5 case
my $pattern = qr/ ^(.+\n)?
^line5\n
^(.+\n)?
/mx;
gives
no match
This is because after the final newline in the file (the one following line5), ^ has nowhere to match, but changing the pattern to
my $pattern = qr/ ^(.+\n)?
^line5\n
(^.+\n)?
/mx;
outputs
line4
We might stop here, but the asymmetry in the pattern is unpleasing. Why did it work for one case and not for the other? With line1, ^ matches the beginning of $data and then (.+\n)? matches nothing.
Remember: patterns quantified with ? or * always succeed because they're semantically the same as
- zero times or one time
- zero or more times
respectively, and anything can match zero times:
$ perl -le 'print scalar "abc" =~ /(?!)*/'
1
Although I can't think of a time I've ever seen it used this way, an {m,n} quantifier where m is zero, e.g.,
- {0,100}
- {0,}
- {0}
will always succeed because m is the minimum number of repetitions. The {0} quantifier is a pathological case included for completeness.
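A quick way to convince yourself is to try each zero-minimum quantifier against a string that cannot possibly match the quantified atom. This is only an illustrative sketch; the matches helper and the choice of z as the atom are assumptions of mine, not anything from the patterns above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if $re matches $str, 0 otherwise.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# "abc" contains no "z", yet every zero-minimum quantifier still succeeds
my @cases = (
    [ 'z?',       qr/z?/       ],
    [ 'z*',       qr/z*/       ],
    [ 'z{0,100}', qr/z{0,100}/ ],
    [ 'z{0,}',    qr/z{0,}/    ],
    [ 'z{0}',     qr/z{0}/     ],
    [ 'z+',       qr/z+/       ],    # minimum of one, so this one fails
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    printf "%-10s %d\n", $label, matches('abc', $re);
}
```

Only the z+ row reports 0; the others match the empty string at the start of "abc".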
All that was to show we more or less got lucky with the line1 case. ^ matched the very beginning, the ?-quantified pattern matched nothing, and then the next ^ also matched the very beginning of $data.
Restoring symmetry makes a cleaner pattern:
my $pattern = qr/ (^.+\n)?
^line5\n
(^.+\n)?
/mx;
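To check that the symmetric form handles the first, middle, and last lines alike, it can be wrapped in a small function and run against each case. Treat this as a sketch: the neighbors helper, its arguments, and the use of \Q...\E to interpolate the target line are my own additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines directly before and after the
# line that is exactly $target, skipping a neighbor that does not exist.
sub neighbors {
    my ($data, $target) = @_;
    my $pattern = qr/ (^.+\n)?
                      ^\Q$target\E\n
                      (^.+\n)?
                    /mx;
    return unless $data =~ $pattern;
    return grep { defined } $1, $2;    # an unused optional group is undef
}

my $data = join '', map { "line$_\n" } 1 .. 5;
for my $target (qw(line1 line3 line5)) {
    my @found = neighbors($data, $target);
    chomp @found;
    print "$target: @found\n";
}
```

This prints line2 for line1, line2 line4 for line3, and line4 for line5.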
I realize you asked for a Perl solution, but here is a Unix grep solution anyway:
grep -C 1 line3 file.txt
outputs:
line2
line3
line4
From the grep manpage:
-C NUM, --context=NUM Print NUM lines of output context. Places a line containing -- between contiguous groups of matches.
Using Unix command-line power is great in such cases, and Perl embraces it.
Try something like grep -A 1 or grep -B 1; they give you the line after or before the match, respectively.