
How do I match the line before and after a pattern match in Perl?

I am matching a pattern and getting the line number of the match from the $. variable.


I need to print the lines immediately before and after the line that matches a particular pattern, e.g.:

line1
line2
line3
line4
line5

After my pattern matches line3, I want to print line2 and line4.

How can I do this with a pattern match in Perl? Can anyone help me?

Thanks in advance

Senthil


You want what is normally called context. The easiest way to get context is to maintain it yourself with a variable:

#!/usr/bin/perl

use strict;
use warnings;

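my $old = '';    # stays empty if the match is on the first line
while (my $line = <DATA>) {
    if ($line =~ /line3/) {
        my $next = <DATA>;    # undef if the match is on the last line
        print $old, $line, defined $next ? $next : '';
        last;
    }
    $old = $line;
}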

__DATA__
line1
line2
line3
line4
line5

If you need more than one line of context, it is better to use an array:

#!/usr/bin/perl

use strict;
use warnings;

my $context = shift || 3;
if ($context < 0) {
    $context = 0;
}

my @old;
while (my $line = <DATA>) {
    if ($line =~ /line6/) {
        print @old, $line;
        for (1 .. $context) {
            my $next = <DATA>;
            last unless defined $next;    # stop quietly at end of file
            print $next;
        }
        last;
    }
    push @old, $line;
    #remove a line if we have more than we need
    if (@old > $context) {
        shift @old;
    }
}

__DATA__
line1
line2
line3
line4
line5
line6
line7
line8
line9
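
Both scripts above stop at the first match because of last. If you need context around every match, one option is to read the lines into an array and collect the indices worth printing. This is only a rough sketch and not a streaming solution: the helper name lines_with_context is made up for illustration, and it assumes the whole input fits in memory.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: return every matching line plus up to $context
# lines before and after it, in input order and without duplicates.
sub lines_with_context {
    my ($pattern, $context, @lines) = @_;
    my %keep;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ $pattern;
        # Clamp the window to the bounds of the array.
        my $first = $i - $context;
        $first = 0 if $first < 0;
        my $last = $i + $context;
        $last = $#lines if $last > $#lines;
        $keep{$_} = 1 for $first .. $last;
    }
    return map { $lines[$_] } sort { $a <=> $b } keys %keep;
}

my @lines = map { "line$_\n" } 1 .. 9;
# Two matches (line3 and line6), one line of context each.
print lines_with_context(qr/line[36]/, 1, @lines);
```

With the nine sample lines this prints line2 through line7, because the two context windows sit next to each other.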


With the entire file in a scalar, write your pattern so it captures the lines before and after line3. The /m modifier is especially useful:

Treat string as multiple lines. That is, change ^ and $ from matching the start or end of the string to matching the start or end of any line anywhere within the string.
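
To see what that quoted description means in practice, here is a small sketch that counts how often ^line\d matches with and without /m. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $data = "line1\nline2\nline3\n";

# Without /m, ^ can only match at the very start of the string.
my $without_m = () = $data =~ /^line\d/g;

# With /m, ^ can also match right after each embedded newline.
my $with_m = () = $data =~ /^line\d/mg;

print "without /m: $without_m\n";    # 1
print "with /m:    $with_m\n";       # 3
```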

The patterns below use the /x modifier that lets us add whitespace to make them look like what they're matching.

For example:

#! /usr/bin/perl

my $data = do { local $/; <DATA> };

my $pattern = qr/ ^(.+\n)
                  ^line3\n
                  ^(.+\n)
                /mx;

if ($data =~ /$pattern/) {
  print $1, $2;
}
else {
  print "no match\n";
}

__DATA__
line1
line2
line3
line4
line5

Output:

line2
line4

Remember that $ is an assertion: it doesn't consume any characters, so you have to match newline with a literal \n pattern.
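
A quick way to convince yourself of this is the sketch below. It uses different sample data (alpha and beta) so it doesn't get confused with the patterns in this answer. The pattern sources are kept in single-quoted strings so Perl doesn't try to interpolate whatever follows the $.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\n";

# $ only asserts "end of line here"; the newline itself is still unread.
my $no_newline   = '^alpha$beta';
my $with_newline = '^alpha$\nbeta';

my $first  = $data =~ /$no_newline/m   ? "match" : "no match";
my $second = $data =~ /$with_newline/m ? "match" : "no match";

print "without \\n: $first\n";     # no match
print "with \\n:    $second\n";    # match
```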

Also note that the pattern above lacks generality. It works fine for a line somewhere in the middle, but it will fail if you change line3 to line1 or line5.

For the line1 case, you could make the previous line optional with a ? quantifier:

my $pattern = qr/ ^(.+\n)?
                  ^line1\n
                  ^(.+\n)
                /mx;

As expected, this produces output of

line2

But trying the same fix for the line5 case

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  ^(.+\n)?
                /mx;

gives

no match

This is because after the final newline in the file (the one following line5), ^ has nowhere to match, so the required ^ in front of the optional group fails. Moving ^ inside the group, so that it becomes optional too, as in

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;

outputs

line4

We might stop here, but the asymmetry in the pattern is unpleasing. Why did it work for one case and not for the other? With line1, ^ matches the beginning of $data and then matches nothing for (.+\n)?.

Remember: patterns quantified with ? or * always succeed because they're semantically the same as

  • zero times or one time
  • zero or more times

respectively, and anything can match zero times:

$ perl -le 'print scalar "abc" =~ /(?!)*/'
1

Although I can't think of a time I've ever seen it used this way, an {m,n} quantifier where m is zero, e.g.,

  • {0,100}
  • {0,}
  • {0}

will always succeed because m is the minimum number of repetitions. The {0} quantifier is a pathological case included for completeness.
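
As a rough illustration, the sketch below tries each of those quantifiers on a letter that never appears in the text, with + and {1,} added for contrast. The helper name matches_with is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: does "z" followed by the given quantifier match $text?
sub matches_with {
    my ($text, $quantifier) = @_;
    return $text =~ /z$quantifier/ ? 1 : 0;
}

my $text = "abc";    # deliberately contains no "z"

for my $quantifier ('?', '*', '{0,100}', '{0,}', '{0}', '+', '{1,}') {
    printf "z%-8s %s\n", $quantifier,
        matches_with($text, $quantifier) ? "matches" : "fails";
}
```

The first five quantifiers allow zero repetitions and so report a match. The last two require at least one z and fail.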

All that was to show we more or less got lucky with the line1 case. ^ matched the very beginning, the ?-quantified pattern matched nothing, and then the next ^ also matched the very beginning of $data.

Restoring symmetry makes a cleaner pattern:

my $pattern = qr/ (^.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;
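
If you want to reuse the symmetric pattern for any target line, you could wrap it in a sub. This is only a sketch: the name neighbors is made up, \Q...\E is added so the target is matched literally, and, like the patterns above, it assumes the neighboring lines are non-empty.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line before and the line after the
# first line equal to $target. A missing neighbor comes back as ''.
# Returns an empty list when $target does not occur at all.
sub neighbors {
    my ($data, $target) = @_;
    my $pattern = qr/ (^.+\n)?
                      ^\Q$target\E\n
                      (^.+\n)?
                    /mx;
    return unless $data =~ $pattern;
    return (defined $1 ? $1 : '', defined $2 ? $2 : '');
}

my $data = join '', map { "line$_\n" } 1 .. 5;

for my $target (qw(line1 line3 line5)) {
    my ($before, $after) = neighbors($data, $target);
    chomp($before, $after);
    print "$target: before=[$before] after=[$after]\n";
}
```

For the five sample lines this reports an empty "before" for line1, line2 and line4 around line3, and an empty "after" for line5.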


I realize you asked for a Perl solution, but here is a Unix grep solution anyway:

grep -C 1 line3 file.txt

outputs:

line2
line3
line4

From the grep manpage:

   -C NUM, --context=NUM
    Print  NUM lines of output context.  Places a line containing --
    between contiguous groups of matches.


Using Unix command-line power is great in such cases, and Perl embraces it. Try something like grep -A 1 or grep -B 1; these give you the line after or the line before the match, respectively.
