How do I match the line before and after a pattern match in Perl?

2023-01-15 07:51 Q&A author:

I am matching a pattern and getting the line number of the match using $.

I need to print the lines immediately before and after the line that matches the pattern, e.g.:

line1
line2
line3
line4
line5

After my pattern matches line3, I want to print line2 and line4.

How can I do this in Perl? Can anyone help me?

Thanks in advance

Senthil

You want what is normally called context. The easiest way to get context is to maintain it yourself with a variable:

#!/usr/bin/perl

use strict;
use warnings;

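my $old = '';    # previous line; stays empty if the match is on the first line
while (my $line = <DATA>) {
    if ($line =~ /line3/) {
        # <DATA> returns undef at end of file, so fall back to ''
        my $next = <DATA> // '';
        print $old, $line, $next;
        last;
    }
    $old = $line;    # remember this line as context for the next iteration
}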

__DATA__
line1
line2
line3
line4
line5

If you need more than one line of context, it is better to use an array:

#!/usr/bin/perl

use strict;
use warnings;

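# Lines of context on each side; defaults to 3 when no argument is given.
# Using // rather than || lets an explicit 0 through.
my $context = shift(@ARGV) // 3;
if ($context < 0) {
    $context = 0;
}

my @old;    # holds at most $context previous lines
while (my $line = <DATA>) {
    if ($line =~ /line6/) {
        print @old, $line;
        for (1 .. $context) {
            # stop early if the file ends before we have enough lines
            my $next = <DATA>;
            last unless defined $next;
            print $next;
        }
        last;
    }
    push @old, $line;
    #remove a line if we have more than we need
    if (@old > $context) {
        shift @old;
    }
}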

__DATA__
line1
line2
line3
line4
line5
line6
line7
line8
line9

With the entire file in a scalar, write your pattern so it captures the lines before and after line3. The /m modifier is especially useful:

Treat string as multiple lines. That is, change ^ and $ from matching the start or end of the string to matching the start or end of any line anywhere within the string.

The patterns below use the /x modifier that lets us add whitespace to make them look like what they're matching.

For example:

#! /usr/bin/perl

# Slurp mode: with $/ undefined, <DATA> returns the rest of the file at once
my $data = do { local $/; <DATA> };

my $pattern = qr/ ^(.+\n)
                  ^line3\n
                  ^(.+\n)
                /mx;

if ($data =~ /$pattern/) {
  print $1, $2;
}
else {
  print "no match\n";
}

__DATA__
line1
line2
line3
line4
line5

Output:

line2
line4

Remember that $ is an assertion: it doesn't consume any characters, so you have to match newline with a literal \n pattern.

Also note that the pattern above lacks generality. It works fine for a line somewhere in the middle, but it will fail if you change line3 to line1 or line5.

For the line1 case, you could make the previous line optional with a ? quantifier:

my $pattern = qr/ ^(.+\n)?
                  ^line1\n
                  ^(.+\n)
                /mx;

As expected, this produces output of

line2

But trying the same fix for the line5 case

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  ^(.+\n)?
                /mx;

gives

no match

This is because after the final newline in the file (the one following line5), ^ has nowhere to match, but changing the pattern to

my $pattern = qr/ ^(.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;

outputs

line4
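
The reason for the earlier "no match" can be checked in isolation: under /m, ^ matches after a newline only when that newline is not the last character of the string. A small sketch, with arbitrary example strings:

```perl
use strict;
use warnings;

# Under /m, ^ matches after a newline only when more text follows it.
my $trailing = "line5\n";          # newline is the last character
my $interior = "line5\nline6\n";   # newline is followed by another line

print "trailing: ", ($trailing =~ /\n^/m ? "match" : "no match"), "\n";
print "interior: ", ($interior =~ /\n^/m ? "match" : "no match"), "\n";
```

This prints "trailing: no match" and "interior: match", which is why a mandatory ^ after line5\n can never succeed at the end of the file.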

We might stop here, but the asymmetry in the pattern is unpleasing. Why did the first form work for one case and not for the other? With line1, ^ matches the beginning of $data and then (.+\n)? matches nothing.

Remember: patterns quantified with ? or * always succeed because they're semantically the same as

zero times or one time
zero or more times

respectively, and anything can match zero times:

$ perl -le 'print scalar "abc" =~ /(?!)*/'
1

Although I can't think of a time I've ever seen it used this way, an {m,n} quantifier where m is zero, e.g.,

{0,100}
{0,}
{0}

will always succeed because m is the minimum number of repetitions. The {0} quantifier is a pathological case included for completeness.
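
To see the zero-minimum quantifiers side by side, here is a short sketch that tries each of them against a string that cannot contain the quantified character. The letter z and the string "abc" are arbitrary choices for the illustration.

```perl
use strict;
use warnings;

# "abc" contains no "z", yet every pattern below matches: each
# quantifier permits zero repetitions, so the empty string satisfies it.
my @patterns = (qr/z?/, qr/z*/, qr/z{0,100}/, qr/z{0,}/, qr/z{0}/);

for my $re (@patterns) {
    if ("abc" =~ $re) {
        # $& is the matched text, which is empty in every case here
        printf "%-22s matched, length %d\n", $re, length $&;
    }
}
```

All five patterns report a match of length zero. How the compiled pattern is printed in the first column depends on the Perl version.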

All that was to show we more or less got lucky with the line1 case. ^ matched the very beginning, the ?-quantified pattern matched nothing, and then the next ^ also matched the very beginning of $data.

Restoring symmetry makes a cleaner pattern:

my $pattern = qr/ (^.+\n)?
                  ^line5\n
                  (^.+\n)?
                /mx;
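
To check that the symmetric pattern behaves the same at both ends of the file, it can be wrapped in a small sub and run against the first, middle, and last lines. This is a sketch rather than a finished tool: the name neighbors is invented here, the target is matched as a whole line via \Q...\E, and a blank neighboring line would not be captured because .+ needs at least one character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines before and after the first line
# that equals $target exactly; either may be undef at the edges of $data.
# Returns an empty list when $target is not found.
sub neighbors {
    my ($data, $target) = @_;
    my $pattern = qr/ (^.+\n)?
                      ^\Q$target\E\n
                      (^.+\n)?
                    /mx;
    return $data =~ $pattern ? ($1, $2) : ();
}

my $data = join '', map { "line$_\n" } 1 .. 5;
for my $target (qw(line1 line3 line5)) {
    # Substitute a placeholder where there is no neighboring line
    my ($before, $after) = map { $_ // "(none)\n" } neighbors($data, $target);
    print "$target:\n  before: $before  after:  $after";
}
```

For line1 there is no line before, for line5 there is no line after, and line3 gets line2 and line4, all from the same pattern.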

I realize you asked for a Perl solution, but here is a Unix grep solution anyway:

grep -C 1 line3 file.txt

outputs:

line2
line3
line4

From the grep manpage:

   -C NUM, --context=NUM
    Print  NUM lines of output context.  Places a line containing --
    between contiguous groups of matches.

Using the power of the Unix command line is great in such cases, and Perl embraces it. Try something like grep -A 1 or grep -B 1; these give you the line after or the line before the match, respectively.
