How can I match end-of-line multiple times in a regex without interpolation?

2022-12-30 20:15

If I have an input with newlines in it, like:

[INFO]
xyz
[INFO]

How can I pull out the xyz part using $ anchors? I tried a pattern like /^\[INFO\]$(.*?)$\[INFO\]/ms, but Perl gives me:

Use of uninitialized value $\ in regexp compilation at scripts\t.pl line 6.

Is there a way to shut off interpolation so the anchors work as expected?

EDIT: The key point is that the end-of-line anchor is a dollar sign, and at times it may be necessary to intersperse that anchor through the pattern. If the pattern is interpolated, you can get problems such as the uninitialized $\ above. For instance, an acceptable solution here is /^\[INFO\]\s*^(.*?)\s*^\[INFO\]/ms, but that does not address the crux of the problem: I've changed the anchors to ^ so that no interpolation happens, and with this input I'm free to do that. But what about when I really do want to reference EOL with $ in my pattern? How do I get the regex to compile?

The question is academic--there's no need for the $ anchors in your regex anyway. You should be using \n to match the newlines, because the $ only matches the gap between the linefeed and the character before it.

EDIT: What I'm trying to say is that you will never need to use $ that way. Any match that spans from one line to the next will have to consume the line separator somehow. Consider your example:

/^\[INFO\]$(.*?)$\[INFO\]/ms

If this did compile, the (.*?) would start out by consuming the first linefeed and keep going until it had matched \nxyz, where the second $ would succeed. But the next character is a linefeed, and the regex is looking for [, so that doesn't work. After backtracking, the (.*?) would reluctantly consume one more character--the second linefeed--but then the $ would fail.

Any time you try to match an EOL with $ and then some more stuff, the first "stuff" you'll have to match will be the linefeed, so why not match that instead? That's why the Perl regex compiler tries to interpret $\ as a variable name in your regex: it makes no sense to have an end-of-line anchor followed by a character that's not a line separator.
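As a minimal sketch of that advice, assuming the same three-line sample input held in a string (the variable names here are illustrative, not from the question), the pattern below matches the linefeeds directly and contains no $ at all, so there is nothing for Perl to interpolate:

```perl
use strict;
use warnings;

# Sample input from the question, held in a string for the demo.
my $text = "[INFO]\nxyz\n[INFO]\n";

# Match the newline itself rather than anchoring with $ and then continuing.
# With /m, ^ still anchors the first [INFO] to the start of a line.
my ($body) = $text =~ /^\[INFO\]\n(.*?)\n\[INFO\]/ms;

print defined $body ? "Extracted <$body>\n" : "No match\n";
```

Because the \n is consumed explicitly, the capture group starts on the line after the first marker and stops just before the linefeed that precedes the second one.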

Based on the answer in perlfaq6, "How can I pull out lines between two patterns that are themselves on different lines?", here's what a one-liner would look like:

perl -0777 -ne 'print $1,"\n" while /\[INFO\]\s*(.*?)\s*\[INFO\]/sg' file.txt

The -0777 switch slurps in the whole file at once.
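Inside a script, roughly the same effect can be had by undefining $/ before reading. The sketch below assumes an in-memory filehandle in place of file.txt, and sample data with two marker pairs, purely so that it runs on its own:

```perl
use strict;
use warnings;

# Illustrative data: two [INFO] pairs with an unrelated line between them.
my $data = "[INFO]\nxyz\n[INFO]\nskip me\n[INFO]\nabc\n[INFO]\n";

# An in-memory filehandle stands in for file.txt (an assumption for the demo).
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# With $/ undefined, <$fh> returns the whole input at once, like -0777.
my $content = do { local $/; <$fh> };
close $fh;

# /g in a while loop walks through every marker pair in turn.
my @found;
while ( $content =~ /\[INFO\]\s*(.*?)\s*\[INFO\]/sg ) {
    push @found, $1;
}

print "$_\n" for @found;
```

Each match consumes both of its markers, so the line sitting between the second and third [INFO] is not treated as a block body.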

However, if you're after a subroutine that gives you the flexibility to choose what tag you want to extract, the File::Slurp module makes things a little easier:

use strict;
use warnings;
use File::Slurp qw/slurp/;

sub extract {

    my ( $tag, $fileName ) = @_;
    my $text = slurp $fileName;

    # Only the first block is wanted, so /g is not needed here.
    my ($info) = $text =~ /$tag\s*(.*?)\s*$tag/s;
    return $info;
}

# Usage:
print extract( qr/\[INFO\]/, 'file.txt' ), "\n";
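If the tag arrives as a plain string rather than a precompiled qr// object, its brackets need escaping before it goes into the pattern. One possible variation is sketched below; extract_between is a hypothetical helper that takes the text directly instead of a file name, so that the example does not depend on File::Slurp or on a file being present:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the tag as a literal string and the text to search.
sub extract_between {
    my ( $tag, $text ) = @_;

    # \Q...\E escapes regex metacharacters in $tag, so '[INFO]' is matched
    # literally rather than being read as a character class.
    my ($info) = $text =~ /\Q$tag\E\s*(.*?)\s*\Q$tag\E/s;
    return $info;
}

my $result = extract_between( '[INFO]', "[INFO]\nxyz\n[INFO]\n" );
print "Extracted <$result>\n";
```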

When regexes get too tricky, they probably are the wrong tool. I might consider using the flip flop operator here. It's false until its lefthand side is true, then stays true until its righthand side is true. That way, you can choose where to start and end the extraction just by looking at individual lines:

my $string = <<'HERE';
[INFO]
xyz
[INFO]
HERE

open my $string_fh, '<', \$string;

while( <$string_fh> )
    {
    next if /\[INFO]/ .. /\[INFO]/;
    chomp;

    print "Extracted <$_>\n";
    }
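The loop above depends on the sample containing nothing but the markers and the wanted line. When the input may also hold text before or after the block, a variation using the three-dot form of the operator might look like the following. This is a sketch under that assumption, with "before" and "after" lines added to the sample for illustration:

```perl
use strict;
use warnings;

my $string = <<'HERE';
before
[INFO]
xyz
[INFO]
after
HERE

open my $string_fh, '<', \$string or die "Cannot open in-memory handle: $!";

my @extracted;
while ( my $line = <$string_fh> ) {
    # While true, the flip-flop returns a sequence number starting at 1;
    # the number on the closing line has "E0" appended. The three-dot form
    # does not test the right operand on the line that opened the range.
    my $seq = ( $line =~ /\[INFO]/ ) ... ( $line =~ /\[INFO]/ );

    next unless $seq;                       # outside the block
    next if $seq == 1 or $seq =~ /E0\z/;    # the marker lines themselves

    chomp $line;
    push @extracted, $line;
}

print "Extracted <$_>\n" for @extracted;
```

With the two-dot form, both operands are tested on the same line, so a range whose start and end markers are identical would open and close on the first [INFO] line.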

If you are using Perl 5.10 or later, you can use the generalized line ending \R in a regex:

use 5.010;

my $string = <<'HERE';
[INFO]
xyz
[INFO]
HERE

my( $extracted ) = $string =~ /(?:\A|\R)\[INFO]\R(.*?)\R\[INFO]\R/;

print "Extracted <$extracted>\n";
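The practical benefit of \R is that the same pattern also copes with other line-ending conventions. As a small illustration, assuming the same data arrives with Windows-style CRLF endings:

```perl
use 5.010;
use strict;
use warnings;

# Same sample data, but with CRLF line endings (an assumption for the demo).
my $string = "[INFO]\r\nxyz\r\n[INFO]\r\n";

# \R matches \r\n as a single unit, as well as a lone \n or \r.
my ($extracted) = $string =~ /(?:\A|\R)\[INFO]\R(.*?)\R\[INFO]\R/;

print "Extracted <$extracted>\n";
```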

Don't get hung up on the end-of-line anchor.

Maybe the /x modifier can help:

m/ ^\[INFO\] $ # Match INFO line
   \n
   ^ (.*?) $ # Collect desired line
   \n 
   ^ \[INFO\] # Match another INFO line
/xms

I haven't tested that, so you'd probably have to debug it. But I think this will prevent the $ symbols from interpolating as variables.

Although I've accepted Alan Moore's answer (Ryan Thompson's answer would also have done the trick; too bad I could only accept one), I wanted to make the solution perfectly clear, as it was somewhat buried in the comments and discussion. The following Perl script demonstrates that Perl treats the $ as the start of a variable to interpolate when the character that follows it forms a variable name (here the backslash, giving $\), and that turning off interpolation allows the $ to be treated as EOL.

use strict;
use warnings;

my $x = "[INFO]\nxyz\n[INFO]";
if( $x =~ /^\[INFO\]$\n(.*?)$\n\[INFO\]/m ) {
    print "'$1' FOUND\n";
} else {
    print "NO MATCH FOUND\n";
}

if( $x =~ m'^\[INFO\]$\n(.*?)$\n\[INFO\]'m ) {
    print "'$1' FOUND\n";
} else {
    print "NO MATCH FOUND\n";
}

if( $x =~ m/ ^\[INFO\] $ # Match INFO line
\n
^ (.*?) $ # Collect desired line
\n 
^ \[INFO\] # Match another INFO line
/xms ) {
    print "'$1' FOUND\n";
} else {
    print "NO MATCH FOUND\n";
}

The script produces the following output:

Use of uninitialized value $\ in regexp compilation at t.pl line 5.
Use of uninitialized value $\ in regexp compilation at t.pl line 5.
NO MATCH FOUND
'xyz' FOUND
'xyz' FOUND
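Another option that keeps the ordinary / delimiters, and so leaves interpolation available for the rest of the pattern, is to wrap each anchor in a non-capturing group. This relies on the rule documented in perlop that $) is not interpolated inside a pattern; treat it as a sketch rather than as something taken from the answers above:

```perl
use strict;
use warnings;

my $x = "[INFO]\nxyz\n[INFO]";

# "$)" is not interpolated inside a pattern, so the $ in (?:$) stays an
# end-of-line anchor even though the next thing to match is \n.
my ($found) = $x =~ /^\[INFO\](?:$)\n(.*?)(?:$)\n\[INFO\]/m;

if ( defined $found ) {
    print "'$found' FOUND\n";
}
else {
    print "NO MATCH FOUND\n";
}
```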
