How do I make an arbitrary Perl regex wholly non-capturing?
How can I remove capturing from arbitrarily nested sub-groups in a Perl regex string? I'd like to nest any regex into an enveloping expression that captures the sub-regex as a whole entity as well as statically known subsequent groups. Do I need to transform the regex string manually into using all non-capturing (?:) groups (and hope I don't mess up), or is there a Perl regex or library mechanism that provides this?
# How do I 'flatten' $regex to protect $2 and $3?
# Searching 'ABCfooDE' for 'foo' OK, but '((B|(C))fo(o)?(?:D|d)?)', etc., breaks.
# I.E., how would I turn it effectively into '(?:(?:B|(?:C))fo(?:o)?(?:D|d)?)'?
sub check {
    my($line, $regex) = @_;
    if ($line =~ /(^.*)($regex)(.*$)/) {
        print "<", $1, "><", $2, "><", $3, ">\n";
    }
}
Addendum: I am vaguely aware of $&, $`, and $' and have been advised to avoid them if possible, and I don't have access to ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} in my Perl 5.8 environment. The example above can be partitioned into 2/3 chunks using methods like these, and more complex real cases could manually iterate this, but I think I'd like a general solution if possible.
Accepted Answer: What I wish existed and, surprisingly (to me at least), does not, is an encapsulating group that makes its contents opaque, so that subsequent positional backreferences see the contents as a single entity and named references are de-scoped. gbacon has a potentially useful workaround for Perl 5.10+, and FM shows a manual iterative mechanism that achieves the same effect in specific cases on any version, but j_random_hacker is right that there is no real language mechanism for encapsulating subexpressions.
In general, you can't.
Even if you could transform all (...)s into (?:...)s, this would not work in the general case because the pattern might require backreferences: e.g. /(.)X\1/, which matches any character, followed by an X, followed by the originally matched character.
So, absent a Perl mechanism for discarding captured results "after the fact", there is no way to solve your problem for all regexes. The best you can do (or could do if you had Perl 5.10) is to use gbacon's suggestion and hope to generate a unique name for the capture buffer.
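To make the backreference point concrete, here is a minimal sketch. The variable names are made up for illustration, and the rewritten pattern is kept in a string so that it is compiled at run time inside eval. On the Perls I know of, a \1 with no capturing group to refer to is rejected when the pattern is compiled.

```perl
use strict;
use warnings;

# The original pattern relies on its own capture group.
my $with_capture = qr/(.)X\1/;
my $same  = 'aXa' =~ $with_capture ? 1 : 0;   # same character on both sides
my $other = 'aXb' =~ $with_capture ? 1 : 0;   # different characters
print "aXa: $same, aXb: $other\n";

# Naive rewrite: (.) becomes (?:.), but \1 now has nothing to refer to.
# Built from a string so the failure happens at run time, where eval can see it.
my $rewritten_src = '(?:.)X\1';
my $rewritten     = eval { qr/$rewritten_src/ };
print defined $rewritten
    ? "rewritten pattern compiled\n"
    : "rewritten pattern rejected: $@";
```

So a mechanical (...) to (?:...) rewrite is only safe for patterns that never refer back to their own groups.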
One way to protect the subpatterns you care about is to use named capture buffers:
Additionally, as of Perl 5.10.0 you may use named capture buffers and named backreferences. The notation is (?<name>...) to declare and \k<name> to reference. You may also use apostrophes instead of angle brackets to delimit the name; and you may use the bracketed \g{name} backreference syntax. It's possible to refer to a named capture buffer by absolute and relative number as well. Outside the pattern, a named capture buffer is available via the %+ hash. When different buffers within the same pattern have the same name, $+{name} and \k<name> refer to the leftmost defined group.
In the context of your question, check becomes
sub check {
    use 5.10.0;
    my($line, $regex) = @_;
    if ($line =~ /(^.*)($regex)(.*$)/) {
        print "<", $+{one}, "><", $+{two}, "><", $+{three}, ">\n";
    }
}
Then calling it with
my $pat = qr/(?<one>(?<two>B|(?<three>C))fo(o)?(?:D|d)?)/;
check "ABCfooDE", $pat;
outputs
<CfooD><C><C>
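The same idea can be turned around so that the enveloping groups are the named ones and the numbered groups inside $regex no longer matter. This is only a sketch: split_around and the outer_* names are invented here, and it assumes Perl 5.10+ and that $regex does not itself use those names.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: the wrapper's own groups are named, so whatever
# numbered groups $regex contains cannot shift them.
sub split_around {
    my ($line, $regex) = @_;
    return unless $line =~ /(?<outer_pre>^.*)(?<outer_match>$regex)(?<outer_post>.*$)/;
    return ($+{outer_pre}, $+{outer_match}, $+{outer_post});
}

my @parts = split_around('ABCfooDE', qr/((B|(C))fo(o)?(?:D|d)?)/);
print "<$parts[0]><$parts[1]><$parts[2]>\n";   # <AB><CfooD><E>
```

The inner groups still capture (they just land in $2, $3 and so on, where nothing reads them), so backreferences written by name or relative number inside $regex keep working.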
This does not address the general case, but your specific example can be handled with the /g option in scalar context, which would allow you to divide the problem into two matches, the second picking up where the first left off:
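sub check {
    my($line, $regex) = @_;
    # First match: in scalar context, /g remembers where it stopped (pos).
    return unless $line =~ /(^.*)($regex)/g;
    my ($left_side, $regex_match) = ($1, $2);
    # Second match resumes at pos($line), so $1 is whatever is left over.
    $line =~ /(.*$)/g;
    my $right_side = $1;
    print "<$left_side> <$regex_match> <$right_side>\n"; # <AB> <CfooD> <E123>
}
check( 'ABCfooDE123', qr/((B|(C))fo(o)?(?:D|d)?)/ );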
If all you need is the portion of the string before and after the match, you can use the @- and @+ arrays to get the offsets into the matched string:
sub check {
    my ($line, $regex) = @_;
    if ($line =~ /$regex/) {
        my $pre   = substr $line, 0, $-[0];
        my $match = substr $line, $-[0], $+[0] - $-[0];
        my $post  = substr $line, $+[0];
        print "<$pre><$match><$post>\n";
    }
}
Perl v5.22 and later have a /n modifier, which stops plain (...) groups from capturing; named groups still capture.
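A sketch of how that could be applied to the question, assuming Perl 5.22+ and that the sub-regex arrives as a pattern string. The helper name split_around_n is invented here. The inline form (?n:...) limits the modifier to the interpolated part, so the three outer groups stay numbered 1 to 3. As far as I know a qr// object keeps its own flags when interpolated, so I would not expect this to reach inside one; treat that as an assumption to check on your Perl.

```perl
use strict;
use warnings;
use 5.022;

# Hypothetical helper: $regex_src is a pattern *string*, not a qr// object.
sub split_around_n {
    my ($line, $regex_src) = @_;
    # (?n:...) turns off capturing for plain groups inside it only.
    return unless $line =~ /(^.*)((?n:$regex_src))(.*$)/;
    return ($1, $2, $3);
}

my @parts = split_around_n('ABCfooDE', '((B|(C))fo(o)?(?:D|d)?)');
print "<$parts[0]><$parts[1]><$parts[2]>\n";   # <AB><CfooD><E>
```

Like any approach that removes captures, this breaks sub-regexes that use numbered backreferences such as \1.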
This doesn't disable capturing, but might accomplish what you want:
$ perl -wle '$_ = "123abc"; /(\d+)/ && print "num: $1"; { /([a-z]+)/ && print "letter: $1"; } print "num: $1";'
num: 123
letter: abc
num: 123
You create a new scope and the $1 outside it will not be affected.