Have a regular expression keep matching as much as possible?

Is there a convenient way to write a regex that will try to match as much of the regex as possible?

Example:

my $re = qr/a ([a-z]+) (\d+)/;

match_longest($re, "a") => ()
match_longest($re, "a word") => ("word")
match_longest($re, "a word 123") => ("word", "123")
match_longest($re, "a 123") => ()

That is, $re is treated as a sequence of regular expressions, and match_longest attempts to match as much of this sequence as possible. In a sense, matching never fails; it is only a question of how much of the sequence matched. Once a regex in the sequence fails to match, the parts that didn't match are undef.

I know I could write a function which takes a sequence of regexes and creates a single regex to do the job of match_longest. Here's an outline of the idea:

Suppose you have three regexes: $r1, $r2 and $r3. The single regex to perform the job of match_longest would have the following structure:

$r = ($r1 $r2 $r3)? | $r1 ($r2 $r3) | $r1 $r2 $r3?

Unfortunately, this is quadratic in the number of regexes. Is it possible to be more efficient?


You can use the regex

$r = ($r1 ($r2 ($r3)?)?)?

which contains each regex only once. You can also use non-capturing groups (?:...) for the added nesting, so that it does not interfere with the capture groups of your original regular expressions.
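A minimal sketch of how that nested pattern could be built from a list of regexes. The names build_nested and match_longest's body are made up for illustration (only the name match_longest comes from the question), and splitting the question's regex into the three pieces shown is an assumption about where the boundaries should fall:

```perl
use strict;
use warnings;

# Hypothetical helper: build ONE regex from a list of sub-regexes by nesting
# each later piece inside an optional non-capturing group.
# For ($r1, $r2, $r3) this produces (?:$r1(?:$r2(?:$r3)?)?)?
sub build_nested {
    my @parts  = @_;
    my $nested = '';
    # Work from the innermost (last) piece outwards.
    for my $part (reverse @parts) {
        $nested = "(?:$part$nested)?";
    }
    return qr/$nested/;
}

# Hypothetical match_longest: return only the capture groups that took part
# in the match (groups in pieces that were skipped are undef and are dropped).
sub match_longest {
    my ($re, $text) = @_;
    my @captures = $text =~ $re;
    return grep { defined } @captures;
}

# Assumed split of qr/a ([a-z]+) (\d+)/ into three pieces.
my $re = build_nested(qr/a/, qr/ ([a-z]+)/, qr/ (\d+)/);

for my $text ("a", "a word", "a word 123", "a 123") {
    my @got = match_longest($re, $text);
    print "$text => (", join(", ", @got), ")\n";
}
```

Each piece appears once, so the combined pattern grows linearly with the number of pieces. One caveat: the outermost group is optional and the pattern is unanchored, so if the first piece does not match at the very start of the string, the pattern matches the empty string there and never looks further along.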


If I understand the question, using nested groups with ? should work:

my $re = qr/a ((\w+)(?: (\d+))?)?/;


This particular case can be written like this:

m/a (?:(\w+)(?: (\d+))?)?/
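A small sketch for checking this regex against the four inputs from the question. The helper name captures is made up for illustration. Note that \w also matches digits, so with \w+ the input "a 123" captures "123", whereas the question's [a-z]+ gives no captures there; the second pattern below is that variant.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the capture groups that actually matched.
sub captures {
    my ($re, $text) = @_;
    return grep { defined } ($text =~ $re);
}

my $with_w  = qr/a (?:(\w+)(?: (\d+))?)?/;      # regex from the answer
my $with_az = qr/a (?:([a-z]+)(?: (\d+))?)?/;   # variant using the question's [a-z]+

for my $text ("a", "a word", "a word 123", "a 123") {
    printf "%-12s \\w+: (%s)  [a-z]+: (%s)\n", $text,
        join(", ", captures($with_w,  $text)),
        join(", ", captures($with_az, $text));
}
```

Both patterns require the literal "a " (with the space), so the bare input "a" does not match at all and yields no captures.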