
Confusion about the reluctant quantifier in Java regular expressions

Why do I get the output ab for the following regular-expression code with a reluctant quantifier?

    Pattern p = Pattern.compile("abc*?");
    Matcher m = p.matcher("abcfoo");
    while(m.find())
      System.out.println(m.group()); // ab

Similarly, why do I get empty matches for the following code?

    Pattern p = Pattern.compile(".*?");
    Matcher m = p.matcher("abcfoo");
    while(m.find())
      System.out.println(m.group());


In addition to Konrad Rudolph's answer:

abc*?

matches "ab" in any case and "c" only if it must. Since nothing follows the *?, the regex engine stops immediately. If you had:

abc*?f

then it would match "abcf" because the "c" must match in order to allow the "f" to match, too. The other expression:

.*?

matches nothing because this pattern is 100% optional.

.*?f

would match "abcf" again.
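The four patterns discussed above can be tried side by side. This is only a small sketch: the class name ReluctantDemo and the helper firstMatch are made up for illustration, and the input is the "abcfoo" string from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "abcfoo";
        // Nothing follows c*?, so the c is never consumed.
        System.out.println(firstMatch("abc*?", input));         // ab
        // The f can only match if the c is consumed first.
        System.out.println(firstMatch("abc*?f", input));        // abcf
        // The whole pattern is optional, so the first match is empty.
        System.out.println(firstMatch(".*?", input).isEmpty()); // true
        // The f forces .*? to expand over "abc".
        System.out.println(firstMatch(".*?f", input));          // abcf
    }
}
```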


*? matches zero or more repetitions, but as few as possible (by the way, outside Java this is more often called “non-greedy” or “lazy”; “reluctant” is the term used in Java's Pattern documentation). So if zero repetitions is possible, that’s the optimal match.

What exactly do you want to achieve? Maybe non-greedy matching isn’t what you need.
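To see where the empty matches in the question come from, it may help to print the start and end index of each match instead of the matched text. The sketch below uses a made-up helper, matchSpans, and compares .*? with .+?, which is one possible alternative if a non-empty match is wanted (that intent is an assumption, since the question does not say).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: collects "start-end" for every match that find() reports.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // .*? is satisfied with zero characters, so find() reports an empty
        // match at each position, including the end of the input.
        System.out.println(matchSpans(".*?", "abcfoo"));
        // prints [0-0, 1-1, 2-2, 3-3, 4-4, 5-5, 6-6]

        // .+? needs at least one character, so each match is a single character.
        System.out.println(matchSpans(".+?", "abcfoo"));
        // prints [0-1, 1-2, 2-3, 3-4, 4-5, 5-6]
    }
}
```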


It never makes sense to have a reluctant quantifier as the last thing in a regex. A reluctant quantifier matches only as much as it has to in order to achieve an overall match. That means there has to be something after the quantifier to force it to keep matching.

If it seems odd to have something that can be put to such a pointless use, it's probably because reluctant quantifiers are an add-on, something that's not possible with "real" regular expressions. Some other examples of pointless usage are the "quantifier" {1}, and \b+ or any other zero-width assertion (^, $, lookarounds, etc.) with a quantifier. Some flavors treat the latter as a syntax error; Java allows it, but of course only applies the assertion once.
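A quick way to convince yourself that a trailing reluctant quantifier and {1} add nothing is to compare each pattern with its simpler form. This is a rough sketch; the class name and the mark helper are invented for the comparison, and it only covers find-style matching via replaceAll.

```java
import java.util.regex.Pattern;

public class PointlessQuantifierDemo {
    // Hypothetical helper: replaces every match of regex with "#" so results can be compared.
    static String mark(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        // A trailing c*? never consumes a c, so it behaves like plain "ab".
        System.out.println(mark("abc*?", "abcabc")); // #c#c
        System.out.println(mark("ab", "abcabc"));    // #c#c

        // x{1} means exactly one x, which is just "x".
        System.out.println(mark("x{1}", "xxx"));     // ###
        System.out.println(mark("x", "xxx"));        // ###
    }
}
```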


The ? suffix makes .* reluctant: it matches as few characters as possible, and matches more characters only when backtracking requires it.

Here's an illustrative example of using regex to find a non-empty prefix that is also a suffix of a string (no overlapping).

The capturing group (.+) in the first pattern is greedy: it first matches everything, then gives characters back as it backtracks. As such, the pattern will find the longest possible prefix/suffix match:

    System.out.println(
        "abracadabra".replaceAll("^(.+).*\\1$", "($1)")
    ); // prints "(abra)"

Now the group (.+?) in the second pattern is reluctant: it first matches a single character, and takes more as it backtracks. As such, the pattern will find the shortest prefix/suffix match:

    System.out.println(
        "abracadabra".replaceAll("^(.+?).*\\1$", "($1)")
    ); // prints "(a)"

In your case, the .*? can match an empty string, and never needed to backtrack and match more since it was enough for the overall pattern to match.

See also

  • regular-expressions.info/Modifiers

Here's another illustrative example, this time of a reluctant quantifier on a finite repetition:

Here, x{3,5} is greedy, and will take as much as possible.

    System.out.println(
        "xxxxxxx".replaceAll("x{3,5}", "Y")
    ); // prints "Yxx"

Here, x{3,5}? is reluctant, and will take as few as possible.

    System.out.println(
        "xxxxxxx".replaceAll("x{3,5}?", "Y")
    ); // prints "YYx"


*? is also called the lazy star.

    ^abc*?f

^ anchors the match at the start of the string, and *? repeats the preceding c zero or more times, as few as possible. In this case the c must be consumed in order to reach the f.

    Example: abcf00abcf00 ---> matches: "abcf"00abcf00

    abc*?

Nothing follows *?, so the c is never consumed and the pattern matches just ab.

    Example: abcabcabcabc ---> matches: "ab"c"ab"c"ab"c"ab"c

    abc.*

.* matches any characters except line breaks, as many as possible.

    Example: abcabababbababab ---> matches: "abcabababbababab"

    ab.*?

Again nothing follows .*?, so each match is just ab.

    Example: ababababbababab ---> matches: "ab""ab""ab""ab"b"ab""ab""ab"

abc? matches ab or abc.
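
The match lists above are easy to miscount by hand, so here is a small sketch that collects every match for each pattern. The class name and the findAll helper are made up for this illustration; the inputs are the example strings from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyStarExamples {
    // Hypothetical helper: returns the text of every match that find() reports.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Anchored at the start, so only the first "abcf" is found.
        System.out.println(findAll("^abc*?f", "abcf00abcf00")); // [abcf]
        // The trailing c*? consumes nothing: four matches of "ab".
        System.out.println(findAll("abc*?", "abcabcabcabc"));   // [ab, ab, ab, ab]
        // Greedy .* runs to the end of the line: one match, the whole string.
        System.out.println(findAll("abc.*", "abcabababbababab"));
        // The trailing .*? consumes nothing: seven matches of "ab".
        System.out.println(findAll("ab.*?", "ababababbababab").size()); // 7
    }
}
```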