Why doesn't ? work as an optional repetition specifier in this pattern?

I am trying to match inputs like

<foo>
<bar>
#####<foo>
#####<bar>

I tried #{5}?<\w+>, but it does not match <foo> and <bar>.

What's wrong with this pattern, and how can it be fixed?


On ? for optional vs reluctant

The ? metacharacter in Java regex (and some other flavors) can have two very different meanings, depending on where it appears. Immediately following a repetition specifier, ? is a reluctant quantifier instead of "zero-or-one"/"optional" repetition specifier.

Thus, #{5}? does not mean "optionally match 5 #". It in fact says "match 5 # reluctantly". It may not make too much sense to try to match "exactly 5, but as few as possible", but this is in fact what this pattern means.
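A minimal sketch of this behavior, using only java.util.regex. The class name ReluctantExactCount and the helper matchesWhole are made up for illustration. It suggests that #{5}? still demands exactly five # characters, so an input without them cannot match:

```java
import java.util.regex.Pattern;

public class ReluctantExactCount {
    // Hypothetical helper: true if the WHOLE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String regex = "#{5}?<\\w+>"; // {5} made reluctant, NOT made optional

        System.out.println(matchesWhole(regex, "#####<foo>")); // true
        System.out.println(matchesWhole(regex, "<foo>"));      // false

        // Reluctant or not, {5} can only ever consume exactly five '#'.
        System.out.println(matchesWhole("#{5}?", "#####"));    // true
        System.out.println(matchesWhole("#{5}?", ""));         // false
    }
}
```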


Grouping to the rescue!

One way to fix this problem is to group the optional pattern as (…)?. Something like this should work for this problem:

(#{5})?<\w+>

Now the ? does not immediately follow a repetition specifier (i.e. *, +, ?, or {…}); it follows a closing bracket used for grouping.

Alternatively, you can use a non-capturing group (?:…) in this case:

(?:#{5})?<\w+>

This achieves the same grouping effect, but doesn't capture into \1.
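As a rough check, the two grouped patterns can be run against the four sample inputs from the question. The class name OptionalGroupDemo and its constant names are assumptions for this sketch. It also prints groupCount() to show the difference between the capturing and non-capturing forms:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupDemo {
    // The two fixed patterns from the answer, written as Java string literals.
    static final Pattern CAPTURING = Pattern.compile("(#{5})?<\\w+>");
    static final Pattern NON_CAPTURING = Pattern.compile("(?:#{5})?<\\w+>");

    public static void main(String[] args) {
        String[] inputs = {"<foo>", "<bar>", "#####<foo>", "#####<bar>"};
        for (String input : inputs) {
            // Both patterns should accept every sample input.
            System.out.println(input
                + " capturing=" + CAPTURING.matcher(input).matches()
                + " nonCapturing=" + NON_CAPTURING.matcher(input).matches());
        }

        // When the optional group takes no part in the match, group(1) is null.
        Matcher m = CAPTURING.matcher("<foo>");
        if (m.matches()) {
            System.out.println(m.group(1)); // null
        }

        // Only the capturing form defines a group \1.
        System.out.println(CAPTURING.matcher("").groupCount());     // 1
        System.out.println(NON_CAPTURING.matcher("").groupCount()); // 0
    }
}
```

If the captured ##### is never read back, the non-capturing form is usually the tidier choice.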

References

  • regular-expressions.info
    • Question Mark for Optional - yes, but only with proper placement
    • Brackets for Grouping
    • Repetition
    • Flavor comparison
  • java.util.regex.Pattern: X{n}? : X, exactly n times

Related questions

  • regex{n,}? == regex{n} ? (absolutely NOT!)
  • Difference between .*? and .* for regex

Bonus material: What about ??

It's worth noting that you can use ?? to match an optional item reluctantly!

    System.out.println("NOMZ".matches("NOMZ??"));
    // "true"

    System.out.println(
          "NOM NOMZ NOMZZ".replaceAll("NOMZ??", "YUM")
    ); // "YUM YUMZ YUMZZ"

Note that Z?? is an optional Z, but it's matched reluctantly. "NOMZ" in its entirety still matches the pattern NOMZ??, but in replaceAll, NOMZ?? can match only "NOM" and doesn't have to take the optional Z even if it's there.

By contrast, NOMZ? will match the optional Z greedily: if it's there, it'll take it.

    System.out.println(
          "NOM NOMZ NOMZZ".replaceAll("NOMZ?", "YUM")
    ); // "YUM YUM YUMZ"

Related questions

  • method matches not work well
    • unlike other flavors, Java matches a pattern against the entire String
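The whole-string behavior of matches can be contrasted with find in a short sketch. The class name MatchesVsFind and its two helpers are hypothetical, while String.matches and Matcher.find are the standard library calls:

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Hypothetical helper: the regex must cover the entire input.
    static boolean whole(String regex, String input) {
        return input.matches(regex);
    }

    // Hypothetical helper: the regex may match any substring of the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("NOMZ?", "NOMZ"));     // true
        System.out.println(whole("NOMZ?", "NOMZZ"));    // false: trailing Z is left over
        System.out.println(anywhere("NOMZ?", "NOMZZ")); // true: "NOMZ" is found inside
    }
}
```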


Place your # match in a subpattern:

(#{5})?<\w+>