Why doesn't ? work as an optional repetition specifier in this pattern?
I am trying to match inputs like
<foo>
<bar>
#####<foo>
#####<bar>
I tried #{5}?<\w+>, but it does not match <foo> and <bar>.
What's wrong with this pattern, and how can it be fixed?
On ? for optional vs reluctant
The ? metacharacter in Java regex (and some other flavors) can have two very different meanings, depending on where it appears. Immediately following a repetition specifier, ? makes that repetition reluctant; it is not the "zero-or-one"/"optional" repetition specifier.
Thus, #{5}? does not mean "optionally match 5 #". It in fact says "match 5 # reluctantly". Trying to match "exactly 5, but as few as possible" may not make much sense, but that is what this pattern means.
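A minimal sketch of this behavior, using the pattern from the question. The class and method names (ReluctantExactCount, matchesOriginal) are made up for illustration:

```java
import java.util.regex.Pattern;

class ReluctantExactCount {
    // The pattern from the question: the ? makes {5} reluctant, it does not make it optional.
    static final Pattern ORIGINAL = Pattern.compile("#{5}?<\\w+>");

    // True if the whole input matches the original pattern.
    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesOriginal("#####<foo>")); // true
        System.out.println(matchesOriginal("<foo>"));      // false: five # are still required
    }
}
```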
Grouping to the rescue!
One way to fix this problem is to group the optional pattern as (…)?. Something like this should work for this problem:
(#{5})?<\w+>
Now the ? does not immediately follow a repetition specifier (i.e. *, +, ?, or {…}); it follows a closing bracket used for grouping.
Alternatively, you can also use a non-capturing group (?:…) in this case:
(?:#{5})?<\w+>
This achieves the same grouping effect, but doesn't capture into \1.
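Both grouped variants can be checked against the four inputs from the question. This is only a sketch; the class name OptionalPrefixDemo and the helper prefixOf are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPrefixDemo {
    static final Pattern CAPTURING = Pattern.compile("(#{5})?<\\w+>");
    static final Pattern NON_CAPTURING = Pattern.compile("(?:#{5})?<\\w+>");

    // Returns what group 1 captured, or null when the optional prefix was absent.
    static String prefixOf(String input) {
        Matcher m = CAPTURING.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String[] inputs = { "<foo>", "<bar>", "#####<foo>", "#####<bar>" };
        for (String input : inputs) {
            // Both variants accept all four inputs.
            System.out.println(input + " -> "
                + CAPTURING.matcher(input).matches() + ", "
                + NON_CAPTURING.matcher(input).matches());
        }
        System.out.println(prefixOf("#####<foo>")); // #####
        System.out.println(prefixOf("<foo>"));      // null
    }
}
```

The non-capturing variant reports a groupCount() of 0, so it is the better fit when the prefix itself is never needed.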
References
- regular-expressions.info
- Question Mark for Optional - yes, but only with proper placement
- Brackets for Grouping
- Repetition
- Flavor comparison
- java.util.regex.Pattern: X{n}? : X, exactly n times
Related questions
- regex{n,}? == regex{n}? (absolutely NOT!)
- Difference between .*? and .* for regex
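As a rough illustration of why a reluctant {n,}? is not the same as {n}, here is a sketch that uses a as a stand-in for the repeated expression. The class and helper names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantOpenRange {
    // Returns the text consumed by the first find() hit, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Reluctant {2,}? prefers two, but may still take more when the overall match needs it.
        System.out.println("aaaa".matches("a{2,}?")); // true
        System.out.println("aaaa".matches("a{2}"));   // false

        System.out.println(firstMatch("a{2,}?b", "aaab")); // aaab
        System.out.println(firstMatch("a{2}b", "aaab"));   // aab
    }
}
```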
Bonus material: What about ??
It's worth noting that you can use ?? to match an optional item reluctantly!
System.out.println("NOMZ".matches("NOMZ??"));
// "true"
System.out.println(
"NOM NOMZ NOMZZ".replaceAll("NOMZ??", "YUM")
); // "YUM YUMZ YUMZZ"
Note that Z?? is an optional Z, but it's matched reluctantly. "NOMZ" in its entirety still matches the pattern NOMZ??, but in replaceAll, NOMZ?? can match only "NOM" and doesn't have to take the optional Z even if it's there.
By contrast, NOMZ? will match the optional Z greedily: if it's there, it'll take it.
System.out.println(
"NOM NOMZ NOMZZ".replaceAll("NOMZ?", "YUM")
); // "YUM YUM YUMZ"
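The same difference can be observed directly with Matcher.find(), which shows exactly what each pattern consumes. This is a sketch only; the class OptionalGreedyVsReluctant and the helper firstMatch are made-up names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGreedyVsReluctant {
    // Returns the text consumed by the first find() hit, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("NOMZ??", "NOMZ")); // NOM  (reluctant: skips the Z)
        System.out.println(firstMatch("NOMZ?", "NOMZ"));  // NOMZ (greedy: takes the Z)

        // matches() must consume the whole input, so the reluctant Z?? is forced to take the Z.
        System.out.println("NOMZ".matches("NOMZ??"));     // true
    }
}
```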
Related questions
- method matches not work well
  - unlike other flavors, Java matches a pattern against the entire String
Place your # match in a subpattern:
(#{5})?<\w+>