
What does this regex pattern describe: ".{5,}+"

One of the HTML input fields in an app I'm working on is being validated with the following regex pattern:

.{5,}+

What is this checking for?

Other fields are being checked with this pattern which I also don't understand:

.+


We can break your pattern down into three parts:

The dot is a wildcard: it matches any character (except newlines by default, unless the /s modifier is set).
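To illustrate the newline exception, here is a minimal sketch in Python. The choice of Python is an assumption (the question does not say which engine the app uses); `re.DOTALL` is Python's counterpart of the /s modifier.

```python
import re

# By default the dot does not match a newline character.
no_flag = re.fullmatch(r".", "\n")

# With re.DOTALL (Python's equivalent of the /s modifier) it does.
with_flag = re.fullmatch(r".", "\n", flags=re.DOTALL)

print(no_flag is None)        # True
print(with_flag is not None)  # True
```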

{5,} specifies repetition on the dot. It says that the dot must match at least 5 times. If there were a number after the comma, the dot would have to match between 5 and that number of times, but since there's no number, there is no upper limit.

In your first pattern, the + is a possessive quantifier (see below for how + can mean different things in different situations). It tells the regular expression engine that once it has satisfied the previous condition (i.e. .{5,}), it should not try to backtrack into it.


Your second pattern is simpler. The dot still means the same thing as above (works as a wildcard). However, here the + has a different meaning, and is a repetition operator, meaning that the dot must match 1 or more times (that could also be expressed as .{1,}, as we saw above).
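The equivalence between .+ and .{1,} can be checked with a short sketch (Python is assumed here purely for illustration, and the sample strings are arbitrary):

```python
import re

samples = ["", "a", "hello"]

# Both patterns require at least one non-newline character.
plus_results = [re.fullmatch(r".+", s) is not None for s in samples]
brace_results = [re.fullmatch(r".{1,}", s) is not None for s in samples]

print(plus_results)   # [False, True, True]
print(brace_results)  # [False, True, True]
```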

As you can see, + has a different meaning depending on context. When used on its own, it is a repetition operator. However when it follows a different repetition operator (either *, ?, + or {...}) it becomes a possessive quantifier.


The + after another quantifier ({5,}) means a possessive match, i.e. once a match is found, do not backtrack*.

For instance, the pattern .{5,}x will match abcdex:

  1. .{5,} matches abcdex.
  2. x matches nothing.
  3. So backtrack .{5,} and let it match abcde.
  4. Now x matches that last x.

But .{5,}+x will not match abcdex:

  1. .{5,}+ matches abcdex.
  2. x matches nothing.
  3. Cannot backtrack the .{5,}+. We have to stop here.

*: Even though a possessive quantifier cannot be backtracked into, its match can still be discarded as a whole. For instance, matching a?.{5,}+x against abcdex first tries {a? → a, .{5,}+ → bcdex, x → no match}; the engine then throws away the whole .{5,}+ match together with the a, and retries with {a? → empty, .{5,}+ → abcdex, x → no match}. Therefore, we can also say that the + makes the quantifier "atomic".


On the other hand, + alone just means {1,}, i.e. match one or more times.


Any character, 5 or more times.

  • "." means any character except a line break.
  • {m,n} defines a bounded interval. "m" is the min. "n" is the max. If n is not defined, as is the case here, it is unlimited.
  • "+" means possessive.


.{5,}+ means

  1. Match any single character that is not a line break character
    1. Between 5 and unlimited times; as many times as possible, without giving back (possessive)

.+ is the same thing but it matches between 1 and unlimited times, giving back as needed (greedy).

As I've mentioned many times before, I'm a huge fan of RegexBuddy. Its "Create" mode is excellent for deconstructing regular expressions.
