Regular expression explanation required
I came across this regular expression, which is used to check for alphabetic strings. Can anyone explain how it works to me?
/^\pL++$/uD
Thanks.
\pL+ matches one or more Unicode letters (\pL is sometimes written as \p{L}). I prefer \p{L} to \pL because there are other Unicode properties like \p{Lu} (uppercase letter) that only work with the braces; \pLu would mean "a Unicode letter followed by the letter u".
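The additional + makes the quantifier possessive, meaning that it will never relinquish any characters it has matched, even if that means an overall match will fail. In the example regex, this is unnecessary and can be omitted: the only thing after \pL++ is $, which can never match in front of a letter, so giving characters back could not produce a match anyway.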
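Both points so far, that \pLu is not the same as \p{Lu} and that the possessive + changes nothing in this particular regex, can be checked with a few calls. This is only a sketch: it uses Java's java.util.regex as a stand-in for PHP's PCRE, on the assumption that the two engines agree on these constructs, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; Java's regex engine stands in for PHP's PCRE here.
final class LetterQuantifierDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String upperE = "\u00c9"; // capital E with acute, one code point

        // \p{Lu} is "one uppercase letter".
        System.out.println(fullMatch("\\p{Lu}", upperE)); // true
        // \pLu is "any letter, then a literal u".
        System.out.println(fullMatch("\\pLu", upperE));   // false
        System.out.println(fullMatch("\\pLu", "qu"));     // true

        // Greedy: \pL+ takes "abca", then gives back the last "a"
        // so that the literal a can match.
        System.out.println(fullMatch("\\pL+a", "abca"));  // true
        // Possessive: \pL++ keeps all of "abca", so the literal a
        // has nothing left to match.
        System.out.println(fullMatch("\\pL++a", "abca")); // false

        // With nothing but the end of the string after the quantifier,
        // greedy and possessive agree on every input.
        for (String s : new String[] {"abc", "abc1", ""}) {
            System.out.println(fullMatch("\\pL+", s) == fullMatch("\\pL++", s)); // true
        }
    }
}
```

The possessive form only changes the result when something after the quantifier could match a letter, as in the \pL++a case.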
^ and $ anchor the match at the start and end of the string, ensuring that the entire string has to consist of letters. Without them, the regex would also match a substring surrounded by non-letters.
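The effect of the anchors shows up as soon as the regex is used for searching rather than whole-string matching. A minimal sketch, again using Java's java.util.regex as a stand-in for PCRE; AnchorDemo and found are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; Java's regex engine stands in for PHP's PCRE here.
final class AnchorDemo {

    // True if the regex matches anywhere in the input (an unanchored search).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without anchors the letters in the middle are enough for a match.
        System.out.println(found("\\pL+", "123abc456"));    // true
        // With anchors the digits on either side make the match fail.
        System.out.println(found("^\\pL+$", "123abc456"));  // false
        // A string made only of letters passes either way.
        System.out.println(found("^\\pL+$", "abc"));        // true
    }
}
```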
The entire regex is delimited by slashes (/). After the trailing slash, PHP regex options follow. u is the Unicode option (PCRE_UTF8): pattern and subject are treated as UTF-8, which is necessary for the Unicode property to work on non-ASCII letters. D (PCRE_DOLLAR_ENDONLY) ensures that the $ only matches at the very end of the string (otherwise it would also match right before the final newline in a string if that string ends in a newline).
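The trailing-newline case is the one that usually surprises people. Java has no D modifier, so the sketch below makes two assumptions: Java's default $ stands in for PHP's $ without D, and \z (absolute end of input) stands in for $ with D. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; Java's regex engine stands in for PHP's PCRE here.
final class DollarEndOnlyDemo {

    // True if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String withNewline = "abc\n";

        // Default $: also matches just before a final newline,
        // so the string is accepted although it ends in "\n".
        System.out.println(found("^\\pL+$", withNewline));    // true
        // \z: only the very end of the input counts, so the
        // trailing newline makes the match fail (like $ with D).
        System.out.println(found("^\\pL+\\z", withNewline));  // false
        // Without the newline both variants accept the string.
        System.out.println(found("^\\pL+\\z", "abc"));        // true
    }
}
```

For input validation the stricter behaviour is usually the one you want, which is why the original regex carries the D option.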
Looks like PCRE flavor.
According to RegexBuddy:
Assert position at the beginning of the string «^»
A character with the Unicode property “letter” (any kind of letter from any language) «\pL++»
Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
This looks like Unicode processing. I found a neat article that seems to explain \pL; the rest are anchors and repetition characters, which are also explained on the same site:
http://www.regular-expressions.info/unicode.html
Enjoy