
Preg_match differences?

I want to ask: what is the meaning of, or the difference between, these two lines?

  1. if( preg_match_all('/\#([א-תÀ-ÿ一-龥а-яa-z0-9\-_]{1,50})/iu', $message, $matches, PREG_PATTERN_ORDER) ) {

  2. if( preg_match_all('/\#([а-яa-z0-9\-_\x{4e00}-\x{9fa5}]{1,50})/iu', $message, $matches, PREG_PATTERN_ORDER) ) {

And what does the number 3 (in the {3,30} part) mean in this line?

if( preg_match_all('/\@([a-zA-Z0-9\-_\x{4e00}-\x{9fa5}]{3,30})/u', $message, $matches, PREG_PATTERN_ORDER) ) {

Thanks!


I'll answer the 2nd part of your question:

The {3,30} in the regex is a quantifier meaning a minimum of 3 and a maximum of 30 repetitions.

  • a* means zero or more a's
  • a+ means one or more a's
  • a? means zero or one a
  • a{1} means exactly one a, the same as just a
  • a{1,} means one or more a's, the same as a+
  • a{1,3} means a minimum of one and a maximum of three a's

You can put any single regex item in place of a: a literal character, a character class, or a parenthesized group. For example, [a-zA-Z]{3,30} means at least 3 and at most 30 ASCII letters.
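To see the {3,30} quantifier in action, here is a minimal PHP sketch that runs the @mention pattern from the question over a made-up message. The message text is only an assumption for illustration; the pattern is copied from the question with the arrow removed.

```php
<?php
// The @mention pattern from the question: "@" followed by 3 to 30
// characters that are ASCII letters, digits, "-", "_" or CJK
// ideographs in the range U+4E00..U+9FA5.
$pattern = '/\@([a-zA-Z0-9\-_\x{4e00}-\x{9fa5}]{3,30})/u';

// Made-up message: "@ab" is too short, "@bob" fits, and the last
// name is 35 characters long, so only its first 30 are captured.
$message = 'hi @ab and @bob, also @' . str_repeat('x', 35);

preg_match_all($pattern, $message, $matches, PREG_PATTERN_ORDER);

// With PREG_PATTERN_ORDER, $matches[1] lists every capture of group 1.
foreach ($matches[1] as $name) {
    echo $name, ' (', strlen($name), " chars)\n";
}
```

Running it should print bob (3 chars) followed by a run of 30 x's: @ab is skipped because two characters are below the minimum, and the longer name is cut off at the maximum because the pattern is not anchored.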


Your first regex includes Hebrew letters (א-ת, U+05D0 to U+05EA) and accented Latin letters (À-ÿ, U+00C0 to U+00FF, a range that also contains × and ÷) that the second regex does not include.
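Here is a small sketch comparing the two hashtag patterns from the question on a Hebrew, a French and a Chinese tag. The sample tags are my own; they are written with PHP's \u{...} string escapes so the subjects don't depend on the file encoding, but the file itself still has to be saved as UTF-8 because the first pattern contains literal non-ASCII characters.

```php
<?php
// The two hashtag patterns from the question, copied verbatim.
// Pattern 1 contains literal UTF-8 characters, so save this file as UTF-8.
$pattern1 = '/\#([א-תÀ-ÿ一-龥а-яa-z0-9\-_]{1,50})/iu';
$pattern2 = '/\#([а-яa-z0-9\-_\x{4e00}-\x{9fa5}]{1,50})/iu';

// Sample tags (made up): Hebrew "שלום", French "café", Chinese "中文".
$tags = [
    "#\u{05E9}\u{05DC}\u{05D5}\u{05DD}",
    "#caf\u{00E9}",
    "#\u{4E2D}\u{6587}",
];

foreach ($tags as $tag) {
    preg_match($pattern1, $tag, $m1);
    preg_match($pattern2, $tag, $m2);
    // "-" means the pattern found no hashtag at all.
    printf("%s  pattern1: %s  pattern2: %s\n", $tag, $m1[1] ?? '-', $m2[1] ?? '-');
}
```

The Hebrew tag matches only the first pattern, and for #café the second pattern stops at the é and captures just caf, which is probably the more surprising difference in practice. The Chinese tag is captured in full by both.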


The second expression uses the \x{...} escape syntax to refer to Unicode characters by their code point.

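\x{FFFF}, where FFFF is one or more hexadecimal digits
Perl syntax (also supported by PCRE, which PHP's preg_* functions use) to match a specific Unicode code point. It can be used inside character classes. In PHP, code points above \x{FF} require the /u modifier.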

Examples:
\x{E0} matches à only when it is encoded as the single code point U+00E0, not as a followed by a combining grave accent (U+0061 U+0300).
\x{A9} matches ©.
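A minimal check of these examples in PHP; the test strings are my own. The last line also shows that PCRE accepts more than four hex digits inside \x{...} when the /u modifier is on.

```php
<?php
// With /u the subject is read as UTF-8 and \x{E0} names the single
// code point U+00E0, however many bytes it takes in the string.
$precomposed = "\u{00E0}";   // à as one code point
$decomposed  = "a\u{0300}";  // a + COMBINING GRAVE ACCENT, looks the same

var_dump(preg_match('/\x{E0}/u', $precomposed));      // int(1)
var_dump(preg_match('/\x{E0}/u', $decomposed));       // int(0)
var_dump(preg_match('/\x{A9}/u', "\u{00A9} 2010"));   // int(1)

// Code points beyond U+FFFF are written with five or six hex digits.
var_dump(preg_match('/\x{1F600}/u', "\u{1F600}"));    // int(1)
```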

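Thus it matches every Unicode character from U+4E00 to U+9FA5 (from 一 to 龥). U+9FA5 is a valid character (龥): it is the last ideograph of the CJK Unified Ideographs range as originally defined, so ideographs that Unicode later added after it in the same block (which runs up to U+9FFF) are not matched.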
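The range boundaries can be checked directly. This sketch uses a hypothetical helper, inCjkRange(), that tests a single character against the CJK class from the second pattern; the sample characters are my own picks.

```php
<?php
// Hypothetical helper: true if $char is exactly one code point
// inside the class [\x{4e00}-\x{9fa5}] used by the second pattern.
function inCjkRange(string $char): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}]$/u', $char) === 1;
}

$samples = [
    ["\u{4E00}", 'U+4E00 (first in range)'],
    ["\u{9FA5}", 'U+9FA5 (last in range)'],
    ["\u{9FA6}", 'U+9FA6 (just past the range)'],
    ["\u{3042}", 'U+3042 (Hiragana, not an ideograph)'],
];

foreach ($samples as [$char, $label]) {
    printf("%s %s: %s\n", $label, $char, inCjkRange($char) ? 'match' : 'no match');
}
```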


The first expression also tries to match these characters (一-龥), but writes them literally instead of with the \x{...} syntax (whether or not this poses a problem, I don't know). In addition, as already mentioned, the first expression matches more characters, namely א-ת and À-ÿ.
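For what it's worth, a quick comparison suggests the two spellings behave identically, as long as the pattern uses the /u modifier and the PHP file is saved as UTF-8: PCRE then reads 一 and 龥 as the code points U+4E00 and U+9FA5. The \x{...} form has the practical advantage of keeping the pattern pure ASCII, so it doesn't depend on the source file's encoding. This is a sketch; the test strings are made up.

```php
<?php
// The CJK part of each hashtag class, written once with literal
// characters (as in pattern 1) and once with \x{...} escapes
// (as in pattern 2). The literal form assumes this file is UTF-8.
$literal = '/^[一-龥]+$/u';
$escaped = '/^[\x{4e00}-\x{9fa5}]+$/u';

$inputs = ["\u{4E2D}\u{6587}", "\u{9FA5}", "\u{9FA6}", 'abc'];

foreach ($inputs as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    printf("%s: literal=%d escaped=%d\n", $s, preg_match($literal, $s), preg_match($escaped, $s));
}
```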


The second question was already very well answered by unicornaddict.

