Regular Expression to match ([^>(),]+) but include some \w's in it?

I'm using PHP's preg_replace function, and I have the following regex:

(?:[^>(),]+) 

to match any characters except >(),. The problem is that I also want to make sure that the match contains at least one letter (\w) and is not empty. How can I do that?

Is there a way to say what I DO want to match in the [^>(),]+ part?


You can add a lookahead assertion:

(?:(?=.*\p{L})[^>(),]+)

This makes sure that there will be at least one letter (\p{L}; \w also matches digits and underscores) somewhere in the string.

You don't really need the (?:...) non-capturing parentheses, though:

(?=.*\p{L})[^>(),]+

works just as well. Also, to ensure that we always match the entire string, it might be a good idea to surround the regex with anchors:

^(?=.*\p{L})[^>(),]+$
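
To see why the anchors matter, here is a minimal sketch. The sample subject '123,abc' and the function name isValidToken are made up for illustration, and the third pattern is an assumed variant that does not appear in the answer above: it confines the lookahead to the same run of allowed characters, which may help when the regex cannot be anchored.

```php
<?php
// Unanchored pattern from the answer: the lookahead may find its letter
// anywhere later in the subject, not necessarily inside the match itself.
preg_match_all('/(?=.*\p{L})[^>(),]+/u', '123,abc', $loose);

// Anchored pattern from the answer: the whole subject has to qualify.
// isValidToken is a hypothetical helper name.
function isValidToken(string $s): bool
{
    return preg_match('/^(?=.*\p{L})[^>(),]+$/u', $s) === 1;
}

// Assumed variant (not from the answer): the lookahead may only scan
// characters that the match itself is allowed to consume.
preg_match_all('/(?=[^>(),]*\p{L})[^>(),]+/u', '123,abc', $strict);

var_dump($loose[0]);   // matches found by the unanchored pattern
var_dump($strict[0]);  // matches found by the confined-lookahead variant
```

With the unanchored pattern, "123" is reported as a match even though it contains no letter, because the lookahead sees the "abc" further along. The confined variant only reports "abc".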

EDIT:

For the added requirement of not including surrounding whitespace in the match, things get a little more complicated. Try

^(?=.*\p{L})(\s*)((?:(?!\s*$)[^>(),])+)(\s*)$

In PHP, for example, replacing each such string with REPLACEMENT while leaving leading and trailing whitespace alone could look like this:

$result = preg_replace(
    '/^          # Start of string
    (?=.*\p{L})  # Assert that there is at least one letter
    (\s*)        # Match and capture optional leading whitespace  (--> \1)
    (            # Match and capture...                           (--> \2)
     (?:         # ...at least one character of the following:
      (?!\s*$)   # (unless it is part of trailing whitespace)
      [^>(),]    # any character except >(),
     )+          # End of repeating group
    )            # End of capturing group
    (\s*)        # Match and capture optional trailing whitespace (--> \3)
    $            # End of string
    /xu', 
    '\1REPLACEMENT\3', $subject);
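
A quick way to check the behaviour is to run the same pattern, written on one line without the /x comments, against a few subjects. The helper name replaceCore and the sample strings are assumptions made for this sketch only.

```php
<?php
// One-line form of the commented pattern above (same groups, no /x needed).
// replaceCore is a hypothetical helper name.
function replaceCore(string $subject): string
{
    $pattern = '/^(?=.*\p{L})(\s*)((?:(?!\s*$)[^>(),])+)(\s*)$/u';
    // \1 and \3 put the captured leading and trailing whitespace back.
    return preg_replace($pattern, '\1REPLACEMENT\3', $subject);
}

var_dump(replaceCore('  foo bar  ')); // surrounding whitespace is kept
var_dump(replaceCore('123'));         // no letter: subject comes back unchanged
var_dump(replaceCore(' a,b '));       // forbidden character: unchanged
```

When the pattern does not match, preg_replace returns the subject as it was, so invalid inputs pass through untouched.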


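You can just "insert" \w into the pattern: (?:[^>(),]+\w[^>(),]+). The match then contains at least one word character and is obviously not empty. Note that, as written, it also requires at least one other allowed character on each side of the \w, so a single letter on its own does not match. By the way, \w matches digits and underscores as well as letters. If you want only letters, you can use the Unicode letter class \p{L} instead of \w.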


How about this:

(?:[^>(),]*\w[^>(),]*)
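
The * quantifiers are what let the \w sit at either end of the match. A small sketch comparing this with the + version: the ^ and $ anchors are added here as an assumption so that whole strings are tested, and the function names are hypothetical.

```php
<?php
// "+" version: needs at least one allowed character on each side of \w.
function matchesPlus(string $s): bool
{
    return preg_match('/^(?:[^>(),]+\w[^>(),]+)$/', $s) === 1;
}

// "*" version: \w may be the first, last, or only character.
function matchesStar(string $s): bool
{
    return preg_match('/^(?:[^>(),]*\w[^>(),]*)$/', $s) === 1;
}

foreach (['a', 'ab', 'abc', ' - ', '1'] as $sample) {
    printf("%-5s plus=%d star=%d\n", "'$sample'", matchesPlus($sample), matchesStar($sample));
}
```

Both versions accept a lone digit or underscore in place of a letter, since \w covers those too.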