Problem with RegEx OR operator in C#

I want to match a pattern [0-9][0-9]KK[a-z][a-z] which is not preceded by either of these words:

  • http://

  • example

I have a RegEx which takes care of the first criterion, but not the second.

Without OR operator

var body = Regex.Replace(body, "(?<!http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&amp;\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?)([0-9][0-9]KK[a-z][a-z])(?!</a>)","replaced");

With OR operator

var body = Regex.Replace(body, "(?example)|(?<!http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&amp;\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?)([0-9][0-9]KK[a-z][a-z])(?!</a>)","replaced");

The second one with OR operator throws an exception. How can I fix this?

It should not match either of these:

  • example99KKas

  • http://stack.com/99KKas


Here is one way to do it. Start at the beginning of the string and check that each character is not the start of 'http://' or 'example'. Do this lazily, and one character at a time so that we can spot the magic word once we reach it. Also, capture everything up to the magic word so that we can put it back in the replacement string. Here it is in commented free-spacing mode so that it can be comprehended by mere mortals:

var body = Regex.Replace(body, 
    @"# Match special word not preceded by 'http://' or 'example'
    ^                           # Anchor to beginning of string
    (?i)                        # Set case-insensitive mode.
    (                           # $1: Capture everything up to  special word.
      (?:                       # Non-capture group for applying * quantifier.
        (?!http://)             # Assert this char is not start of 'http://'
        (?!example)             # Assert this char is not start of 'example'
        .                       # Safe to match this one acceptable char.
      )*?                       # Lazily match zero or more preceding chars.
    )                           # End $1: Everything up to  special word.
    (?-i)                       # Set back to case-sensitive mode.
    ([0-9][0-9]KK[a-z][a-z])    # $2: Match our special word.
    (?!</a>)                    # Assert not end of Anchor tag contents.
    ", 
    "$1replaced",
    RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

Note that this is case sensitive for the magic word but not for http:// and example. Note also that this is untested (I don't know C# - just its regex engine). The "var" in "var body = ..." looks kinda suspicious to me. ??
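For anyone who wants to try this out, here is a minimal sketch, assuming a plain console program. The class name AnchoredPrefixDemo, the Apply helper, and the sample inputs are made up for illustration; the regex is the one above, written in compact form instead of free-spacing mode.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredPrefixDemo
{
    // Compact form of the commented pattern: no whitespace or comments inside,
    // so RegexOptions.IgnorePatternWhitespace is not needed here.
    private static readonly Regex Pattern = new Regex(
        @"^(?i)((?:(?!http://)(?!example).)*?)(?-i)([0-9][0-9]KK[a-z][a-z])(?!</a>)",
        RegexOptions.Singleline);

    // Hypothetical helper: puts the captured prefix ($1) back, then the replacement text.
    public static string Apply(string body)
    {
        return Pattern.Replace(body, "$1replaced");
    }

    public static void Main()
    {
        Console.WriteLine(Apply("see 99KKas here"));             // see replaced here
        Console.WriteLine(Apply("example99KKas"));               // example99KKas
        Console.WriteLine(Apply("http://stack.com/99KKas"));     // http://stack.com/99KKas
        Console.WriteLine(Apply("example first, then 12KKab"));  // unchanged: 'example' occurs earlier
        Console.WriteLine(Apply("11KKaa 22KKbb"));               // replaced 22KKbb
    }
}
```

The last two calls show the trade-offs of anchoring at the start of the string: an 'http://' or 'example' anywhere before the magic word blocks the match, and at most one replacement happens per call.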


I wasn't able to get the second example working; it threw an ArgumentException with the message "Unrecognized grouping construct".

So I replaced the URL matching, moved the alternation inside the lookbehind, and came up with this:

var body = Regex.Replace(body, "(?<!http\\://[a-zA-Z0-9\\-\\.]+\\.[a-zA-Z]{2,3}(/\\S*)?|example)([0-9][0-9]KK[a-z][a-z])(?!</a>)","replaced");
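
As far as I can tell, the exception comes from "(?example)": after "(?" the engine expects a known construct such as ":", "=", "!", "<" or inline option letters, and "e" is none of those. Here is a small sketch that checks this and runs the pattern above against the inputs from the question. The class name LookbehindAlternationDemo and the helpers Apply and Compiles are hypothetical, and the pattern is written as a verbatim string, so the backslashes are not doubled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindAlternationDemo
{
    // Same pattern as above: the alternation sits inside one negative lookbehind.
    private const string Pattern =
        @"(?<!http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?|example)" +
        @"([0-9][0-9]KK[a-z][a-z])(?!</a>)";

    public static string Apply(string body)
    {
        return Regex.Replace(body, Pattern, "replaced");
    }

    // Returns false when the regex constructor rejects the pattern.
    public static bool Compiles(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Compiles("(?example)"));            // False
        Console.WriteLine(Compiles(Pattern));                 // True
        Console.WriteLine(Apply("see 99KKas here"));          // see replaced here
        Console.WriteLine(Apply("example99KKas"));            // example99KKas
        Console.WriteLine(Apply("http://stack.com/99KKas"));  // http://stack.com/99KKas
        Console.WriteLine(Apply("<a>99KKas</a>"));            // <a>99KKas</a>
    }
}
```

This relies on the .NET engine allowing variable-length lookbehind; many other regex flavors would reject the quantifiers inside "(?<!...)".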


You could use something like this:

body = Regex.Replace(body, @"(?<!\S)(?!(?i:http://|example))\S*\d\dKK[a-z]{2}\b", "replaced");
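
A quick sketch of how this one behaves, in case it helps. The class name TokenDemo, the Apply helper, and the sample inputs are made up for illustration. Note that the match starts at the beginning of a whitespace-delimited token, so the whole token gets replaced, not just the 99KKas part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // (?<!\S) pins the match to the start of a token; the lookahead then
    // rejects tokens that begin with http:// or example (case-insensitive).
    private const string Pattern =
        @"(?<!\S)(?!(?i:http://|example))\S*\d\dKK[a-z]{2}\b";

    public static string Apply(string body)
    {
        return Regex.Replace(body, Pattern, "replaced");
    }

    public static void Main()
    {
        Console.WriteLine(Apply("see 99KKas here"));          // see replaced here
        Console.WriteLine(Apply("abc99KKas"));                // replaced
        Console.WriteLine(Apply("example99KKas"));            // example99KKas
        Console.WriteLine(Apply("EXAMPLE99KKas"));            // EXAMPLE99KKas
        Console.WriteLine(Apply("http://stack.com/99KKas"));  // http://stack.com/99KKas
    }
}
```

Because the check is on how the token starts, something like "xexample99KKas" would still be replaced; whether that matters depends on the input.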