Regex to match a string NOT surrounded by brackets

I have to parse a text in which "with" is a keyword unless it is surrounded by square brackets. I need to match the keyword "with", and there must be word boundaries on both sides of it.

Here are some examples where "with" is NOT a keyword:

  • [with]
  • [ with ]
  • [sometext with sometext]
  • [sometext with]
  • [with sometext]

Here are some examples where "with" IS a keyword:

  • with
  • ] with
  • hello with
  • hello with world
  • hello [ world] with hello
  • hello [ world] with hello [world]

Can anyone help? Thanks in advance.


You can look for the word "with" and check that the closest bracket to its left is not an opening bracket, and that the closest bracket to its right is not a closing bracket:

// Requires: using System.Text.RegularExpressions;
// subjectString holds the text to search.
Regex regexObj = new Regex(
    @"(?<!     # Assert that we can't match this before the current position:
     \[        #  An opening bracket
     [^[\]]*   #  followed by any other characters except brackets.
    )          # End of lookbehind.
    \bwith\b   # Match ""with"".
    (?!        # Assert that we can't match this after the current position:
     [^[\]]*   #  Any text except brackets
     \]        #  followed by a closing bracket.
    )          # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
}

The lookaround expressions don't stop at line breaks; if you want each line to be evaluated separately, use [^[\]\r\n]* instead of [^[\]]*.
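
As a rough sketch of that per-line variant, the pattern could be wrapped in a small helper like the one below. The class and method names (KeywordFinder, FindKeywordPositions) are made up for illustration, and the sample text is only an assumption chosen to show a bracket pair split across lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class KeywordFinder
{
    // Per-line variant: the lookarounds stop at \r and \n as well as at brackets.
    static readonly Regex PerLine = new Regex(
        @"(?<!\[[^[\]\r\n]*)\bwith\b(?![^[\]\r\n]*\])");

    // Returns the start index of every "with" that is not bracketed on its own line.
    public static List<int> FindKeywordPositions(string text)
    {
        var positions = new List<int>();
        foreach (Match m in PerLine.Matches(text))
        {
            positions.Add(m.Index);
        }
        return positions;
    }

    static void Main()
    {
        // The "[" on line one and the "]" on line three sit on other lines,
        // so the per-line pattern still treats "with" as a keyword.
        string text = "[ open\nwith here\nclose ]";
        foreach (int pos in FindKeywordPositions(text))
        {
            Console.WriteLine(pos); // prints 7
        }
    }
}
```

With the original [^[\]]* version the same input would give no match, because the lookbehind would reach back across the line break to the opening bracket.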


Nice question. I think it'll be easier to find the matches where your [with] pattern applies, and then invert the result.

You need to match [, not followed by ], followed by with (and then the corresponding pattern for the closing square bracket).

Matching the [ and the with is easy:

\[with

Add a negative lookahead to exclude ], and also allow any number of other characters (.*):

\[(?!]).*with

Then add the corresponding closing square bracket, i.e. the reverse, with a lookbehind:

\[(?!]).*with.*\](?<!\[)

With a bit more tweaking:

\[(?!(.*\].*with)).*with.*\](?<!(with.*\[.*))

Now, if you invert this, you should have your desired result (i.e. when this pattern matches, you have found a bracketed with, and those are the results you want to exclude).
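
To make the "match the bracketed case, then invert" idea concrete, here is a minimal sketch. Note that it does not use the pattern above: it swaps in a simpler stand-in that matches one balanced, non-nested [...] group, and then keeps only the with matches that fall outside every such group. The names InvertedMatch and KeywordsOutsideBrackets are hypothetical, and the sketch assumes brackets are never nested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class InvertedMatch
{
    // Simplified stand-in: one balanced, non-nested [...] group.
    static readonly Regex Bracketed = new Regex(@"\[[^\[\]]*\]");
    static readonly Regex Word = new Regex(@"\bwith\b");

    // Returns the "with" matches that do not lie inside any bracketed group.
    public static List<Match> KeywordsOutsideBrackets(string text)
    {
        List<Match> groups = Bracketed.Matches(text).Cast<Match>().ToList();
        return Word.Matches(text).Cast<Match>()
            // A word is "inside" a group when it starts after the "["
            // and ends before the "]".
            .Where(w => !groups.Any(g => g.Index < w.Index &&
                                         w.Index + w.Length < g.Index + g.Length))
            .ToList();
    }

    static void Main()
    {
        string text = "[sometext with] hello [ world] with hello";
        foreach (Match m in KeywordsOutsideBrackets(text))
        {
            Console.WriteLine(m.Index); // prints 31
        }
    }
}
```

This takes two passes over the text, which is simple to reason about but does more work than a single-pattern solution.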


I think the simplest solution is to preemptively match balanced pairs of brackets and their contents to get them out of the way as you search for the keyword. Here's an example:

// Requires: using System; using System.Text.RegularExpressions;
string s = 
  @"[with0]
  [ with0 ]
  [sometext with0 sometext]
  [sometext with0]
  [with0 sometext]


  with1
  ] with1
  hello with1
  hello with1 world
  hello [ world] with1 hello
  hello [ world] with1 hello [world]";

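// The first alternative consumes a whole [...] group, so its contents are skipped;
// the second captures the keyword (with a trailing digit, as in the test data) by name.
Regex r = new Regex(@"\[[^][]*\]|(?<KEYWORD>\bwith\d\b)");
foreach (Match m in r.Matches(s))
{
  // Only matches made by the second alternative set the KEYWORD group.
  if (m.Groups["KEYWORD"].Success)
  {
    Console.WriteLine(m.Value);
  }
}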


You'll want to look into both negative lookbehinds and negative lookaheads; these will help you match your data without consuming the brackets.
