.NET Regex for "not this string"

I'm a regex newbie and need a single expression that:

matches "an" and "AN" but not "and" or "AND", and matches "o" and "O" but not "or" or "OR", in this predicate:

1and(2or3)AND(4OR5)an(6o7)AN(8O9)

Basically I can't figure out how to convert the expression:

var myRegEx = new Regex("[0-9 ()]|AND|OR");

into an "everything but", case-insensitive expression.

I can't use regex word boundaries because the predicate isn't required to contain spaces.

(Added after two answers were already provided): I also need to know the index of the match, which is why I'm assuming I need to use the Regex.Match() method.

Thanks!

Here's what I ended up with:

  private bool mValidateCharacters()
  {
     const string legalsPattern = @"[\d ()]|AND|OR";
     const string splitPattern = "(" + legalsPattern + ")";
     int position = 0;
     string[] tokens = Regex.Split(txtTemplate.Text, splitPattern, RegexOptions.IgnoreCase);

     // Array contains every legal operator/symbol found in the entry field
     // and every substring preceding, surrounded by, or following those operators/symbols
     foreach (string token in tokens)
     {
        if (string.IsNullOrEmpty(token))
        {
           continue;
        }

        // Determine if the token is a legal operator/symbol or a syntax error
        Match match = Regex.Match(token, legalsPattern, RegexOptions.IgnoreCase);

        if (!match.Success)
        {
           const string reminder =
              "Please use only the following in the template:" +
              "\n\tRow numbers from the terms table" +
              "\n\tSpaces" +
              "\n\tThese characters: ( )" +
              "\n\tThese words: AND OR";
           UserMsg.Tell("Illegal template entry '" + token + "' at position: " + position + "\n\n" + reminder, UserMsg.EMsgType.Error);
           txtTemplate.Focus();
           txtTemplate.Select(position, token.Length);
           return false;
        }

        position += token.Length;
     }

     return true;
  }


Randal Schwartz's rule: Use capturing in Regex.Match when you know what you want to keep, and use Regex.Split when you know what you want to throw away.

You wrote that you want “everything but,” so split on the legal tokens and keep what is left:

var input = "1and(2or3)AND(4OR5)an(6o7)AN(8O9)";
foreach (var s in Regex.Split(input, @"[\d()]|AND|OR", RegexOptions.IgnoreCase))
  if (s.Length > 0)
    Console.WriteLine("[{0}]", s);

Output:

[an]
[o]
[AN]
[O]

To get the offsets, save the separators by enclosing the regular expression in parentheses:

var input = "1and(2or3)AND(4OR5)an(6o7)AN(8O9)";
string pattern = @"([\d()]|AND|OR)";
int offset = 0;
foreach (var s in Regex.Split(input, pattern, RegexOptions.IgnoreCase)) {
  if (s.Equals("an", StringComparison.OrdinalIgnoreCase) || s.Equals("o", StringComparison.OrdinalIgnoreCase))
    Console.WriteLine("Found [{0}] at offset {1}", s, offset);
  offset += s.Length;
}

Output:

Found [an] at offset 19
Found [o] at offset 23
Found [AN] at offset 26
Found [O] at offset 30
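
Since the question mentions Regex.Match and needing the index, here is an alternative sketch that avoids the manual offset bookkeeping. It is not the Split approach above but a different technique: scan the whole input with Regex.Matches, where each match is either one legal token or a run of characters at which no legal token starts, and read Index off the "illegal" group. The names IllegalTokenFinder, Find, and the group names legal/illegal are made up for illustration, and the legal set (AND, OR, digits, spaces, parentheses) is assumed from the question's pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class IllegalTokenFinder
{
    // Assumed legal tokens, taken from the pattern in the question.
    const string Legal = @"AND|OR|[\d ()]";

    // Each match is either one legal token, or a maximal run of characters
    // at which no legal token begins (the negative lookahead checks every position).
    static readonly Regex Scanner = new Regex(
        "(?<legal>" + Legal + ")|(?<illegal>(?:(?!" + Legal + ").)+)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every illegal run together with its offset in the input.
    public static List<(string Text, int Index)> Find(string input)
    {
        var hits = new List<(string Text, int Index)>();
        foreach (Match m in Scanner.Matches(input))
        {
            Group illegal = m.Groups["illegal"];
            if (illegal.Success)
                hits.Add((illegal.Value, illegal.Index));
        }
        return hits;
    }
}

static class Program
{
    static void Main()
    {
        foreach (var hit in IllegalTokenFinder.Find("1and(2or3)AND(4OR5)an(6o7)AN(8O9)"))
            Console.WriteLine("Found [{0}] at offset {1}", hit.Text, hit.Index);
    }
}
```

For the sample predicate this should report the same four tokens and offsets as the Split version. The legal alternative has to be part of the pattern: if you search only for the negated part, the engine skips past the "a" in "and" and then happily matches "nd".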