
How to ignore regex matches in C#?

An input string:

string datar = "aag, afg, agg, arg";

I am trying to get the matches "aag" and "arg", but neither of the following works:

string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";

What is the correct way to exclude specific characters from a character range in a C# regex?


The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind like this:

string regr = "a[a-z](?<!f|g)g";

Explanation:

  • a Match the character "a"
  • [a-z] Match a single character in the range between "a" and "z"
  • (?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
    • f|g Match the character "f" or match the character "g"
  • g Match the character "g"
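A minimal sketch of that pattern in action, written in Java (the page's other demo language) because java.util.regex supports the same fixed-length negative lookbehind as .NET. The class LookbehindDemo and its findAll helper are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class LookbehindDemo {
    // Collects every non-overlapping match of `regex` in `input`, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String datar = "aag, afg, agg, arg";
        // [a-z] consumes the middle letter; (?<!f|g) then looks back one
        // character and rejects the match if that letter was f or g.
        System.out.println(findAll("a[a-z](?<!f|g)g", datar)); // [aag, arg]
        // The plain range from the first sentence gives the same result.
        System.out.println(findAll("a[a-eh-z]g", datar));      // [aag, arg]
    }
}
```

The lookbehind works because [a-z] has already consumed the middle letter, so looking back one character inspects exactly that letter.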


C# character classes don't support the && intersection you tried. The simple solution is:

a[a-eh-z]g
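To double-check which letters the ranges a-e and h-z leave out, a small Java sketch can test every lowercase letter against the class. RangeDemo and its methods are illustrative names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
class RangeDemo {
    // Lowercase letters that the class [a-eh-z] does NOT accept.
    static List<Character> excludedLetters() {
        Pattern cls = Pattern.compile("[a-eh-z]");
        List<Character> excluded = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            if (!cls.matcher(String.valueOf(c)).matches()) {
                excluded.add(c);
            }
        }
        return excluded;
    }

    // All non-overlapping matches of a[a-eh-z]g in the input.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("a[a-eh-z]g").matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(excludedLetters());             // [f, g]
        System.out.println(matches("aag, afg, agg, arg")); // [aag, arg]
    }
}
```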

If you really want to explicitly list out the letters that don't belong, you could try something like:

a[^\W\d_A-Zfg]g

This character class matches everything except:

  1. \W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
  2. \d removes digits, so now we have letters and the underscore _.
  3. _ removes the underscore, so now we only match letters.
  4. A-Z removes uppercase letters, so now we only match lowercase letters.
  5. Finally at this point we can list the individual lowercase letters we don't want to match.
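A Java sketch that checks the steps above, assuming ASCII input: Java's \W is ASCII-only by default while .NET's is Unicode-aware, so the two engines can disagree on accented letters. NegatedClassDemo and its methods are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
class NegatedClassDemo {
    // The answer's class: not a non-word char, not a digit, not '_',
    // not uppercase, and not f or g.
    static final Pattern MIDDLE = Pattern.compile("[^\\W\\d_A-Zfg]");

    static boolean accepts(char c) {
        return MIDDLE.matcher(String.valueOf(c)).matches();
    }

    // True if the class agrees with [a-eh-z] on every ASCII character.
    static boolean sameAsRangeOnAscii() {
        Pattern range = Pattern.compile("[a-eh-z]");
        for (char c = 0; c < 128; c++) {
            boolean viaRange = range.matcher(String.valueOf(c)).matches();
            if (accepts(c) != viaRange) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameAsRangeOnAscii()); // true
        // Show the verdict for a few sample characters.
        for (char c : "arf%_5Ag".toCharArray()) {
            System.out.println("'" + c + "' -> " + accepts(c));
        }
    }
}
```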

All in all, that's way more complicated than we'd likely ever want. That's regular expressions for ya!


What you're using is Java's set intersection syntax:

a[a-z&&[^fg]]g

...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). Few other regex flavors support that notation, and .NET is not one of them; it uses the simpler set subtraction syntax instead:

a[a-z-[fg]]g

...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').

Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "aag, afg, agg, arg, a%g";

Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
  System.out.println(m.group());
}

C# demo:

using System;
using System.Text.RegularExpressions;

string s = @"aag, afg, agg, arg, a%g";

foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
  Console.WriteLine(m.Value);
}

The output of both is:

aag
arg


Try this if you want to match arg and aag:

a[ar]g

If you want to match an a, any single character other than f or g, and then a g (everything except afg and agg, including non-letters such as a%g), you need this regex:

a[^fg]g
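The two patterns only differ once the input contains something other than letters. A Java sketch using the a%g test string from the demos above (ListedVsNegatedDemo and findAll are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
class ListedVsNegatedDemo {
    // Collects every non-overlapping match of `regex` in `input`, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "aag, afg, agg, arg, a%g";
        // [ar] whitelists exactly two middle letters.
        System.out.println(findAll("a[ar]g", s));  // [aag, arg]
        // [^fg] accepts any middle character except f and g, even '%'.
        System.out.println(findAll("a[^fg]g", s)); // [aag, arg, a%g]
    }
}
```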


It seems like you're trying to match an a, then any lowercase letter except f or g, then a g. If this is the case, why not use the following regular expression:

string regr = "a[a-eh-z]g";


Regex: a[a-eh-z]g. Then use Regex.Matches to get the matched substrings.
