
RegEx: Correct usage of lookbehind assertion and group definitions

I have the following string:

i:0#.w|domain\x123456

I know that search terms can be captured in named groups with (?<mysearchterm>...) and retrieved via Regex.Match(myString, myRegEx).Result("${mysearchterm}");.

I also know that I can use lookbehind assertions like (?<= subexpression), as described on MSDN. Could someone help me get the following parts (including the possibility of retrieving them via groups as shown before):

  1. domain ("domain")
  2. user account ("x123456")

I don't need anything from before the pipe character (nor the pipe character itself) - so basically I am interested in domain\x123456.


As others have noted, this can be done without regex, or without lookbehinds. That being said, I can think of reasons you might want them: to use a RegularExpressionValidator instead of having to roll your own CustomValidator, for example. In ASP.NET, CustomValidators can take a little longer to write, and sometimes a RegularExpressionValidator does the job just fine.

As for lookbehinds, the main reason you'd want one for something like this is if the target string could contain irrelevant copies of the |domain\x123456 pattern:

foo@bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'

If you only wanted to grab domain\x888888 and domain\x123456 out of that, a lookbehind could be useful. Or maybe you just want to learn about lookbehinds. Anyway, since we only have one sample input, I can only guess at the rules; so perhaps something like this:

@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)"
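
To see what the lookbehind buys you, here is a minimal sketch that runs the pattern above against the noisy sample string. It assumes C# 9 or later (top-level statements); the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Noisy sample input: only the entries directly preceded by "i:0#.w|" are wanted.
var input = @"foo@bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'";

// The lookbehind checks for the "i:0#.w|"-style prefix without making it part of the match.
var pattern = new Regex(@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)");

var matches = pattern.Matches(input);
foreach (Match m in matches)
{
    // Named groups are read back through the Groups indexer.
    Console.WriteLine($"{m.Groups["domain"].Value} / {m.Groups["user"].Value}");
}
```

Only the x888888 and x123456 entries are matched; the x999999 and x000000 copies are skipped because the text in front of them does not satisfy the lookbehind.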

Lookarounds are one of the most subtle and misunderstood features of regex, IMHO. I've gotten a lot of use out of them in preventing false positives, or in limiting the length of matches when I'm not trying to match the entire string (for example, if I want only the 3-digit numbers in blah 1234 123 1234567 123 foo, I can use (?<!\d)\d{3}(?!\d)). Here's a good reference if you want to learn more about named groups and lookarounds.
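
The three-digit example can be tried out with a short sketch like the following (again assuming C# 9 or later for top-level statements; the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

var text = "blah 1234 123 1234567 123 foo";

// (?<!\d) rejects a digit immediately before the match; (?!\d) rejects one immediately after.
var threeDigits = new Regex(@"(?<!\d)\d{3}(?!\d)");

var found = threeDigits.Matches(text);
foreach (Match m in found)
{
    Console.WriteLine($"{m.Value} at index {m.Index}");
}
```

Without the two lookarounds, a plain \d{3} would also match inside 1234 and 1234567, which is exactly the kind of false positive they prevent.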


You can just use the regex @"\|([^\\]+)\\(.+)".
The domain and user will be in groups 1 and 2, respectively.
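
A minimal sketch of reading those two numbered groups, assuming C# 9 or later (top-level statements) and the sample string from the question:

```csharp
using System;
using System.Text.RegularExpressions;

var input = @"i:0#.w|domain\x123456";
var match = Regex.Match(input, @"\|([^\\]+)\\(.+)");

if (match.Success)
{
    // Groups[0] is the whole match; the capture groups start at index 1.
    var domain = match.Groups[1].Value;
    var user = match.Groups[2].Value;
    Console.WriteLine($"{domain} {user}");
}
```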


You don't need regular expressions for that.

var myString = @"i:0#.w|domain\x123456";
var relevantParts = myString.Split('|')[1].Split('\\');

var domain = relevantParts[0];
var user = relevantParts[1];

Explanation: String.Split(separator) returns an array of substrings separated by separator.
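
Note that Split('|')[1] throws if the input contains no pipe. If the input might be malformed, a more defensive variant could look like the sketch below. TrySplitAccount is a hypothetical helper name, not a framework method, and the sketch assumes C# 9 or later. Unlike the snippet above, it takes everything after the first pipe rather than only the second pipe-separated segment.

```csharp
using System;

// Hypothetical helper: returns false instead of throwing when '|' or '\' is missing.
static bool TrySplitAccount(string claim, out string domain, out string user)
{
    domain = user = "";
    var pipe = claim.IndexOf('|');
    if (pipe < 0) return false;

    // Everything after the first pipe must be exactly "domain\user".
    var parts = claim.Substring(pipe + 1).Split('\\');
    if (parts.Length != 2) return false;

    domain = parts[0];
    user = parts[1];
    return true;
}

if (TrySplitAccount(@"i:0#.w|domain\x123456", out var d, out var u))
{
    Console.WriteLine($"{d} {u}");
}
```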


If you insist on using regular expressions, this is how you do it with named groups and Match.Result, based on SLaks's answer (+1, by the way):

var myString = @"i:0#.w|domain\x123456";
var r = new Regex(@"\|(?<domain>[^\\]+)\\(?<user>.+)");

var match = r.Matches(myString)[0];  // get first match
var domain = match.Result("${domain}");
var user = match.Result("${user}");

Personally, however, I would prefer the following syntax, if you are just extracting the values:

var domain = match.Groups["domain"].Value;  // .Value yields the captured string rather than the Group object
var user = match.Groups["user"].Value;

And you really don't need lookbehind assertions here.
