In C# regular expression why does the initial match show up in the groups?

2022-12-20 02:25 问答作者：

So if I write a regex it's matches I can get the match or I can access its groups. This seems counter int开发者_Python百科uitive since the groups are defined in the expression with braces "(" and ")". It seems like it is not only wrong but redundant. Any one know why?

Regex quickCheck = new Regex(@"(\D+)\d+");
string source = "abc123";

m.Value        //Equals source
m.Groups.Count //Equals 2
m.Groups[0])   //Equals source
m.Groups[1])   //Equals "abc"

I agree - it is a little strange, however I think there are good reasons for it.

A Regex Match is itself a Group, which in turn is a Capture.

But the Match.Value (or Capture.Value as it actually is) is only valid when one match is present in the string - if you're matching multiple instances of a pattern, then by definition it can't return everything. In effect - the Value property on the Match is a convenience for when there is only match.

But to clarify where this behaviour of passing the whole match into Groups[0] makes sense - consider this (contrived) example of a naive code unminifier:

[TestMethod]
public void UnMinifyExample()
{
  string toUnMinify = "{int somevalue = 0; /*init the value*/} /* end */";
  string result = Regex.Replace(toUnMinify, @"(;|})\s*(/\*[^*]*?\*/)?\s*", "$0\n");
  Assert.AreEqual("{int somevalue = 0; /*init the value*/\n} /* end */\n", result);
}

The regex match will preserve /* */ comments at the end of a statement, placing a newline afterwards - but works for either ; or } line-endings.

Okay - you might wonder why you'd bother doing this with a regex - but humour me :)

If Groups[0] generated by the matches for this regex was not the whole capture - then a single-call replace would not be possible - and your question would probably be asking why doesn't the whole match get put into Groups[0] instead of the other way round!

The documentation for Match says that the first group is always the entire match so it's not an implementation detail.

It's historical is all. In Perl 5, the contents of capture groups are stored in the special variables $1, $2, etc., but C#, Java, and others instead store them in an array (or array-like structure). To preserve compatibility with Perl's naming convention (which has been copied by several other languages), the first group is stored in element number one, the second in element two, etc. That leaves element zero free, so why not store the full match there?

FYI, Perl 6 has adopted a new convention, in which the first capturing group is numbered zero instead of one. I'm sure it wasn't done just to piss us off. ;)

Most likely so that you can use "$0" to represent the match in a substitution expression, and "$1" for the first group match, etc.

I don't think there's really an answer other than the person who wrote this chose that as an implementation detail. As long as you remember that the first group will always equal the source string you should be ok :-)

Not sure why either, but if you use named groups you can then set the option RegExOptions.ExplicitCapture and it should not include the source as first group.

It might be redundant, however it has some nice properties.

For example, it means the capture groups work the same way as other regex engines - the first capture group corresponds to "1", and so on.

Backreferences are one-based, e.g., \1 or $1 is the first parenthesized subexpression, and so on. As laid out, one maps to the other without any thought.

Also of note: m.Groups["0"] gives you the entire matched substring, so be sure to skip "0" if you're iterating over regex.GetGroupNames().

继续阅读：regex

In C# regular expression why does the initial match show up in the groups?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？