Exclude subexpression from regex in C++

Suppose I was trying to match the following expression using regex.h in C++, and trying to obtain the subexpressions contained:

/^((1|2)|3) (1|2)$/

Suppose it were matched against the string "3 1", the subexpressions would be:

"3 1"
"3"
"1"

If, instead, it were matched against the string "2 1", the subexpressions would be:

"2 1"
"2"
"2"
"1"

Which means that, depending on how the first subexpression evaluates, the final one is in a different element in the pmatch array. I realise this particular example is trivial, as I could remove one of the sets of brackets, or grab the last element of the array, but it becomes problematic in more complicated expressions.

Suppose all I want are the top-level subexpressions, the ones which aren't subexpressions of other subexpressions. Is there any way to only get them? Or, alternatively, to know how many subexpressions are matched within a subexpression, so that I can traverse the array irrespective of how it evaluates?

Thanks


There are two common approaches to solving this problem:

  • Named capturing groups: (?P<name>), so you can pull out captured groups explicitly by name.
  • Non-capturing groups, usually written (?:...), so that the group doesn't become part of the resulting group list and the remaining groups stay in the expected order.

It's unclear which regex dialect you're using, so I don't know if it supports either approach, but check out this regex comparison chart.

Turning the (1|2) group into a non-capturing group would look like:

/^((?:1|2)|3) (1|2)$/
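
As far as I know, POSIX extended syntax (what regex.h compiles with REG_EXTENDED) does not define (?:...), so here is a minimal sketch of the same idea using std::regex from C++11 instead, whose default ECMAScript grammar does support non-capturing groups. Note that this swaps in a different library from the one in the question, and capture_groups is a hypothetical helper name made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the text of each
// numbered capturing group, or an empty vector if the input does not match.
std::vector<std::string> capture_groups(const std::string& input) {
    // (?:1|2) groups without capturing, so only two groups are numbered.
    static const std::regex re("^((?:1|2)|3) (1|2)$");
    assert(re.mark_count() == 2);

    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(input, m, re)) {
        // m[0] is the whole match; numbered groups start at index 1.
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

With this pattern, capture_groups("2 1") and capture_groups("3 1") should each return exactly two entries, so the second top-level group is always at the same index regardless of which alternative matched.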


I don't know regex.h, but in many regular expression libraries you can use non-capturing parentheses by starting the group with ?:, so this would stop the inner group from becoming an indexed subexpression:

/^((?:1|2)|3) (1|2)$/
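
For regex.h itself, it may be worth checking what regexec actually reports before working around it. As I understand the POSIX specification, pmatch indices are fixed by the position of each opening parenthesis in the pattern, and a group that did not take part in the match has rm_so and rm_eo set to -1 rather than being dropped from the array. The sketch below assumes a POSIX system; posix_groups and the "<unset>" marker are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex.h>   // POSIX header, not part of standard C++
#include <string>
#include <vector>

// Hypothetical helper: returns pmatch[0..3] as strings, using "<unset>" for
// groups that did not participate. Returns an empty vector on no match.
std::vector<std::string> posix_groups(const std::string& input) {
    regex_t re;
    if (regcomp(&re, "^((1|2)|3) (1|2)$", REG_EXTENDED) != 0) {
        return {};
    }
    // re_nsub counts the parenthesised groups in the pattern.
    assert(re.re_nsub == 3);

    regmatch_t pm[4];  // whole match plus three groups
    std::vector<std::string> out;
    if (regexec(&re, input.c_str(), 4, pm, 0) == 0) {
        for (const regmatch_t& m : pm) {
            if (m.rm_so == -1) {
                out.push_back("<unset>");  // group not part of the match
            } else {
                out.push_back(input.substr(
                    static_cast<std::size_t>(m.rm_so),
                    static_cast<std::size_t>(m.rm_eo - m.rm_so)));
            }
        }
    }
    regfree(&re);
    return out;
}
```

Under that reading, posix_groups("3 1") gives "3 1", "3", "<unset>", "1", so the final group stays at index 3 whether or not the inner (1|2) matched, and the array can be traversed by skipping entries whose rm_so is -1.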