Exclude subexpression from regex in c++
Suppose I was trying to match the following expression using regex.h in C++ and obtain the subexpressions it contains:
/^((1|2)|3) (1|2)$/
If it were matched against the string "3 1", the subexpressions would be:
"3 1"
"3"
"1"
If, instead, it were matched against the string "2 1", the subexpressions would be:
"2 1"
"2"
"2"
"1"
Which means that, depending on how the first subexpression evaluates, the final one is in a different element in the pmatch array. I realise this particular example is trivial, as I could remove one of the sets of brackets, or grab the last element of the array, but it becomes problematic in more complicated expressions.
Suppose all I want are the top-level subexpressions, the ones which aren't subexpressions of other subexpressions. Is there any way to only get them? Or, alternatively, to know how many subexpressions are matched within a subexpression, so that I can traverse the array irrespective of how it evaluates?
Thanks
There are two common approaches to solving this problem:
- Named capturing groups, such as (?P<name>...), so you can pull out captured groups explicitly by name.
- Non-capturing groups, usually written (?:...), so the group doesn't become part of the resulting group list and the remaining groups stay in the expected order.
It's unclear which regex dialect you're using, so I don't know if it supports either approach, but check out this regex comparison chart.
Turning the (1|2) group into a non-capturing group would look like:
/^((?:1|2)|3) (1|2)$/
I don't know regex.h, but in many regular expression libraries you can use non-capturing parentheses by starting the group with ?:, so this would stop the inner group from becoming an indexed subexpression:
/^((?:1|2)|3) (1|2)$/