How do conditionals in lookaround groups work in .NET regex?
Playing around with regular expressions, especially the balanced matching of the .NET flavor, I came to a point where I realized that I do not understand the inner workings of the engine as well as I thought I did. I'd appreciate any input on why my patterns behave the way they do! But first...
Disclaimer: This question is purely theoretical, and any result obtained here will never be used, or modified and used in production code to parse HTML. Ever. I promise. I do fear the pony. =)
Now to my problem. I'll try to match the letter A, if it is not preceded by a #. To demonstrate, I'll always use the string "..A..#..A..". Here, the first A should be matched. Of course, this is quite an easy task using "A(?<!^.*#.*)", but I wish to use conditionals here, since they can be used for balanced matching and other cool things.
What I tried is
"A(?<=^(#(?<q>)|[^#])*(?(q)(?!)))"
The way I interpret it is: when the engine encounters an "A", it goes back to the start of the string, and for every character adds an empty match to the capturing group q if the character is a #. Then it should fail if q contains a match. What I don't understand is why this expression matches both As in my sample string.
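For reference, the interpretation described above can be written down outside the regex engine. The following Python sketch only models that intended reading; it does not reproduce what the .NET engine actually does with the lookbehind, and the function name `expected_matches` is hypothetical.

```python
def expected_matches(text):
    """Return the index of every 'A' that has no '#' anywhere before it."""
    hits = []
    for i, ch in enumerate(text):
        if ch != "A":
            continue
        q = []  # models the capture stack of the named group q
        for prev in text[:i + 1]:  # scan the prefix up to and including the A
            if prev == "#":
                q.append("")  # (?<q>) records an empty capture for each '#'
        if not q:  # models (?(q)(?!)): reject when q holds any capture
            hits.append(i)
    return hits
```

Under this reading, only the first A of the sample string (index 2) would be accepted.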
When I simply remove the lookbehind and match the whole string, this works:
"^(#(?<q>)|[^#])*(?(q)(?!))A"
It matches the whole string up to the first A, even though the first group's quantifier is greedy. Inserting a '#' at the beginning will also cause the match to fail (as desired).
So: how do lookaround groups, named capturing groups within them, and conditionals play together?
Thanks!
Edit: This problem can be seen more easily in "(?<=(?<q>)(?(q)(?!))).", which should not match any character, but matches everything.
Conditionals aren't really that useful in balanced matching--or anywhere else, for that matter. ;) Balanced matching works by using a named capture group as a stack; every time that group matches something, the matched text is pushed onto the stack. There's also special syntax for popping the stack. Here's a good introduction:
http://blog.stevenlevithan.com/archives/balancing-groups
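As a rough illustration of the stack idea, here is a plain Python model (not .NET regex itself) of a commonly shown balanced-parentheses idiom. In .NET syntax, `(?<open>\()` pushes a capture, `(?<-open>\))` pops one, and a trailing `(?(open)(?!))` fails the match if anything is left on the stack. The function name `is_balanced` is made up for this sketch.

```python
def is_balanced(text):
    """Check parenthesis balance the way a capture stack would."""
    open_stack = []  # plays the role of the named group's capture stack
    for ch in text:
        if ch == "(":
            # like (?<open>\() : push the matched text
            open_stack.append(ch)
        elif ch == ")":
            # like (?<-open>\)) : popping from an empty stack fails the match
            if not open_stack:
                return False
            open_stack.pop()
    # like (?(open)(?!)) : fail if any capture is still on the stack
    return not open_stack
```

For example, `is_balanced("(a(b)c)")` is true, while `is_balanced("(()")` and `is_balanced(")(")` are false. The final check is the one place where a conditional typically shows up in this idiom.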