C# regex: negative lookahead fails with the single line option
I am trying to figure out why a regex with a negative lookahead fails when the "single line" option is turned on.
Example (simplified):
<source>Test 1</source>
<source>Test 2</source>
<target>Result 2</target>
<source>Test 3</source>
This:
<source>(?!.*<source>)(.*?)</source>(?!\s*<target)
will fail if the single line option is on, and will work if the single line option is off. For instance, this works (disables the single line option):
(?-s:<source>(?!.*<source>)(.*?)</source>(?!\s*<target))
My understanding is that the single line mode simply allows the dot "." to match new lines, and I don't see why it would affect the expression above.
Can anyone explain what I am missing here?
::::::::::::::::::::::
EDIT: (?!.*) is a negative lookahead, not a capturing group.
<source>(?!.*?<source>)(.*?)</source>(?!\s*<target)
will ALSO FAIL if the single line mode is on, so it doesn't look like this is a greediness issue. Try it in a Regex designer (like Expresso or Rad regex):
With single line OFF, it matches (as expected):
<source>Test 1</source>
<source>Test 3</source>
With single line ON:
<source>Test 3</source>
I don't understand why it doesn't match the first one as well: it does not contain what the first negative lookahead excludes, so it should match the expression.
I believe this is what you're looking for:
<source>((?:(?!</?source>).)*)</source>(?!\s*<target)
The idea is that you match each character one at a time, but only after making sure it isn't the first character of </source>. Also, with the addition of /? to the lookahead, you don't have to use a non-greedy quantifier.
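If it helps, here is a minimal sketch for checking that pattern from C#. The class name TemperedDotDemo and the Captures helper are made up for illustration, and the input is the sample from the question joined with \n. Because each character is guarded by the lookahead, the result should not depend on the single line option.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex and RegexOptions come from .NET.
public static class TemperedDotDemo
{
    // Sample input from the question, one element per line.
    public const string Input =
        "<source>Test 1</source>\n" +
        "<source>Test 2</source>\n" +
        "<target>Result 2</target>\n" +
        "<source>Test 3</source>";

    // Each '.' is consumed only if it does not start <source> or </source>.
    public const string Pattern =
        @"<source>((?:(?!</?source>).)*)</source>(?!\s*<target)";

    // Returns the text captured by group 1 for every match.
    public static string[] Captures(RegexOptions options) =>
        Regex.Matches(Input, Pattern, options)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        // Prints "Test 1, Test 3" in both cases.
        Console.WriteLine(string.Join(", ", Captures(RegexOptions.None)));
        Console.WriteLine(string.Join(", ", Captures(RegexOptions.Singleline)));
    }
}
```

Test 2 is skipped in both modes because its closing tag is followed by <target>.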
The reason why it "fails" is because you seem to have misplaced the negative lookahead.
<source>(?!.*<source>)(.*?)</source>(?!\s*<target)
        ^^^^^^^^^^^^^^
Now, let's consider what (?!.*<source>) does here: it's a lookahead that says that there is NO match for .*<source> from that position.
Well, in single-line mode, . matches everything, including newlines. After matching each of the first two <source> tags, there IS in fact a match for .*<source>, because another <source> appears later in the input! So the negative lookahead fails for the first two <source> tags.
On the last <source>, .*<source> no longer matches, so the negative lookahead succeeds. The rest of the pattern also succeeds, and that's why you only get <source>Test 3</source> in single-line mode.
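To see the difference for yourself, a small sketch along these lines should reproduce it. The class name SinglelineLookaheadDemo and the Captures helper are hypothetical, and the input is the question's sample joined with \n. It runs the original pattern with the option off, with it on, and with the inline (?-s:...) group from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex and RegexOptions come from .NET.
public static class SinglelineLookaheadDemo
{
    // Sample input from the question, one element per line.
    public const string Input =
        "<source>Test 1</source>\n" +
        "<source>Test 2</source>\n" +
        "<target>Result 2</target>\n" +
        "<source>Test 3</source>";

    // Original pattern from the question.
    public const string Pattern =
        @"<source>(?!.*<source>)(.*?)</source>(?!\s*<target)";

    // Returns the text captured by group 1 for every match.
    public static string[] Captures(string pattern, RegexOptions options) =>
        Regex.Matches(Input, pattern, options)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        // '.' stops at \n, so .*<source> only scans the current line.
        // Prints "Test 1, Test 3".
        Console.WriteLine(string.Join(", ", Captures(Pattern, RegexOptions.None)));

        // '.' crosses \n, so .*<source> finds a later <source> for the first two tags.
        // Prints "Test 3".
        Console.WriteLine(string.Join(", ", Captures(Pattern, RegexOptions.Singleline)));

        // (?-s:...) switches single line mode off inside the group.
        // Prints "Test 1, Test 3".
        Console.WriteLine(string.Join(", ",
            Captures("(?-s:" + Pattern + ")", RegexOptions.Singleline)));
    }
}
```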