A regex to match a substring that isn't followed by a certain other substring
I need a regex that will match blahfooblah but not blahfoobarblah.
I want it to match only foo and everything around foo, as long as it isn't followed by bar.
I tried using this: foo.*(?<!bar)
which is fairly close, but it matches blahfoobarblah. The negative lookbehind needs to match anything and not just bar.
The specific language I'm using is Clojure which uses Java regexes under the hood.
EDIT: More specifically, I also need it to pass blahfooblahfoobarblah but not blahfoobarblahblah.
Try:
/(?!.*bar)(?=.*foo)^(\w+)$/
Tests:
blahfooblah # pass
blahfooblahbarfail # fail
somethingfoo # pass
shouldbarfooshouldfail # fail
barfoofail # fail
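The tests above can be reproduced with a minimal sketch, assuming plain java.util.regex (the engine the question says Clojure uses). The class name NoBarAnywhere and the helper passes are made up for illustration, and the /.../ delimiters are dropped because Java does not use them.

```java
import java.util.regex.Pattern;

public class NoBarAnywhere {
    // Pattern from the answer; backslashes are doubled inside a Java string literal.
    static final Pattern P = Pattern.compile("(?!.*bar)(?=.*foo)^(\\w+)$");

    // Hypothetical helper: true when the whole input passes the pattern.
    static boolean passes(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "blahfooblah", "blahfooblahbarfail", "somethingfoo",
            "shouldbarfooshouldfail", "barfoofail"
        };
        for (String s : inputs) {
            System.out.println(s + " # " + (passes(s) ? "pass" : "fail"));
        }
    }
}
```

In Clojure the same pattern should be usable as a regex literal, for example (re-find #"(?!.*bar)(?=.*foo)^(\w+)$" s), where the backslash does not need doubling.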
Regular expression explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    bar                      'bar'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    foo                      'foo'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
Other regex
If you only want to exclude bar when it is directly after foo, you can use
/(?!.*foobar)(?=.*foo)^(\w+)$/
Edit
You updated your question to make it more specific.
/(?=.*foo(?!bar))^(\w+)$/
New tests
fooshouldbarpass # pass
butnotfoobarfail # fail
fooshouldpassevenwithfoobar # pass
nofuuhere # fail
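As a rough check of the updated pattern, here is a small sketch under the same assumption (plain java.util.regex, /.../ delimiters dropped). FooNotDirectlyBar and passes are hypothetical names. It also runs the two strings from the question's EDIT.

```java
import java.util.regex.Pattern;

public class FooNotDirectlyBar {
    // Updated pattern: at least one foo that is not immediately followed by bar.
    static final Pattern P = Pattern.compile("(?=.*foo(?!bar))^(\\w+)$");

    // Hypothetical helper: true when the whole input passes the pattern.
    static boolean passes(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "fooshouldbarpass", "butnotfoobarfail",
            "fooshouldpassevenwithfoobar", "nofuuhere",
            "blahfooblahfoobarblah", "blahfoobarblahblah"
        };
        for (String s : inputs) {
            System.out.println(s + " # " + (passes(s) ? "pass" : "fail"));
        }
    }
}
```

The .* inside the lookahead is greedy, so the engine first tries the last foo in the string and backtracks to earlier ones. That is why a string with one good foo and one foobar still passes.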
New explanation
(?=.*foo(?!bar)) ensures that a foo is found which is not directly followed by bar.
To match a foo followed by something that doesn't start with bar, try
foo(?!bar)
Your version with negative lookbehind is effectively "match a foo followed by something that doesn't end in bar". The .* matches all of barblah, and the (?<!bar) looks back at lah and checks that it doesn't match bar, which it doesn't, so the whole pattern matches.
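The difference between the two assertions can be seen in a short sketch, again assuming java.util.regex. LookbehindVsLookahead and firstMatch are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindVsLookahead {
    // Hypothetical helper: returns the first matched substring, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Lookbehind is checked where .* ends, so it looks at "lah", not at "bar".
        System.out.println(firstMatch("foo.*(?<!bar)", "blahfoobarblah")); // foobarblah

        // Even when the input ends in bar, .* backtracks one character and matches.
        System.out.println(firstMatch("foo.*(?<!bar)", "blahfoobar"));     // fooba

        // Lookahead is checked right after foo, which is the intended position.
        System.out.println(firstMatch("foo(?!bar)", "blahfoobarblah"));    // null
        System.out.println(firstMatch("foo(?!bar)", "blahfooblah"));       // foo
    }
}
```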
Use a negative look ahead instead:
\s*(?!\w*(bar)\w*)\w*(foo)\w*\s*
This worked for me, hope it helps. Good luck!
You wrote a comment suggesting you'd like this to match individual words within a string rather than the whole string itself.
Rather than mashing all of this in a comment, I'm posting it as a new answer.
New Regex
/(?=\w*foo(?!bar))(\w+)/
Sample text
foowithbar fooevenwithfoobar notfoobar foohere notfoobarhere butfooisokherebar notfoobarhere andnofuu needsfoo
Matches
foowithbar fooevenwithfoobar foohere butfooisokherebar needsfoo
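One way to collect these matches is to call find() repeatedly. This is a sketch assuming java.util.regex; WordsWithFooNotBar and matchingWords are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsWithFooNotBar {
    // Word-level pattern from the answer, without the /.../ delimiters.
    static final Pattern P = Pattern.compile("(?=\\w*foo(?!bar))(\\w+)");

    // Hypothetical helper: returns every word that contains a foo not directly followed by bar.
    static List<String> matchingWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "foowithbar fooevenwithfoobar notfoobar foohere notfoobarhere "
                + "butfooisokherebar notfoobarhere andnofuu needsfoo";
        System.out.println(matchingWords(sample));
        // [foowithbar, fooevenwithfoobar, foohere, butfooisokherebar, needsfoo]
    }
}
```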
Your specific match request can be matched by:
\w+foo(?!bar)\w+
This will match blahfooblahfoobarblah but not blahfoobarblahblah.
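A quick way to try this pattern, sketched with java.util.regex; FooNotBarInside and found are hypothetical names.

```java
import java.util.regex.Pattern;

public class FooNotBarInside {
    // Pattern from the answer: word characters, a foo not followed by bar, more word characters.
    static final Pattern P = Pattern.compile("\\w+foo(?!bar)\\w+");

    // Hypothetical helper: true when the pattern matches somewhere in the input.
    static boolean found(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found("blahfooblahfoobarblah")); // true
        System.out.println(found("blahfoobarblahblah"));    // false
        System.out.println(found("somethingfoo"));          // false
    }
}
```

Note that each \w+ needs at least one character, so this pattern does not match when foo is at the very start or very end of the word, as with somethingfoo. Using \w* on both sides would relax that requirement.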
The problem with your regex foo.*(?<!bar) is the .* after foo. It is greedy and matches as many characters as it can, including the characters after bar, so the lookbehind is only checked at the end of the match.