Regex: Does not have/include pattern
I have a regex pattern to match an HTML script tag. How can I change this pattern so that it means "the input string DOES NOT MATCH the script tag pattern"?
In other words, given a pattern, what is the alteration needed to change the meaning of the pattern to "does not match this pattern"?
For example, if I have a pattern \d{3}-\d{3}-\d{4}, what is the equivalent pattern that means "does not match \d{3}-\d{3}-\d{4}"?
You can negate a regex pattern by using a negative lookahead. This is slightly different than simply negating the regex though. Negative lookahead would look like the following in Java (and many other languages):
(?!\d{3}-\d{3}-\d{4})
Note that this doesn't exactly answer the question. As far as I know, writing a regular expression that matches exactly the strings another regular expression rejects is not an easy task. A much easier way to solve the problem is to invert the program logic:
Instead of:
if (string.matches(yourRegex))
Do:
if (!string.matches(yourRegex))
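A minimal sketch of that inversion, assuming the goal is "the input contains no phone-like substring". The class and method names are hypothetical. Since `String.matches` in Java only succeeds when the whole string matches, the sketch uses `Matcher.find()` to look for the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

class NegatedMatch {
    // Pattern from the question, compiled once and reused.
    static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    // True when no substring of the input matches PHONE.
    static boolean lacksPhone(String input) {
        return !PHONE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(lacksPhone("call 555-123-4567")); // prints false
        System.out.println(lacksPhone("no number here"));    // prints true
    }
}
```

If the intent is instead "the whole string is not a phone number", swapping `find()` for `matches()` gives that stricter test.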
That is not easily achievable for arbitrary patterns. In practice, it's almost always easier to do what you want in the surrounding code than in the pattern itself. For instance, instead of
grep -E '[0-9]{3}-[0-9]{3}-[0-9]{4}' file
you could use
grep -vE '[0-9]{3}-[0-9]{3}-[0-9]{4}' file
Or in a program you could change something like
if (pattern.matches()) {
foo();
}
into something like
if (!pattern.matches()) {
foo();
}
A more tedious approach is to enumerate everything that should match instead of what should not match. So, say you want to match everything but the string <html>, you could write a regex like so:
([^<]|<([^h]|h([^t]|t([^m]|m([^l]|l[^>])))))
Reading that regex is like saying: "Okay, you can match any character but '<', or you could match '<' but then you can't match an 'h' after that... or you do match an 'h' after that but then you can't match a 't' after that..." and so on.
It's butt ugly, but then again, for simple string matches, you can easily write a recursive function that transforms any given term into a pattern like the above.
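The recursive transformation could look roughly like the following sketch. The class and method names are hypothetical, and it assumes a non-empty literal term made of ordinary (non-surrogate) characters. It only reproduces the shape of the hand-written pattern and is not a complete "does not contain" test: repeated over an input, the pattern can still accept some strings that contain the term, such as `<<html>`.

```java
import java.util.regex.Pattern;

class ExcludeLiteral {
    // Characters that get a backslash so they stay literal both inside
    // and outside a character class.
    static final String META = "\\[]^-&.*+?()|{}$";

    static String esc(char c) {
        return META.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    // Builds the nested alternation for the suffix of term starting at i.
    static String build(String term, int i) {
        String c = esc(term.charAt(i));
        if (i == term.length() - 1) {
            return "[^" + c + "]";            // last character: just exclude it
        }
        // Either a different character here, or this character followed by
        // a mismatch somewhere later in the term.
        return "([^" + c + "]|" + c + build(term, i + 1) + ")";
    }

    public static void main(String[] args) {
        String p = build("<html>", 0);
        System.out.println(p);
        System.out.println(Pattern.matches(p, "<html>")); // prints false
        System.out.println(Pattern.matches(p, "<htmx"));  // prints true
    }
}
```

For the term `<html>` the generated string is the same pattern as the one written out by hand above.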
Surely it's easier to just negate the test? For example, in JavaScript:
if (!regex.test(str)) ...
Negating a character class is easy with ^, but negating a whole regex gets much more convoluted.
What language are you using? The easiest solution to the specific problem you stated is to simply prepend a negation operator (usually "!") to the match.
I definitely agree with the other answers saying you should negate testing for a match, but this should do what you want using just a regex:
(?!.*\d{3}-\d{3}-\d{4})
This is a negative lookahead. Because no characters are placed outside the lookahead, the regex basically means "fail on any string that starts with any number of characters (.*) followed by the regex \d{3}-\d{3}-\d{4}".
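One caveat when using the lookahead-only pattern from Java: `Matcher.matches()` and `String.matches` must consume the whole input, and a bare lookahead consumes nothing, so on its own it only accepts the empty string. A common variant, shown here as a sketch rather than as the answer's exact pattern, appends `.*` after the lookahead. The class name and the `DOTALL` flag are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class LookaheadNegation {
    // Negative lookahead from the answer, followed by .* so the rest of
    // the input is consumed. DOTALL lets '.' cross line breaks.
    static final Pattern NO_PHONE =
        Pattern.compile("(?!.*\\d{3}-\\d{3}-\\d{4}).*", Pattern.DOTALL);

    // True when the input contains no phone-like substring.
    static boolean lacksPhone(String input) {
        return NO_PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(lacksPhone("call 555-123-4567")); // prints false
        System.out.println(lacksPhone("no number here"));    // prints true
    }
}
```

This keeps the negation inside the pattern, which can help when the calling code only accepts a regex and cannot be changed to negate the result.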