Regex: match everything but a specific pattern
I need a regular expression able to match everything but a string starting with a specific pattern (specifically index.php and what follows, like index.php?id=2342343).
Regex: match everything but:
- a string starting with a specific pattern (e.g. any string, including an empty one, that does not start with foo):
  - Lookahead-based solution for NFAs:
    ^(?!foo).*$
    ^(?!foo)
  - Negated character class based solution for regex engines not supporting lookarounds:
    ^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$
    ^([^f].{2}|.[^o].|.{2}[^o])|^.{0,2}$
- a string ending with a specific pattern (say, no world. at the end):
  - Lookbehind-based solution:
    (?<!world\.)$
    ^.*(?<!world\.)$
  - Lookahead solution:
    ^(?!.*world\.$).*
    ^(?!.*world\.$)
  - POSIX workaround:
    ^(.*([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.])|.{0,5})$
    ([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.]$|^.{0,5})$
- a string containing specific text (say, do not match a string containing foo):
  - Lookaround-based solution:
    ^(?!.*foo)
    ^(?!.*foo).*$
  - POSIX workaround:
    - Use the online regex generator at www.formauri.es/personal/pgimeno/misc/non-match-regex
- a string containing a specific character (say, avoid matching a string having a | symbol):
  ^[^|]*$
- a string equal to some string (say, not equal to foo):
  - Lookaround-based:
    ^(?!foo$)
    ^(?!foo$).*$
  - POSIX:
    ^(.{0,2}|.{4,}|[^f]..|.[^o].|..[^o])$
- a sequence of characters:
  - PCRE (match any text but cat):
    /cat(*SKIP)(*FAIL)|[^c]*(?:c(?!at)[^c]*)*/i
    or
    /cat(*SKIP)(*FAIL)|(?:(?!cat).)+/is
  - Other engines allowing lookarounds:
    (cat)|[^c]*(?:c(?!at)[^c]*)*
    (or (?s)(cat)|(?:(?!cat).)*, or (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*)
    and then check with language means: if Group 1 matched, it is not what we need; otherwise, grab the match value if it is not empty
- a certain single character or a set of characters:
  - Use a negated character class: [^a-z]+ (any char other than a lowercase ASCII letter)
  - Matching any char(s) but |: [^|]+
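The "check with language means" step for the (cat)|... pattern might look like this in Python. It is only a sketch: the helper name text_except_cat and the sample input are made up for illustration, and the pattern is the first lookaround-based variant from the list above.

```python
import re

# Group 1 captures the unwanted "cat"; the second branch consumes
# everything up to the next "cat" (or to the end of the input).
PATTERN = re.compile(r"(cat)|[^c]*(?:c(?!at)[^c]*)*")

def text_except_cat(text):
    """Return the non-empty chunks of `text` that lie outside "cat"."""
    chunks = []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:
            continue  # Group 1 matched: this is the excluded text
        if m.group(0):  # skip the empty matches the second branch can produce
            chunks.append(m.group(0))
    return chunks

print(text_except_cat("concatenate cats"))  # ['con', 'enate ', 's']
```

The same loop works for the other two variants, since they also put the excluded text into Group 1.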
Demo note: the newline \n is used inside negated character classes in demos to avoid match overflow to the neighboring line(s). It is not necessary when testing individual strings.
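A small Python sketch of that overflow, assuming a made-up two-line demo input; the variable names are illustrative only:

```python
import re

# Two test strings pasted into one multi-line demo input (made-up sample).
text = "a|b\nc|d"

# Without \n in the class, the second match runs across the line break.
overflowing = re.findall(r"[^|]+", text)

# With \n excluded as well, every match stays on its own line.
per_line = re.findall(r"[^|\n]+", text)

print(overflowing)  # ['a', 'b\nc', 'd']
print(per_line)     # ['a', 'b', 'c', 'd']
```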
Anchor note: In many languages, use \A to define the unambiguous start of string, and \z (in Python, it is \Z; in JavaScript, $ is OK) to define the very end of the string.
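In Python, the difference between $ and \Z shows up when the input ends with a newline. A minimal sketch with a made-up sample string:

```python
import re

s = "hello world.\n"  # made-up sample with a trailing newline

# `$` also matches just before a final newline, so this still finds "world.".
loose = re.search(r"world\.$", s)

# `\Z` matches only at the very end of the string, after the newline.
strict = re.search(r"world\.\Z", s)

print(loose is not None)  # True
print(strict is None)     # True
```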
Dot note: In many flavors (but not POSIX, TRE, TCL), . matches any char but a newline char. Make sure you use a corresponding DOTALL modifier (/s in PCRE/Boost/.NET/Python/Java and /m in Ruby) for the . to match any char including a newline.
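To see why this matters for the "does not contain foo" pattern, here is a short Python sketch; the input string is a made-up example and re.DOTALL is Python's spelling of the /s modifier:

```python
import re

s = "bar\nfoo"  # "foo" sits on the second line

# Default mode: `.` stops at the newline, so the lookahead never reaches
# "foo" and the string is wrongly accepted.
accepted_default = re.search(r"^(?!.*foo)", s) is not None

# DOTALL mode: `.` crosses the newline, the lookahead sees "foo",
# and the string is rejected as intended.
accepted_dotall = re.search(r"^(?!.*foo)", s, re.DOTALL) is not None

print(accepted_default)  # True
print(accepted_dotall)   # False
```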
Backslash note: In languages where you have to declare patterns with C strings allowing escape sequences (like \n for a newline), you need to double the backslashes escaping special characters so that the engine can treat them as literal characters (e.g. in Java, world\. will be declared as "world\\.", or use a character class: "world[.]"). Alternatively, use raw string literals (Python r'\bworld\b'), C# verbatim string literals @"world\.", or slashy strings/regex literal notations like /world\./.
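The Python side of this note can be checked directly. The following is a small sketch with made-up sample text:

```python
import re

# In a normal Python string, "\b" is the backspace character (U+0008)...
plain = "\bworld\b"
# ...while in a raw string it reaches the regex engine as a word boundary.
raw = r"\bworld\b"

plain_found = re.search(plain, "hello world") is not None
raw_found = re.search(raw, "hello world") is not None

# A doubled backslash and a raw string spell the same pattern,
# and a character class is an escape-free way to match a literal dot.
same_pattern = "world\\." == r"world\."
class_matches = re.fullmatch("world[.]", "world.") is not None

print(plain_found, raw_found)       # False True
print(same_pattern, class_matches)  # True True
```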
You could use a negative lookahead from the start; e.g., ^(?!foo).*$ won't match anything starting with foo.
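A quick way to try this in Python, assuming a handful of made-up sample strings:

```python
import re

NOT_STARTING_WITH_FOO = re.compile(r"^(?!foo).*$")

samples = ["foobar", "foo", "barfoo", "", "fo"]

# Keep only the strings the pattern matches, i.e. those not starting with "foo".
accepted = [s for s in samples if NOT_STARTING_WITH_FOO.match(s)]

print(accepted)  # ['barfoo', '', 'fo']
```

Note that the empty string and "fo" are accepted too, since neither starts with foo.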
You can put a ^ at the beginning of a character set to match anything but those characters. For example, [^=]* will match any run of characters other than =.
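For instance, in Python (the sample strings are made up for illustration):

```python
import re

line = "key=value"

# From the start of the string, [^=]* consumes everything before the first "=".
key = re.match(r"[^=]*", line).group()

# With + instead of *, findall returns every non-empty run between "=" signs.
parts = re.findall(r"[^=]+", "a=b=c")

print(key)    # key
print(parts)  # ['a', 'b', 'c']
```

Because of the *, the pattern also succeeds with an empty match when the string starts with =.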
Just match /^index\.php/, and then reject whatever matches it.
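In Python, that "match, then reject" approach could be sketched as follows; the function name keep_non_index and the sample list are hypothetical:

```python
import re

INDEX_PHP = re.compile(r"^index\.php")

def keep_non_index(names):
    """Return the names that do NOT start with index.php."""
    return [n for n in names if not INDEX_PHP.match(n)]

names = ["index.php?id=2342343", "index.php", "about.php", "index.html"]
print(keep_non_index(names))  # ['about.php', 'index.html']
```

This keeps the pattern itself trivial and moves the negation into ordinary code, which also works in engines without lookarounds.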
In Python:
>>> import re
>>> p=r'^(?!index\.php\?[0-9]+).*$'
>>> s1='index.php?12345'
>>> re.match(p,s1)
>>> s2='index.html?12345'
>>> re.match(p,s2)
<_sre.SRE_Match object at 0xb7d65fa8>