Regex exception
I'd like a regex that matches every [[ except those starting with some word, e.g.:
match [[DEF, but don't match [[ABC:DEF.
Thanks for the help, and sorry for my English.
EDIT:
My regex (Python) is (\[\[)|(\{\{([Tt]emplate:|)[Cc]ategory).
It matches every [[ and {{category}}, {{Template:Category}} or {{template:category}}, but I don't want it to match [[ if it is followed by, e.g., ABC. More examples:
Match [[SOMETHING, but not [[ABC: SOMETHING.
Match [[EXAMPLE, but not [[ABC: EXAMPLE.
EDIT2, answering "define ex. ABC":
I want to match every [[ not followed by some string, for example ABC.
This depends heavily on the regex engine you are using. If I can assume it can handle lookarounds, the regex would probably be \[\[(?!ABC) for matching two opening brackets not followed by the three characters ABC.
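A minimal Python sketch of the lookahead approach (Python's re module supports lookarounds), dropped into the asker's regex from the first edit. The sample strings are made up for illustration.

```python
import re

# The asker's regex: matches every [[, including the one in [[ABC:DEF
ORIGINAL = r"(\[\[)|(\{\{([Tt]emplate:|)[Cc]ategory)"
# Same regex with a negative lookahead: [[ must not be followed by ABC
FIXED = r"(\[\[(?!ABC))|(\{\{([Tt]emplate:|)[Cc]ategory)"

samples = ["[[DEF", "[[ABC:DEF", "{{Template:Category}}", "{{category}}"]

# re.match anchors each attempt at the start of the sample string
original_hits = [bool(re.match(ORIGINAL, s)) for s in samples]
fixed_hits = [bool(re.match(FIXED, s)) for s in samples]
print(original_hits)  # [True, True, True, True]
print(fixed_hits)     # [True, False, True, True]
```

Note that (?!ABC) also rejects [[ABCD; if only the ABC: prefix should be excluded, (?!ABC:) is stricter.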
"match every [[ but don't match [[ if it starting by ex. ABC"
Maybe you mean:
\[\[(?!ABC)
...or maybe something more like:
\[\[(?!\w+:)
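A quick Python check of the second variant, which rejects [[ followed by any word and a colon (any "prefix:" such as ABC: or File:), not just ABC. The sample strings are illustrative only.

```python
import re

# Reject [[ when it is followed by one or more word characters and a colon
NO_PREFIX = re.compile(r"\[\[(?!\w+:)")

samples = ["[[DEF", "[[ABC:DEF", "[[ABC: SOMETHING", "[[File:x.png", "[[ABC DEF"]
hits = [bool(NO_PREFIX.match(s)) for s in samples]
print(hits)  # [True, False, False, False, True]
```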
Finally, after 8 years, here is easy copy-paste code that should cover every possible case.
Watch out for:
- Be careful when using this for "any-word-except": make sure to put \b in the REGEX_BEFORE part, as you should be doing anyway when finding words.
- If your regex is really complex and you need to use this code in two different places in one regex, make sure to use exceptions_group_1 the first time, exceptions_group_2 the second time, etc. Read the explanation below to understand this better.
Copy/Paste Code:
In the following regex, ONLY replace the all-caps sections with your regex.
Python regex (atomic groups need Python 3.11+ or the third-party regex module)
pattern = r"REGEX_BEFORE(?>(?P<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(exceptions_group_1)always(?<=fail)|)REGEX_AFTER"
Ruby regex
pattern = /REGEX_BEFORE(?>(?<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(<exceptions_group_1>)always(?<=fail)|)REGEX_AFTER/
PCRE regex
REGEX_BEFORE(?>(?<exceptions_group_1>EXCEPTION_PATTERN)|YOUR_NORMAL_PATTERN)(?(exceptions_group_1)always(?<=fail)|)REGEX_AFTER
JavaScript
Impossible as of 6/17/2020 (JavaScript regexes support neither atomic groups nor conditionals), and probably won't be possible in the near future.
Full Examples
REGEX_BEFORE = \[\[
YOUR_NORMAL_PATTERN = \w+\d*
REGEX_AFTER = \]\]
EXCEPTION_PATTERN = MyKeyword\d+
Python regex
pattern = r"\[\[(?>(?P<exceptions_group_1>MyKeyword\d+)|\w+\d*)(?(exceptions_group_1)always(?<=fail)|)\]\]"
Ruby regex
pattern = /\[\[(?>(?<exceptions_group_1>MyKeyword\d+)|\w+\d*)(?(<exceptions_group_1>)always(?<=fail)|)\]\]/
PCRE regex
\[\[(?>(?<exceptions_group_1>MyKeyword\d+)|\w+\d*)(?(exceptions_group_1)always(?<=fail)|)\]\]
How does it work?
This uses fairly advanced regex features: atomic groups, conditionals, lookbehinds, and named groups.
The (?> is the start of an atomic group, which means the engine is not allowed to backtrack into it: if the group matches once, but the overall match later fails (here, because the lookbehind fails), the engine will not go back and try the group's other alternative, so there is no match at that position. (We want this behavior in this case.)
The (?<exceptions_group_1> (written (?P<exceptions_group_1> in Python) creates a named capture group. It's just easier than using numbers. Note that the atomic group first tries to find the exception, and falls back on the normal pattern only if it couldn't find the exception.
The real magic is in the (?(exceptions_group_1). This is a conditional asking whether exceptions_group_1 was successfully matched. If it was, the engine tries to match always(?<=fail). That pattern (as it says) will always fail: it first has to match the literal word "always", and then the lookbehind checks whether the four characters just before the current position, "ways", are "fail", which they never are. Because the conditional fails, the match fails at this position, and because the preceding group is atomic, the engine is not allowed to backtrack into it to try the normal pattern, since it already matched the exception.
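A tiny Python check that always(?<=fail) can never match, whatever surrounds it (the sample strings are arbitrary):

```python
import re

NEVER = re.compile(r"always(?<=fail)")

samples = ["always", "fail", "always fail", "alwaysfail", "failalways"]
# Right after "always" is consumed, the lookbehind sees "ways", never "fail"
results = [NEVER.search(s) for s in samples]
print(results)  # [None, None, None, None, None]
```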
This is definitely not how these tools were intended to be used, but it should work reliably and efficiently.
Exact answer to the original question:
pattern = r"(\[\[(?>(?P<exceptions_group_1>ABC: )|(SOMETHING|EXAMPLE))(?(exceptions_group_1)always(?<=fail)|))"