Regex PHP Only Match if Not Surrounded By Quotes
I have a regex that I run over an entire HTML page to find strings and replace them. However, if a string is in single or double quotes, I do not want it to match.
Current Regex: ([a-zA-Z_][a-zA-Z0-9_]*)
I would like to match steve, john, cathie and john likes to walk (x3), but not "steve", 'sophie' or "john"'likes'"cake".
I have tried (^")([a-zA-Z_][a-zA-Z0-9_]*)(^") but get no matches.
Test Cases:
(steve=="john") would return steve
("test"=="test") would not return anything
(boob==lol==cake) would return all three
Try this one:
(\b(?<!['"])[a-zA-Z_][a-zA-Z_0-9]*\b(?!['"]))
Against this string:
john "michael" michael 'michael elt0n_john 'elt0n_j0hn' 1 2 3 4 5 6
It would match nr 1 (john), nr 3 (michael) and nr 5 (elt0n_john).
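A minimal sketch of running that pattern from PHP against the sample string, in case it helps to reproduce the result. The variable names are only illustrative, and the quotes inside the character classes are escaped solely because the pattern sits in a single-quoted PHP string.

```php
<?php
// Pattern from the answer above, wrapped in / delimiters.
$pattern = '/\b(?<![\'"])[a-zA-Z_][a-zA-Z_0-9]*\b(?![\'"])/';

// The sample string from the answer above.
$subject = 'john "michael" michael \'michael elt0n_john \'elt0n_j0hn\' 1 2 3 4 5 6';

preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full matches: john, michael, elt0n_john
print_r($matches[0]);
```

Words 2, 4 and 6 are rejected because a quote sits directly before or after them, and the bare digits never match because the first character must be a letter or underscore.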
To do that you probably need some dark magic:
'~(?:"[^"\\\\]*+(?:\\\\.[^"\\\\]*+)*+"|\'[^\'\\\\]*+(?:\\\\.[^\'\\\\]*+)*+\')(*SKIP)(*F)|([a-zA-Z_][a-zA-Z0-9_]*)~'
The (?:"[^"\\\\]*+(?:\\\\.[^"\\\\]*+)*+"|\'[^\'\\\\]*+(?:\\\\.[^\'\\\\]*+)*+\') part matches a string in either single or double quotes and implements backslash-escaping. The (*SKIP)(*F) forces a fail and makes the engine skip past the quoted string instead of retrying inside it. ([a-zA-Z_][a-zA-Z0-9_]*) is your regex.
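Here is a small sketch of that pattern applied to the test cases from the question. The helper name unquoted_identifiers is made up for this example; the pattern itself is copied verbatim from above.

```php
<?php
// Hypothetical helper: returns identifiers that are not inside a quoted string.
function unquoted_identifiers(string $subject): array
{
    // Quoted strings are matched first and discarded via (*SKIP)(*F);
    // whatever is left is tried against the identifier in capture group 1.
    $pattern = '~(?:"[^"\\\\]*+(?:\\\\.[^"\\\\]*+)*+"|\'[^\'\\\\]*+(?:\\\\.[^\'\\\\]*+)*+\')(*SKIP)(*F)|([a-zA-Z_][a-zA-Z0-9_]*)~';
    preg_match_all($pattern, $subject, $matches);
    return $matches[1];
}

print_r(unquoted_identifiers('(steve=="john")'));       // steve
print_r(unquoted_identifiers('("test"=="test")'));      // nothing
print_r(unquoted_identifiers('(boob==lol==cake)'));     // boob, lol, cake
print_r(unquoted_identifiers('say "a \"b\" c" bob'));   // say, bob
```

The last call shows the backslash-escaping: the escaped quotes do not end the string, so b is still treated as quoted.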
PS: If you are using this on PHP scripts, you may want to use the Tokenizer instead. That way you could, for example, exclude keywords (like class or abstract; I don't know whether you need this), and you will have much better handling of edge cases (like HEREDOC).
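A rough sketch of the Tokenizer route, assuming the input really is PHP source and that the tokenizer extension is loaded (it is enabled by default in standard builds). The function name identifiers_in_php_source and the sample source are invented for illustration.

```php
<?php
// Hypothetical helper: collects plain identifiers from PHP source code.
function identifiers_in_php_source(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as { or ; come back as plain strings.
        // Keywords have their own token types (T_ABSTRACT, T_CLASS, ...),
        // and quoted strings are T_CONSTANT_ENCAPSED_STRING, so only
        // T_STRING tokens are kept here.
        if (is_array($token) && $token[0] === T_STRING) {
            $names[] = $token[1];
        }
    }
    return $names;
}

$source = '<?php abstract class steve { const john = "cathie"; }';
print_r(identifiers_in_php_source($source)); // steve, john
```

Here abstract, class and const are dropped because they are keywords, and "cathie" is dropped because it is a string literal.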
You could try with:
preg_match_all('#(?<!["\']) \b \w+ \b (?!["\'])#x', $str, $matches);
The \w+ matches word characters, but would also allow 0123sophie, for example. The \b matches word boundaries and thus ensures that the anti-quote assertions are applied at the ends of the whole word rather than partway through it.
However, this regex will also fail to find words that have just a single quote before or after them, such as "john or john'.
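To make both caveats concrete, here is a short sketch with a made-up input string that contains a digit-led word and two words with only one quote next to them.

```php
<?php
// Same pattern as above; the x flag lets the spaces in it be ignored.
$pattern = '#(?<!["\']) \b \w+ \b (?!["\'])#x';

// Illustrative input: "pez has only an opening quote, john' only a closing one.
$str = 'steve "john" 0123sophie \'cathie\' "pez john\'';

preg_match_all($pattern, $str, $matches);

// Prints steve and 0123sophie; pez and the last john are both missed.
print_r($matches[0]);
```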
Pez, resurrecting this ancient question because the current answer is not quite correct (and I'm not sure any solution can be).
It will fail to match john when it is in incomplete quotes, for instance in "john, john", 'john and john' (situations that can happen with john's birthday etc.). See this demo.
This alternate solution just skips any content in quotes:
(?:'[^'\n]*'|"[^"\n]*")(*SKIP)(*F)|\b[a-zA-Z_][a-zA-Z_0-9]*\b
See demo
Either way, with quotes, no solution is perfect because you always run the risk of having unbalanced quotes. In this case I have tried to mitigate the problem by assuming that if it's on another line, it's a different string.
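A small sketch of how the line-based assumption plays out, using two invented inputs: one where the stray quote is on its own line, and one where everything is on a single line. The helper name words_outside_quotes is hypothetical.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function words_outside_quotes(string $subject): array
{
    // \n and \b are left for the regex engine to interpret
    // (single-quoted PHP string).
    $pattern = '~(?:\'[^\'\n]*\'|"[^"\n]*")(*SKIP)(*F)|\b[a-zA-Z_][a-zA-Z_0-9]*\b~';
    preg_match_all($pattern, $subject, $matches);
    return $matches[0];
}

// Unbalanced quote on line 1: it cannot pair with the quote on line 2,
// so john is still found and "steve" is skipped as intended.
print_r(words_outside_quotes("\"john\n\"steve\" likes cake")); // john, likes, cake

// Same content on one line: the stray quote pairs with the next one,
// so john is skipped and steve is matched instead.
print_r(words_outside_quotes('"john "steve" likes cake'));     // steve, likes, cake
```

The second call is the unbalanced-quote risk described above: nothing in the text tells the regex which quote is the stray one.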
Reference
- How to match pattern except in situations s1, s2, s3
- How to match a pattern unless...
Ok I think I have it and it works for your test cases:
(?<!"|'|\w)(\w+)(?!"|'|\w)
It uses the regex lookahead and lookbehind features.
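For anyone who wants to check it, a quick sketch that runs this pattern over the three test cases from the question. The helper name bare_words is made up; only the pattern comes from this answer.

```php
<?php
// Hypothetical helper: words with no quote or word character on either side.
function bare_words(string $subject): array
{
    // The single quotes are escaped only for the PHP string literal.
    $pattern = '/(?<!"|\'|\w)(\w+)(?!"|\'|\w)/';
    preg_match_all($pattern, $subject, $matches);
    return $matches[1];
}

print_r(bare_words('(steve=="john")'));   // steve
print_r(bare_words('("test"=="test")'));  // nothing
print_r(bare_words('(boob==lol==cake)')); // boob, lol, cake
```

The \w inside the lookarounds plays the role of a word boundary, so the match cannot start or stop in the middle of a quoted word.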