How to properly escape characters in regexp
I want to search for a string inside another string, simply by calling MySTR.search(Needle).
The problem occurs when the needle string contains special regex characters like *, + and so on. It fails with the error "invalid quantifier".
I have browsed the web and found that a string can be escaped with \Q some string \E.
However, this does not always produce the desired behavior. For example:
var sNeedle = '*Stars!*';
var sMySTR = 'The contents of this string have no importance';
sMySTR.search('\Q' + sNeedle + '\E');
Result is -1. OK.
var sNeedle = '**Stars!**';
var sMySTR = 'The contents of this string have no importance';
sMySTR.search('\Q' + sNeedle + '\E');
Result is "invalid quantifier". This seems to happen when two or more special characters are 'touching' each other, because the following, where they are separated:
var sNeedle = '*Dont touch me*Stars!*Dont touch me*';
var sMySTR = 'The contents of this string have no importance';
sMySTR.search('\Q' + sNeedle + '\E');
Will work OK.
I know I could write a function escapeAllBadChars(sInStr) that adds a backslash (written "\\" in a string literal) before every possible special regex character, but I'm wondering whether there is a simpler way to do it.
\Q...\E doesn't work in JavaScript: its regex syntax has no such quoting construct, so nothing gets escaped, as you can see:
var s = "*";
console.log(s.search(/\Q*\E/));
console.log(s.search(/\*/));
produces:
-1
0
as you can see on Ideone.
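This also explains the question's examples: in a JavaScript string literal, '\Q' is simply "Q" and '\E' is "E", so the regex engine never sees any quoting. The sketch below uses two hypothetical helpers, buildPattern (mirroring the question's concatenation) and compiles, to show the pattern that is actually built and which needles make it invalid.

```javascript
// Hypothetical helper mirroring the question's attempt. Inside a JS string
// literal, '\Q' and '\E' are just "Q" and "E": the backslash is dropped.
function buildPattern(needle) {
  return '\Q' + needle + '\E';
}

// Hypothetical helper: true if `source` compiles as a RegExp, false on a SyntaxError.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(buildPattern('**Stars!**'));           // Q**Stars!**E
console.log(compiles(buildPattern('*Stars!*')));   // true: "Q*" and "!*" are valid quantifiers
console.log(compiles(buildPattern('**Stars!**'))); // false: the second * has nothing to repeat
```

So the "touching" observation is accurate in effect: two adjacent * characters form a quantifier applied to a quantifier, which is invalid.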
The following chars need to be escaped: ( ) [ { * + . $ ^ \ | ?
So, something like this would do:
function quote(str) {
  // Prefix each listed metacharacter with a backslash; $1 is the matched character
  return str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
}
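A short usage sketch for quote(): escape the question's needle, then pass the result to the RegExp constructor so search() treats every character literally. The haystack string is made up for illustration.

```javascript
// quote() from the answer: prefix each listed metacharacter with a backslash.
function quote(str) {
  return str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
}

const sNeedle = '**Stars!**';
const sMySTR = 'The **Stars!** are out';

const escaped = quote(sNeedle);
console.log(escaped);                            // \*\*Stars!\*\*
console.log(sMySTR.search(new RegExp(escaped))); // 4
```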
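No, ] and } don't need to be escaped: they have no special meaning on their own, only their opening counterparts do. (This holds for patterns compiled without the u flag; with the u flag, a lone ] or } is a syntax error and must be escaped.)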
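The claim that ] and } need no escaping holds for patterns compiled without the u flag. A quick check, using a hypothetical compiles helper, shows that under the u flag lone brackets are rejected while the escaped forms \] and \} are accepted, so an escaping function intended for Unicode-mode patterns should escape them too.

```javascript
// Hypothetical helper: true if `source` compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compiles('a]b}', ''));      // true: lone ] and } are literal here
console.log(compiles('a]b}', 'u'));     // false: lone ] and } are errors with the u flag
console.log(compiles('a\\]b\\}', 'u')); // true: escaped forms are allowed in both modes
```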
Note that when using a regex literal, /.../, you also need to escape the / character. However, / is not a regex metacharacter: when building a RegExp object from a string, it doesn't need an escape.
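A minimal illustration of the / point, with a made-up path string: the literal needs \/ only because / delimits the literal, while the string passed to RegExp can contain a bare /.

```javascript
const path = 'usr/local/bin';

// In a regex literal, the delimiter / must be escaped.
const literal = /usr\/local/;

// In a string passed to the RegExp constructor, / is an ordinary character.
const fromString = new RegExp('usr/local');

console.log(literal.test(path));    // true
console.log(fromString.test(path)); // true
```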
I'm just dipping my feet in JavaScript, but is there a reason you need to use the regex engine at all? How about:
var sNeedle = '*Stars!*';
var sMySTR = 'The contents of this string have no importance';
if ( sMySTR.indexOf(sNeedle) > -1 ) {
  // found it
}
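A sketch of the indexOf approach with the question's needle (the haystack is made up). It also shows why search() fails here: when given a string, search() first converts it to a RegExp, whereas indexOf() and includes() compare plain text.

```javascript
const sNeedle = '**Stars!**';
const sMySTR = 'Look: **Stars!** here';

// indexOf and includes compare plain text, so * and ! need no escaping.
console.log(sMySTR.indexOf(sNeedle));  // 6
console.log(sMySTR.includes(sNeedle)); // true

// search(), by contrast, turns a string argument into a RegExp,
// so the same needle makes it throw a SyntaxError.
try {
  sMySTR.search(sNeedle);
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```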
I did a quick Google search to see what's out there, and it appears you have a few options for escaping regular expression characters. According to one page, you can define and call a function like the one below to escape problematic characters:
RegExp.escape = function(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};
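One caution about assigning to RegExp.escape: recent engines ship a built-in RegExp.escape (added in ES2025), and overwriting it replaces the native version, whose output differs. A hedged alternative is a standalone helper, here given the hypothetical name escapeRegexText, that prefers the built-in when present and otherwise falls back to the character class above. The needle string is made up.

```javascript
// Hypothetical helper: use the built-in RegExp.escape when the engine
// provides it; otherwise fall back to the character class from the answer.
function escapeRegexText(text) {
  if (typeof RegExp.escape === 'function') {
    return RegExp.escape(text);
  }
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

const needle = 'Price: $5.00 (50% off)';

// Anchor the escaped text so the pattern must match the whole string.
const exact = new RegExp('^' + escapeRegexText(needle) + '$');
console.log(exact.test(needle));                   // true
console.log(exact.test('Price: $5X00 (50% off)')); // false: the . was escaped
```

Either branch produces a pattern that matches the text literally; the exact escaped string differs between the two, so code should not depend on it.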
Alternatively, you can use a separate library such as XRegExp, which already handles the nuances you're trying to re-solve.
Duplicate of https://stackoverflow.com/a/6969486/151312
This is the approach given by MDN (see the explanation in the linked answer):
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
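A typical use of escapeRegExp(), sketched with made-up inputs: escape each literal string before joining them with |, so a single pattern matches any of them without their metacharacters taking effect.

```javascript
// escapeRegExp() from the answer above.
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&');
}

// Build one pattern that matches any of several literal strings.
const needles = ['1+1', 'a.b', '(x)'];
const anyNeedle = new RegExp(needles.map(escapeRegExp).join('|'), 'g');

const text = 'a.b and 1+1 and axb and (x)';
console.log(text.match(anyNeedle)); // [ 'a.b', '1+1', '(x)' ]
```

Without the escaping, 'a.b' would also match "axb" and '(x)' would match a bare "x".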