Elegant way to do "multiple passes" in a regex? Or make my regex smarter?

I have a list of words I want to strip from a string. I tried doing it like so:

var original = "X of the Y";
var result = original.replace(/\Wthe\W|\Wof\W|\Wat\W|\W+/g, " ");

// now result === "X the Y", but I wanted result === "X Y"

I realize I could solve this by looping, doing the replacement until the regex test returns zero matches. But I feel like if I just wrote a cleverer regex, or maybe passed some esoteric flag, I'd be fine. Any ideas?
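For reference, the looping fallback might look roughly like the sketch below. The function name stripByLooping is made up for illustration. Note that it loops until the string stops changing rather than until the regex stops matching: the \W+ alternative matches any remaining space, so a "zero matches" test would never become true.

```javascript
// Hypothetical helper: reapply the original pattern until a pass changes nothing.
function stripByLooping(text) {
  const pattern = /\Wthe\W|\Wof\W|\Wat\W|\W+/g;
  let previous;
  let current = text;
  do {
    previous = current;
    // Each pass removes the words whose surrounding spaces are still available.
    current = previous.replace(pattern, " ");
  } while (current !== previous);
  return current;
}

console.log(stripByLooping("X of the Y")); // "X Y"
```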


That's JavaScript, right? Your regex doesn't behave the way you want because of the \W. The engine searches for matches from left to right, and the \W on either side of each word consumes a non-word character, in this case a space. So the first match is " of " (note the spaces on both sides). The search then resumes at "the Y", where none of the word alternatives can match because there is no non-word character left in front of "the". If you change \W to \b (which matches the empty string at a word boundary), it will work the way you want:

var original = "X of the Y";
var result = original.replace(/\b(the|of|at)\b\s*/g, "");

// Now result = "X Y"

Justin suggested in a comment that I take the \b out of the parentheses, which makes sense. It's nicer to read, more concise, and may be slightly faster for the regex engine to execute.

I also changed the \W at the end to \s* to match whitespace, and replaced the matches with the empty string instead of a space. Each removed word therefore leaves the space in front of it in place but takes the spaces after it along. So if the words are separated by one space to begin with, the result will have one space between each word too.
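To make the difference visible, here is a minimal sketch that lists what each pattern actually matches in the sample string. The variable names consuming and boundary are only for illustration.

```javascript
const original = "X of the Y";

// \W consumes the surrounding spaces, so " of " uses up the space
// that "the" would need in front of it.
const consuming = original.match(/\Wthe\W|\Wof\W|\Wat\W/g);

// \b is zero-width: it asserts a word boundary without consuming anything.
const boundary = original.match(/\b(?:the|of|at)\b/g);

console.log(consuming); // one match: " of "
console.log(boundary);  // two matches: "of" and "the"

const stripped = original.replace(/\b(?:the|of|at)\b\s*/g, "");
console.log(stripped);  // "X Y"
```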


You're trying to match the same space character in two different places.

Instead, you can match a sequence of zero or more words that are each preceded by whitespace, with more whitespace after the entire sequence. This way, if you have two consecutive words, the space after the first word will be matched by the \W+ before the second word.

Like this:

original.replace(/(\W+the|\W+of|\W+at)*\W+/g, " "); 

Note that you probably want /gi to make the regex case-insensitive.
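A quick sketch of what the i flag changes, using a made-up mixed-case input:

```javascript
const mixed = "X Of THE Y";

// Without i, "Of" and "THE" are not recognized, so only the spaces are
// replaced (with spaces) and the string comes back unchanged.
const caseSensitive = mixed.replace(/(\W+the|\W+of|\W+at)*\W+/g, " ");

// With i, both words are matched regardless of case.
const caseInsensitive = mixed.replace(/(\W+the|\W+of|\W+at)*\W+/gi, " ");

console.log(caseSensitive);   // "X Of THE Y"
console.log(caseInsensitive); // "X Y"
```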


You could probably do a replace using this:

/(?:^|\s+)(the|of|at)(?=\s+|$)/g

if your regex flavor supports lookahead assertions (JavaScript does). Replace with ''.
Since this replaces the preceding spaces plus the word with nothing, there may be an unwanted space at the beginning of the string.

That can be removed with a second replacement: match /^\s+/ and replace with ''.
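Put together, the two replacements could be wrapped up roughly like this. The function name stripStopWords is hypothetical, and the second input is chosen to show the leading-space case.

```javascript
// Hypothetical helper combining both steps.
function stripStopWords(text) {
  return text
    // Step 1: remove each listed word together with the whitespace before it.
    // The lookahead checks for whitespace or end of string without consuming it.
    .replace(/(?:^|\s+)(the|of|at)(?=\s+|$)/g, "")
    // Step 2: drop the whitespace left behind when the string started with a listed word.
    .replace(/^\s+/, "");
}

console.log(stripStopWords("X of the Y"));     // "X Y"
console.log(stripStopWords("the X of the Y")); // "X Y"
```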


How about something like:

var original = "X of the Y";
var result = original.replace(/(the|of|at|\W)+/g, " ");

This results in "X Y".
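One caveat worth checking before relying on this: the pattern has no word boundaries, so it also eats "the", "of" and "at" when they appear inside longer words. A small sketch with a made-up input, plus one possible variant that adds \b (an assumption on my part, not something tested beyond these inputs):

```javascript
const input = "X of the other Y";

// Without boundaries, the "the" inside "other" is replaced as well.
const loose = input.replace(/(the|of|at|\W)+/g, " ");

// Possible variant: only treat the listed words as matches when they stand alone.
const bounded = input.replace(/(?:\b(?:the|of|at)\b|\W)+/g, " ");

console.log(loose);   // "X o r Y"
console.log(bounded); // "X other Y"
```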
