Getting an array of matches and plain strings from a JavaScript regular expression
I often want to parse a string with a regular expression and get all the matches plus all the non-matching strings, interspersed in their original order, e.g.
var parsed = regexParse(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');
And so parsed will contain:
0 : "Hello "
1 : match containing {name}, name
2 : ", you are "
3 : match containing {age}, age
4 : " years old"
Is there anything in JavaScript (or some widely used library) that resembles this regexParse function? I wrote my own version of it, but it seems so obvious that I suspect there must already be a "standard" way of doing it:
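var regexParse = function(rx, str) {
    // rx must have the g flag; without it exec() never advances
    // and the loop below would not terminate.
    var nextPlain = 0, result = [], match;
    rx.lastIndex = 0; // always start scanning from the beginning
    for (;;) {
        match = rx.exec(str);
        if (!match) {
            // No more matches: the rest of the string is plain text.
            result.push(str.slice(nextPlain));
            break;
        }
        // Plain text between the previous match and this one.
        result.push(str.slice(nextPlain, match.index));
        nextPlain = rx.lastIndex;
        result.push(match);
    }
    return result;
};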
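For comparison, here is a rough sketch of the same idea on top of String.prototype.matchAll, which only exists in newer engines (ES2020 and later). The name regexParseAll is made up for illustration; this is not a built-in, just one way the loop could be written if matchAll is available.

```javascript
// Hypothetical helper: same output shape as regexParse, built on matchAll.
// matchAll throws a TypeError if the regex lacks the g flag.
function regexParseAll(rx, str) {
  const result = [];
  let nextPlain = 0;
  for (const match of str.matchAll(rx)) {
    result.push(str.slice(nextPlain, match.index)); // plain text before the match
    result.push(match);                              // the match array itself
    nextPlain = match.index + match[0].length;
  }
  result.push(str.slice(nextPlain)); // trailing plain text
  return result;
}

const parsed = regexParseAll(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');

// Strings are unmatched text; everything else is a match array.
const names = parsed
  .filter(function (item) { return typeof item !== 'string'; })
  .map(function (m) { return m[1]; });
```

Because matchAll works on a copy of the regex, the caller's lastIndex is left alone, so there is no need to reset it first.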
Update
Regarding Dennis's answer, at first I thought it was going to fail to help because all the values in the returned array are strings. How can you tell which items are unmatched text and which are from the matches?
But a bit of experimentation (with IE9 and Chrome anyway) suggests that when split is used in this way, it always alternates the pieces, so that the first is from plain text, the second is a match, the third is plain text, and so on. It follows this rule even if there are two matches with no unmatched text interspersed: it outputs an empty string in such cases.
Even in the trivial case:
'{x}'.split(/{([^}]+)}/g)
The output is strictly:
["", "x", ""]
So you can tell which is which if you know how (and if this assumption holds)!
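A quick check of that assumption, as a sketch rather than a proof: the first line covers two adjacent matches, and the second shows that the odd/even rule depends on the regex having exactly one capturing group. The sample strings and variable names are made up for illustration.

```javascript
// Adjacent matches: an empty string still separates the two captures.
var adjacent = '{a}{b}'.split(/{([^}]+)}/);
// adjacent is ["", "a", "", "b", ""]

// Two capturing groups: each match contributes two entries, so the
// pattern repeats every three items instead of every two.
var pairs = 'x=1;y=2'.split(/(\w)=(\d)/);
// pairs is ["", "x", "1", ";", "y", "2", ""]
```

So with one group, index % 2 tells text from match; with n groups the stride would be n + 1.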
I like to use the ES5 array methods map, forEach and filter. So with my original regexParse it was a matter of using typeof i == 'string' to detect which items were unmatched text. With split it has to be determined from the position in the returned array, but that's okay because the ES5 array methods pass a second argument, the index, and so we just need to find out if it's odd (a match) or even (unmatched text). So for example, if we have:
var ar = '{greeting} {name}, you are {age} years old'.split(/{([^}]+)}/g);
Now ar contains:
["", "greeting", " ", "name", ", you are ", "age", " years old"]
From that we can get just the matches:
ar.filter(function(s, i) { return i % 2 != 0; });
>>> ["greeting", "name", "age"]
Or just the plain text, stripping out empty strings also:
ar.filter(function(s, i) { return (i % 2 == 0) && s; });
>>> [" ", ", you are ", " years old"]
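The same index trick also works with map if you want to keep everything in order but label each piece. A minimal sketch, assuming the regex has exactly one capturing group; tagPieces and the type/value property names are hypothetical, chosen just for this example.

```javascript
// Hypothetical helper: label each piece of a split() result by its position.
// Assumes the regex has exactly one capturing group.
function tagPieces(rx, str) {
  return str.split(rx).map(function (s, i) {
    // Odd positions hold captures, even positions hold unmatched text.
    return { type: i % 2 ? 'match' : 'text', value: s };
  });
}

var tagged = tagPieces(/{([^}]+)}/, '{greeting} {name}!');
// tagged[0] is { type: 'text', value: '' }
// tagged[1] is { type: 'match', value: 'greeting' }
```

This gets back something close to what my original regexParse gave, except that only the captured text is kept, not the full match object.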
I think you're looking for split() with capturing parentheses:
var myString = "Hello 1 word. Sentence number 2.";
var splits = myString.split(/(\d)/); // ["Hello ", "1", " word. Sentence number ", "2", "."]