Positive look behind in JavaScript regular expression
I have a document from which I need to extract some data. The document contains strings like this:
Text:"How secure is my information?"
I need to extract the text in double quotes that follows the literal Text:, so the result here would be:
How secure is my information?
How do I do this with a regex in JavaScript?
Lookbehind assertions were finalised for JavaScript in the ECMAScript 2018 (ES2018) edition of the ECMA-262 specification. At the time of writing they were supported in Chrome 66 (Opera 53) but in no other major browser (see caniuse).
var str = 'Text:"How secure is my information?"',
reg = /(?<=Text:")[^"]+(?=")/;
str.match(reg)[0];
// -> How secure is my information?
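With the g flag, the same lookbehind pattern collects every quoted value that follows Text: in a longer document. A minimal sketch, assuming an engine with ES2018 lookbehind support; the sample doc string is made up for illustration:

```javascript
// Sample document (made up for illustration): one entry per line.
const doc = [
  'Text:"How secure is my information?"',
  'Voice:"Not very much"',
  'Text:"How to improve this?"'
].join('\n');

// (?<=Text:") requires the match to be preceded by Text:" without consuming it,
// [^"]+ takes everything up to the next quote, and (?=") requires that quote to follow.
const texts = doc.match(/(?<=Text:")[^"]+(?=")/g) || [];

console.log(texts);
// -> [ 'How secure is my information?', 'How to improve this?' ]
```

The Voice: line is skipped because no position in it is preceded by Text:".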
Older browsers do not support lookbehind in JavaScript regular expressions. For expressions like this one, use capturing parentheses instead:
var str = 'Text:"How secure is my information?"',
reg = /Text:"([^"]+)"/;
str.match(reg)[1];
// -> How secure is my information?
This will not cover all the lookbehind assertion use cases, however.
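A regex literal containing a lookbehind can cause the whole script to fail with a SyntaxError in an engine that lacks lookbehind support. One hedged option is to build the lookbehind pattern with the RegExp constructor inside try/catch and fall back to the capturing group. The helper names supportsLookbehind and extractText are hypothetical:

```javascript
// Hypothetical helper: true if this engine can parse a lookbehind.
// The pattern is passed as a string, so an older engine throws a
// catchable SyntaxError here instead of rejecting the whole script.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: the quoted text after Text:, or null if absent.
function extractText(str) {
  if (supportsLookbehind()) {
    const found = str.match(new RegExp('(?<=Text:")[^"]+(?=")'));
    return found ? found[0] : null;
  }
  // Fallback for older engines: read capture group 1 instead.
  const captured = str.match(/Text:"([^"]+)"/);
  return captured ? captured[1] : null;
}

console.log(extractText('Text:"How secure is my information?"'));
// -> How secure is my information?
```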
I just want to add something: older JavaScript engines (before ES2018) don't support lookbehinds like (?<= ) or (?<! ), but they do support lookaheads like (?= ) or (?! ).
You can just do:
/Text:"(.*?)"/
Explanation:
Text:" : to be matched literally
.*? : to match anything in a non-greedy way
() : to capture the match
" : to match a literal "
/ / : delimiters
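The non-greedy .*? matters when a line holds more than one quoted string. A small comparison on a made-up input line:

```javascript
// Made-up line with two quoted values.
const line = 'Text:"How secure is my information?" Voice:"Not very much"';

// Greedy: .* runs on to the LAST quote on the line.
const greedy = line.match(/Text:"(.*)"/)[1];
// Non-greedy: .*? stops at the FIRST closing quote.
const lazy = line.match(/Text:"(.*?)"/)[1];

console.log(greedy); // -> How secure is my information?" Voice:"Not very much
console.log(lazy);   // -> How secure is my information?
```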
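// With the g flag, match() returns only the whole matches (including Text:"), so use matchAll() (ES2020) to read group 1 of each:
Array.from(string.matchAll(/Text:"([^"]*)"/g), m => m[1])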
<script type="text/javascript">
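var str = 'Text:"How secure is my information?"';
// Wrapping the string in ({ ... }) makes it evaluate as an object literal: { Text: "..." }.
// Caution: eval executes arbitrary code, so only use this on trusted input.
var obj = eval('({' + str + '})');
console.log(obj.Text);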
</script>
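Because eval runs whatever the string contains, a safer variant of the same idea is to quote the key and hand the result to JSON.parse. This is a sketch assuming each entry has the form Key:"value" with a word-character key and a value that is valid JSON string syntax; the function name parseEntry is hypothetical:

```javascript
// Hypothetical helper: turns 'Key:"value"' into { Key: "value" } without eval.
// Assumes the key is made of word characters and the value is a JSON string.
function parseEntry(str) {
  const json = '{' + str.replace(/^(\w+):/, '"$1":') + '}';
  return JSON.parse(json); // throws SyntaxError on malformed input
}

const obj = parseEntry('Text:"How secure is my information?"');
console.log(obj.Text); // -> How secure is my information?
```

Malformed input fails loudly with a SyntaxError instead of being executed.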
If you want to avoid regular expressions altogether, you can do:
// file: a string holding the whole document
var texts = file.split('Text:"').slice(1).map(function (text) {
  // keep everything up to the last quote in this chunk
  return text.slice(0, text.lastIndexOf('"'));
});
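Since lastIndexOf('"') finds the last quote anywhere in the chunk up to the next Text:", other quoted values in between (such as a Voice:"..." line) end up in the result. A sketch of a variant that still avoids regular expressions, assuming one entry per line; extractTexts and sample are hypothetical names:

```javascript
// Assumption: the document has one Key:"value" entry per line.
function extractTexts(file) {
  const prefix = 'Text:"';
  return file
    .split('\n')
    .filter(line => line.startsWith(prefix))
    // Keep everything between the prefix and the last quote on that line.
    .map(line => line.slice(prefix.length, line.lastIndexOf('"')));
}

const sample = 'Text:"How secure is my information?"\nVoice:"Not very much"\nText:"How to improve this?"';
console.log(extractTexts(sample));
// -> [ 'How secure is my information?', 'How to improve this?' ]
```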
Here is an example showing how you can approach this.
1) Given this input string:
const inputText =
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;
2) Extract data in double quotes after the literal Text:, so that the result is an array with all matches, like so:
["How secure is my information?",
"How to improve this?",
"OK just like in the \"Hackers\" movie."]
SOLUTION
function getText(text) {
return text
.match(/Text:".*"/g)
.map(item => item.match(/^Text:"(.*)"/)[1]);
}
console.log(JSON.stringify( getText(inputText) ));
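The two passes can also be folded into one. A hedged sketch, assuming an ES2020 engine for String.prototype.matchAll; getTextOnePass is a hypothetical name. The m flag lets ^ match at the start of each line, and because . never crosses a newline, the greedy (.*)" keeps everything up to the last quote on that line, so the quotes around "Hackers" survive:

```javascript
const inputText =
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

// g: find every match; m: ^ matches at the start of every line.
function getTextOnePass(text) {
  return Array.from(text.matchAll(/^Text:"(.*)"/gm), m => m[1]);
}

console.log(JSON.stringify(getTextOnePass(inputText)));
// -> ["How secure is my information?","How to improve this?","OK just like in the \"Hackers\" movie."]
```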
If you, like me, get here while researching a bug related to the Cloudinary gem, you may find this useful:
Cloudinary recently released version 1.16.0 of their gem. In Safari, this crashes with the error 'Invalid regular expression: invalid group specifier name'.
A bug report has been filed. In the meantime I reverted to 1.15.0 and the error went away.
Hope this saves someone some time.
A regular expression with a lookbehind
regex = /(?<=.*?:).*/g
can be used to produce an array of all matches found in inputText
(from Piotr Berebecki's answer):
> inputText.match(regex)
[
'"How secure is my information?"someRandomTextHere',
'"Not very much"',
'"How to improve this?"',
`"Don't use '123456' for your password"`,
'"OK just like in the "Hackers" movie."'
]
Each match consists of everything after the first colon on a line: the quoted string plus any trailing text (such as someRandomTextHere).
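To get only the Text: entries without their surrounding quotes, the lookbehind can be anchored to the start of a line and paired with a lookahead. A sketch under the same assumptions (ES2018 lookbehind, one entry per line), reusing the sample input:

```javascript
const inputText =
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

// (?<=^Text:") : preceded by Text:" at the start of a line (m flag)
// .*           : greedy, but . stops at the end of the line
// (?=")        : the match must end right before a quote
const regex = /(?<=^Text:").*(?=")/gm;

console.log(inputText.match(regex));
// -> [
//   'How secure is my information?',
//   'How to improve this?',
//   'OK just like in the "Hackers" movie.'
// ]
```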
In the absence of lookbehinds, a regular expression with groups can be used:
regex = /(.*?:)(.*)/g
With this, each match consists of a complete line, with two groups: the first containing the part up to the colon and the second containing the rest.
> inputText.match(regex)
[
'Text:"How secure is my information?"someRandomTextHere',
'Voice:"Not very much"',
'Text:"How to improve this?"',
`Voice:"Don't use '123456' for your password"`,
'Text:"OK just like in the "Hackers" movie."'
]
To see the groups of a global regex, use the .exec
method (String.prototype.match with the g flag returns only the whole matches). The first match looks like this:
> [...regex.exec(inputText)]
[
'Text:"How secure is my information?"someRandomTextHere',
'Text:',
'"How secure is my information?"someRandomTextHere'
]
To loop over all matches and process only the second group of each (that is, the part after the colon from each line), use something like:
> for (var m, regex = /(.*?:)(.*)/g; m = regex.exec(inputText); ) console.log(m[2]);
"How secure is my information?"someRandomTextHere
"Not very much"
"How to improve this?"
"Don't use '123456' for your password"
"OK just like in the "Hackers" movie."
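On engines with String.prototype.matchAll (ES2020), the same loop can be written without relying on the lastIndex state that exec keeps on a global regex. A sketch with the same sample input; secondGroups is a hypothetical name:

```javascript
const inputText =
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

// matchAll requires the g flag and yields one match array (with groups) per match.
const secondGroups = [];
for (const m of inputText.matchAll(/(.*?:)(.*)/g)) {
  secondGroups.push(m[2]); // the part after the first colon on each line
}

console.log(secondGroups.join('\n'));
```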