Positive look behind in JavaScript regular expression

I have a document from which I need to extract some data. The document contains strings like this:

Text:"How secure is my information?"

I need to extract the text which is in double quotes after the literal Text:

How secure is my information?

How do I do this with a regex in JavaScript?


Lookbehind assertions were recently finalised for JavaScript and will be in the next publication of the ECMA-262 specification. They are supported in Chrome 66 (Opera 53), but no other major browsers at the time of writing (caniuse).

var str = 'Text:"How secure is my information?"',
    reg = /(?<=Text:")[^"]+(?=")/;

str.match(reg)[0];
// -> How secure is my information?

Older browsers do not support lookbehind in JavaScript regular expressions. For expressions like this one, you have to use capturing parentheses instead:

var str = 'Text:"How secure is my information?"',
    reg = /Text:"([^"]+)"/;

str.match(reg)[1];
// -> How secure is my information?

This will not cover all the lookbehind assertion use cases, however.
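
If both kinds of engine have to be supported, one option is to build the lookbehind pattern with the RegExp constructor inside a try/catch. A lookbehind in a regex literal is a syntax error on engines that lack support, which stops the whole script from parsing, whereas the constructor throws at run time and can be caught. The following is only a minimal sketch; the function name extractText is made up for illustration.

```javascript
// Hypothetical helper: uses lookbehind when the engine supports it,
// otherwise falls back to a capturing group.
function extractText(str) {
  var lookbehind = null;
  try {
    // On engines without lookbehind this throws a SyntaxError at run time,
    // which is caught below.
    lookbehind = new RegExp('(?<=Text:")[^"]+(?=")');
  } catch (e) {
    lookbehind = null;
  }

  if (lookbehind) {
    var m = str.match(lookbehind);
    return m ? m[0] : null;
  }

  // Fallback: the wanted text is in capture group 1.
  var fallback = str.match(/Text:"([^"]+)"/);
  return fallback ? fallback[1] : null;
}

console.log(extractText('Text:"How secure is my information?"'));
// -> How secure is my information?
```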


I just want to add something: JavaScript engines that predate ES2018 don't support lookbehinds like (?<= ) or (?<! ).

But it does support lookaheads like (?= ) or (?! ).


You can just do:

/Text:"(.*?)"/

Explanation:

  • Text:" : To be matched literally
  • .*? : To match anything in non-greedy way
  • () : To capture the match
  • " : To match a literal "
  • / / : delimiters
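
To see why the non-greedy quantifier matters, here is a small sketch comparing it with the greedy form. The sample string is invented for the comparison and is not from the question.

```javascript
var str = 'Text:"first" and then "second"';

// Non-greedy: stops at the first closing quote.
var lazy = str.match(/Text:"(.*?)"/)[1];

// Greedy: runs on to the last quote on the line.
var greedy = str.match(/Text:"(.*)"/)[1];

console.log(lazy);   // -> first
console.log(greedy); // -> first" and then "second
```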


string.match(/Text:"([^"]*)"/g)
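
Note that with the g flag, match() returns the whole matches (including the Text:" prefix) rather than the capture groups. If only the captured text is wanted, String.prototype.matchAll (ES2020) is one way to get it. A small sketch, with a made-up sample string:

```javascript
var string = 'Text:"one" Text:"two"';

// With the g flag, match() returns the full matches only.
var whole = string.match(/Text:"([^"]*)"/g);
// whole is ['Text:"one"', 'Text:"two"']

// matchAll() yields one match object per hit, including its groups.
var captured = Array.from(string.matchAll(/Text:"([^"]*)"/g), function (m) {
  return m[1];
});
// captured is ['one', 'two']

console.log(whole, captured);
```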


<script type="text/javascript">
var str = 'Text:"How secure is my information?"';
// eval() executes the string as code, so only use this on trusted input.
var obj = eval('({' + str + '})');
console.log(obj.Text);
</script>


If you want to avoid the regular expression altogether you can do:

var texts = file.split('Text:"').slice(1).map(function (text) {
  return text.slice(0, text.lastIndexOf('"')); 
});


Here is an example showing how you can approach this.

1) Given this input string:

const inputText = 
`Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"How to improve this?"
Voice:"Don't use '123456' for your password"
Text:"OK just like in the "Hackers" movie."`;

2) Extract the data in double quotes after the literal Text: so that the result is an array with all matches, like so:

["How secure is my information?",
 "How to improve this?",
 "OK just like in the \"Hackers\" movie."]

SOLUTION

function getText(text) {
  return text
    .match(/Text:".*"/g)
    .map(item => item.match(/^Text:"(.*)"/)[1]);
}

console.log(JSON.stringify(getText(inputText)));
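
One caveat: match() with the g flag returns null when nothing matches, so calling .map on the result would throw for input without any Text: entries. A possible variant, sketched here under the assumption that matchAll (ES2020) is available, does the extraction in one pass and returns an empty array in that case. The names sample and getTextSafe are made up for this example.

```javascript
const sample = `Text:"How secure is my information?"someRandomTextHere
Voice:"Not very much"
Text:"OK just like in the "Hackers" movie."`;

// Hypothetical variant of getText: a single pass over the input,
// returning [] when there is no match.
function getTextSafe(text) {
  return Array.from(text.matchAll(/Text:"(.*)"/g), m => m[1]);
}

console.log(JSON.stringify(getTextSafe(sample)));
console.log(JSON.stringify(getTextSafe('no matches here'))); // -> []
```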


If you, like me, get here while researching a bug related to the Cloudinary gem, you may find this useful:

Cloudinary recently released version 1.16.0 of their gem. In Safari, this crashes with the error 'Invalid regular expression: invalid group specifier name'.

A bug report has been filed. In the meantime I reverted to 1.15.0 and the error went away.

Hope this saves someone some lifetime.


A regular expression with lookbehind

regex = /(?<=.*?:).*/g

can be used to produce an array with all matches found in the inputText (from Piotr Berebecki's answer):

> inputText.match(regex)
[
  '"How secure is my information?"someRandomTextHere',
  '"Not very much"',
  '"How to improve this?"',
  `"Don't use '123456' for your password"`,
  '"OK just like in the "Hackers" movie."'
]

Each match consists of everything following the first colon in a line: the quoted string plus any text after it.

In the absence of lookbehinds, a regular expression with groups can be used:

regex = /(.*?:)(.*)/g

With this, each match consists of a complete line, with two groups: the first containing the part up to the colon and the second containing the rest.

> inputText.match(regex)
[
  'Text:"How secure is my information?"someRandomTextHere',
  'Voice:"Not very much"',
  'Text:"How to improve this?"',
  `Voice:"Don't use '123456' for your password"`,
  'Text:"OK just like in the "Hackers" movie."'
]

To see the groups, you must use the .exec method. The first match looks like this:

> [...regex.exec(inputText)]
[
  'Text:"How secure is my information?"someRandomTextHere',
  'Text:',
  '"How secure is my information?"someRandomTextHere'
]

To loop over all matches and process only the second group of each (that is, the part after the colon from each line), use something like:

> for (var m, regex = /(.*?:)(.*)/g; m = regex.exec(inputText); ) console.log(m[2]);
"How secure is my information?"someRandomTextHere
"Not very much"
"How to improve this?"
"Don't use '123456' for your password"
"OK just like in the "Hackers" movie."
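
If the results should be collected rather than logged, the same group-based expression can be combined with matchAll (ES2020), which exposes the groups of every match without a manual exec loop. This is only a sketch; the names lines, pairs, label and value are chosen for illustration.

```javascript
const lines = `Text:"How secure is my information?"
Voice:"Not very much"`;

// Each match object holds the full line at index 0 and the groups after it.
const pairs = Array.from(lines.matchAll(/(.*?:)(.*)/g), m => ({
  label: m[1], // the part up to and including the first colon
  value: m[2], // the rest of the line
}));

console.log(pairs);
```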