Referencing nested groups in a JavaScript string replace using regex
Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:
var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" title="$2">$3</span>');
The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?
Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
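A minimal sketch of that numbering rule, using a cut-down pattern and a made-up tag string (both are illustrations, not the asker's real input):

```javascript
// Groups are numbered by the position of their opening parenthesis,
// counted left to right, regardless of nesting.
const tag = '<script src="a.js">';
const m = /(src="(.*?)")/.exec(tag);

const outer = m[1]; // first "(" : the whole attribute
const inner = m[2]; // second "(" : just the value

console.log(outer); // src="a.js"
console.log(inner); // a.js
```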
Try this:
/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only up to a src attribute.
But actually, in this case you could also just move the .*? into the optional part:
/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
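Both patterns can be tried against a tag where src is not the first attribute. This is only a sketch: the sample HTML and the shortened replacement string are assumptions. Note that the outer src group is non-capturing in both patterns, so the value moves to $1 and the script body to $2.

```javascript
// Hypothetical input: src is preceded by another attribute.
const html = '<script type="text/javascript" src="a.js">alert(1)</script>';

// The outer src group is non-capturing here, so the numbering shifts:
// $1 is the src value and $2 is the script body.
const replacement = '<span title="$1">$2</span>';

const withLookahead = /<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig;
const withMovedWildcard = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig;

const resultLookahead = html.replace(withLookahead, replacement);
const resultMoved = html.replace(withMovedWildcard, replacement);

console.log(resultLookahead); // <span title="a.js">alert(1)</span>
console.log(resultMoved);     // <span title="a.js">alert(1)</span>
```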
The .*? matches too much because the following group is optional, so your src attribute is consumed by one of the surrounding .*? instead of by the group. If you remove the ? after your first group, it works.
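A sketch of that behaviour, assuming a sample tag where another attribute comes before src. The variable names are made up for illustration. It also shows the trade-off of dropping the ?: tags without a src no longer match at all.

```javascript
const html = '<script type="text/javascript" src="a.js">alert(1)</script>';

// Original pattern: the optional group is tried right after "<script ",
// fails on "type", is skipped, and the later .*? swallows the attributes.
const optionalSrc = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/i;
// Same pattern with the "?" after the first group removed.
const requiredSrc = /<script .*?(src="(.*?)").*?>(.*?)<\/script>/i;

const a = optionalSrc.exec(html);
const b = requiredSrc.exec(html);

console.log(a[2]); // undefined (which becomes "" in a replacement string)
console.log(b[2]); // a.js

// Trade-off: with a mandatory group, a tag without src does not match.
const noSrc = '<script type="text/javascript">alert(1)</script>';
const c = requiredSrc.exec(noSrc);
console.log(c); // null
```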
Update: As @morja pointed out, the solution is to move the first .*? into the optional src part.
Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig
You can see it here on rubular (corrected my link also)
If you don't want to use the content of the first capturing group, make it a non-capturing group using (?:):
/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig
Then the values you want are in $1 and $2.
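To compare how the numbering differs between the two variants, here is a small sketch. The sample HTML and the bracketed replacement strings are only for illustration.

```javascript
const html = '<script type="text/javascript" src="a.js">alert(1)</script>';

// Capturing variant: $1 = whole attribute, $2 = value, $3 = script body.
const capturing = /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig;
// Non-capturing variant: $1 = value, $2 = script body.
const nonCapturing = /<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig;

const threeGroups = html.replace(capturing, '[$1|$2|$3]');
const twoGroups = html.replace(nonCapturing, '[$1|$2]');

console.log(threeGroups); // [src="a.js"|a.js|alert(1)]
console.log(twoGroups);   // [a.js|alert(1)]
```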
Could you post the html you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)
My first guess is that one of your script tags does not have a src attribute, in which case the src groups match nothing and only the script contents are captured.
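A quick way to check that guess, sketched with one of the corrected patterns from this thread and a made-up tag that has no src. A group that did not take part in the match is undefined in the match array and is substituted as an empty string in the replacement.

```javascript
// Hypothetical inline script without a src attribute.
const inline = '<script type="text/javascript">alert(1)</script>';
const re = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/i;

const m = re.exec(inline);
const srcValue = m[1]; // the optional group did not participate
const body = m[2];

const out = inline.replace(re, '<span title="$1">$2</span>');

console.log(srcValue); // undefined
console.log(body);     // alert(1)
console.log(out);      // <span title="">alert(1)</span>
```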
I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:
var scriptTagFormat = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" $1>$4</span>');
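Before, I wanted to avoid setting non-standard attributes on the replacement span. This code copies attributes from the script tag instead (note that a repeated capturing group retains only its last repetition, so when a tag has several attributes, $1 holds just the final one). Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.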
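A small check of what $1 actually holds with the pattern above when a tag has two attributes, followed by one possible alternative. The alternative pattern is an assumption, not part of the original code: it captures the whole attribute string with ([^>]*), which would break if an attribute value contained a ">" character.

```javascript
const html = '<script type="text/javascript" src="a.js">alert(1)</script>';

// Pattern from the workaround: group 1 is repeated, so it keeps only
// the text matched by its last repetition.
const repeated = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/i;
const m = repeated.exec(html);
const lastAttr = m[1];

console.log(lastAttr); // ' src="a.js"' (the type attribute is not retained)

// Hypothetical alternative: capture everything up to ">" in one group.
const wholeAttrs = /<script\b([^>]*)>([\s\S]*?)<\/script>/ig;
const copied = html.replace(
  wholeAttrs,
  '<span class="script-placeholder" style="display:none;"$1>$2</span>');

console.log(copied);
// <span class="script-placeholder" style="display:none;" type="text/javascript" src="a.js">alert(1)</span>
```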