
Referencing nested groups in a JavaScript string replace using regex

Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:

var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;

html = html.replace(
    scriptTagFormat, 
    '<span class="script-placeholder" style="display:none;" title="$2">$3</span>');

The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?


Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
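A small sketch of that numbering rule. The `sample` string and the variable names are assumptions made up for illustration; they are not from the question.

```javascript
// Groups are numbered by the position of their opening parenthesis,
// so the outer group is #1 and the group nested inside it is #2.
const nested = /(src="(.*?)")/;
const sample = '<script src="app.js"></script>'; // hypothetical input
const match = nested.exec(sample);

console.log(match[1]); // src="app.js"
console.log(match[2]); // app.js
```

So in a replacement string for this pattern, `$1` is the whole attribute and `$2` is only its value.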


Try this:

/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig

See here: rubular

As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only until a src attribute.
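As a rough check of the lookahead version, here is a sketch run against two hypothetical single-tag inputs (the sample strings and variable names are assumptions). Note that this pattern has only two capturing groups, so the src value is in group 1 and the script body in group 2.

```javascript
// (?:(?!src).)* consumes characters one at a time, but stops as soon as
// the text ahead starts with "src", so the optional group gets a chance
// to match the attribute itself.
const withLookahead =
  /<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/i;

// Hypothetical inputs: one tag with a src attribute, one without.
const withSrc = withLookahead.exec(
  '<script type="text/javascript" src="app.js">init();</script>');
const withoutSrc = withLookahead.exec(
  '<script type="text/javascript">init();</script>');

console.log(withSrc[1], withSrc[2]);       // app.js init();
console.log(withoutSrc[1], withoutSrc[2]); // undefined init();
```

This was only traced on the single-tag inputs shown; HTML containing several script tags on one line may behave differently.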

But actually in this case you could also just move the .*? into the optional part:

/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig

See here: rubular


The .*? matches too much because the group that follows it is optional: the src attribute ends up being consumed by one of the surrounding .*? instead of by the group. If you remove the ? after your first group, it works.
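A minimal sketch of that behaviour with the regex from the question, using two hypothetical inputs (the sample strings and variable names are assumptions). When src is the first attribute the group happens to match; when another attribute comes first, the optional group is skipped and a later .*? swallows the src attribute.

```javascript
// The pattern from the question (without the g flag, for a single exec).
const original = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/i;

// Hypothetical inputs.
const srcFirst = '<script src="app.js"></script>';
const typeFirst = '<script type="text/javascript" src="app.js"></script>';

console.log(original.exec(srcFirst)[2]);  // app.js
console.log(original.exec(typeFirst)[2]); // undefined

// In a replacement string, a group that did not participate becomes "".
const replaced = typeFirst.replace(original, '<span title="$2">$3</span>');
console.log(replaced); // <span title=""></span>
```

This may also explain why a simple test case with only a src attribute appears to work.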

Update: As @morja pointed out, the solution is to move the first .*? into the optional src part.

Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig

You can see it here on rubular (corrected my link also)

If you don't want to use the content of the first capturing group, make it a non-capturing group using (?:...):

/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig

The values you want are then in $1 and $2.
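Plugged into the replace call from the question, that could look like the sketch below. The `html` sample is a hypothetical input, and the replacement string is the one from the question with `$2`/`$3` renumbered to `$1`/`$2`.

```javascript
// Non-capturing version: group 1 is the src value, group 2 the script body.
const scriptTagFormat =
  /<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig;

// Hypothetical input with another attribute before src.
const html = '<script type="text/javascript" src="app.js">init();</script>';

const result = html.replace(
  scriptTagFormat,
  '<span class="script-placeholder" style="display:none;" title="$1">$2</span>');

console.log(result);
// <span class="script-placeholder" style="display:none;" title="app.js">init();</span>
```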


Could you post the HTML you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)

My first guess is that one of your script tags does not have a src attribute, meaning only one capture group (the script contents) actually captures anything.


I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:

var scriptTagFormat = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/ig;

html = html.replace(
    scriptTagFormat, 
    '<span class="script-placeholder" style="display:none;" $1>$4</span>');

Before, I wanted to avoid setting non-standard attributes on the replacement span. This code blindly copies all attributes instead. Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.
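One caveat worth checking: in JavaScript, a capturing group that repeats keeps only its last iteration, so with `(...)*` the `$1` reference holds just the final attribute rather than all of them. The sketch below shows this on a hypothetical input, followed by one possible alternative that captures the whole attribute run in a single group. The `[^>]*` pattern is an assumption of this sketch and will misbehave if an attribute value contains `>`.

```javascript
// Hypothetical input with two attributes.
const tag = '<script type="text/javascript" src="app.js">init();</script>';

// The workaround pattern: group 1 repeats, so only its last match survives.
const repeated = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/i;
console.log(JSON.stringify(repeated.exec(tag)[1])); // " src=\"app.js\""

// Alternative sketch: one group spanning everything up to the closing ">".
const allAttributes = /<script\b([^>]*)>(.*?)<\/script>/ig;
const copied = tag.replace(
  allAttributes,
  '<span class="script-placeholder" style="display:none;"$1>$2</span>');

console.log(copied);
// <span class="script-placeholder" style="display:none;" type="text/javascript" src="app.js">init();</span>
```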
