Regular expression to replace HTML content
I am trying to replace HTML content with a regular expression.
from
<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>
to
<A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>
Note that only occurrences outside HTML tags are replaced (ZZZ becomes AAA); the HREF="ZZZ" attribute stays unchanged.
Any idea? Thanks a lot in advance.
You could walk all the nodes and replace text only in text nodes (.nodeType == 3), something like:
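// `element` is a jQuery object; .find() searches descendants only, so text directly inside `element` itself is not visited
element.find('*:contains(ZZZ)').contents().each(function () {
  if (this.nodeType === 3) // 3 = text node
    this.nodeValue = this.nodeValue.replace(/ZZZ/g, 'AAA');
});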
Or the same without jQuery:
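// Recursively walk element's children, replacing text in text nodes only
function replaceText(element, from, to) {
  for (var child = element.firstChild; child !== null; child = child.nextSibling) {
    if (child.nodeType === 3)        // text node: rewrite its content
      child.nodeValue = child.nodeValue.replace(from, to);
    else if (child.nodeType === 1)   // element node: recurse into it
      replaceText(child, from, to);
  }
}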
replaceText(element, /ZZZ/g, 'AAA');
The best approach here is almost certainly not to use regular expressions, at least not on their own. JavaScript surely has an HTML parser somewhere?
If you really must use regular expressions, you could look for every instance of ZZZ that is followed by a "<" before any ">", i.e. where the next angle bracket after ZZZ opens a tag rather than closing one. That would look like:
ZZZ(?=[^>]*<)
This might break horribly if the markup contains HTML comments or script blocks, or is not well-formed.
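A quick way to try the lookahead in JavaScript is a single replace() call with the g flag. The second input is a hypothetical fragment with no "<" after its last ZZZ, which shows the kind of input where the pattern stops working: that trailing ZZZ is left untouched.

```javascript
// Replace ZZZ only when the next angle bracket after it is "<",
// i.e. when ZZZ sits in text content rather than inside a tag.
const outsideTags = /ZZZ(?=[^>]*<)/g;

const html = '<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>';
const result = html.replace(outsideTags, 'AAA');
console.log(result); // <A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>

// Hypothetical fragment: the last ZZZ has no "<" after it, so it is skipped.
const fragment = 'ZZZ <b>x</b> ZZZ';
console.log(fragment.replace(outsideTags, 'AAA')); // AAA <b>x</b> ZZZ
```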
Assuming a well-formed HTML document with outer/enclosing tags like <html>, I would think the easiest way would be to look for the > and < signs:
/(\>[^\>\<]*)ZZZ([^\>\<]*\<)/$1AAA$2/
If you're dealing with HTML fragments that may not have enclosing tags, it gets a little more complicated: you'd have to allow for the start and end of the string as well.
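Instead of adding start- and end-of-string alternatives to the regex, one alternative (a different technique from this answer's, and still not a real HTML parser) is to split the string on tags using a capturing group. String.prototype.split then returns text and tags alternately, with text always at even indices, so leading or trailing text needs no special case. replaceOutsideTags is a hypothetical helper name, and the sketch assumes no "<" or ">" inside attribute values, comments or scripts.

```javascript
// Hypothetical helper: split into alternating [text, tag, text, tag, ..., text]
// tokens and run the replacement on the text tokens only.
function replaceOutsideTags(html, search, replacement) {
  return html
    .split(/(<[^>]*>)/) // the capturing group keeps the tags in the result
    .map((token, i) => (i % 2 === 0 ? token.replace(search, replacement) : token))
    .join('');
}

console.log(replaceOutsideTags('ZZZ <b>ZZZ</b> ZZZ', /ZZZ/g, 'AAA'));
// AAA <b>AAA</b> AAA
console.log(replaceOutsideTags('<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>', /ZZZ/g, 'AAA'));
// <A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>
```

Pass a global regex as search; a plain string would only replace the first occurrence in each text token.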
Example in JavaScript (sorry, I missed the tag):
alert('<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>'.replace(/(\>[^\>\<]*)ZZZ([^\>\<]*\<)/g, "$1AAA$2"));
Explanation: for each match that
- starts with >: \>
- is followed by any number of characters that are neither > nor <: [^\>\<]*
- then has "ZZZ"
- is followed by any number of characters that are neither > nor <: [^\>\<]*
- and ends with <: \<
replace it with
- everything before the ZZZ, captured by the first capture group (parentheses): $1
- AAA
- everything after the ZZZ, captured by the second capture group (parentheses): $2
Use the "g" (global) flag so that every non-overlapping match is replaced, not just the first one.
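One thing worth checking with this pattern: because the first [^\>\<]* is greedy and each match consumes the whole >...< run, only one ZZZ per text run gets replaced (the last one), even with the g flag. A sketch of a workaround, using a hypothetical helper replaceInTextRuns that matches each >...< run and does the ZZZ replacement inside it with a replace() callback, under the same well-formed, enclosing-tags assumption:

```javascript
// The regex from this answer, applied to a text run that contains ZZZ twice.
const answerPattern = /(\>[^\>\<]*)ZZZ([^\>\<]*\<)/g;
const onePerRun = '<p>ZZZ and ZZZ</p>'.replace(answerPattern, '$1AAA$2');
console.log(onePerRun); // <p>ZZZ and AAA</p>

// Hypothetical helper: match each whole ">...<" text run, then replace inside it.
// `search` should be a global regex so every occurrence in a run is replaced.
function replaceInTextRuns(html, search, replacement) {
  return html.replace(/>[^<>]*</g, (run) => run.replace(search, replacement));
}

console.log(replaceInTextRuns('<p>ZZZ and ZZZ</p>', /ZZZ/g, 'AAA')); // <p>AAA and AAA</p>
```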
Try this:
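var str = '<DIV>ZZZ test test</DIV><A HREF="ZZZ">test test ZZZ</A>';
// Take the word to replace from the HREF value (throws a TypeError if nothing matches)
var rpl = str.match(/href=\"(\w*)\"/i)[1];
// Replace it wherever the next angle bracket is "<", i.e. outside tags
console.log(str.replace(new RegExp(rpl + "(?=[^>]*<)", "gi"), "XXX"));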
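The snippet above only matches href values made of word characters (\w*), and str.match(...)[1] throws when nothing matches, e.g. for HREF="a.b". Widening the capture to [^"]* means the value may contain regex metacharacters such as . or ?, which then need escaping before going into new RegExp. A hedged variant, with hypothetical helpers escapeRegExp and replaceHrefValueInText, that escapes the value and returns the input unchanged when no non-empty href is found:

```javascript
// Hypothetical helper: escape regex metacharacters so the value is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace the HREF value wherever it appears outside tags.
function replaceHrefValueInText(html, replacement) {
  const m = html.match(/href="([^"]*)"/i);
  if (m === null || m[1] === '') return html; // no usable href: leave input as-is
  const pattern = new RegExp(escapeRegExp(m[1]) + '(?=[^>]*<)', 'gi');
  return html.replace(pattern, replacement);
}

console.log(replaceHrefValueInText('<DIV>ZZZ test test</DIV><A HREF="ZZZ">test test ZZZ</A>', 'XXX'));
// <DIV>XXX test test</DIV><A HREF="ZZZ">test test XXX</A>
console.log(replaceHrefValueInText('<a href="a.b">a.b axb</a>', 'X'));
// <a href="a.b">X axb</a>
```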
Have you tried this?
replace:
>([^<>]*)(ZZZ)([^<>]*)<
with:
>$1AAA$3<
But be aware of all the savvy suggestions in the post linked in the first comment on your question!