Regular expression to replace HTML content

I am trying to replace HTML content with a regular expression.

from

<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>

to

<A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>

Note that only the occurrences of ZZZ outside HTML tags are replaced with AAA; the one inside the HREF attribute stays as it is.

Any ideas? Thanks a lot in advance.


You could walk all the nodes and replace the text only in the text nodes (nodeType === 3). Something like:

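// Visit the children of every descendant that contains "ZZZ".
element.find('*:contains(ZZZ)').contents().each(function () {
    // nodeType 3 is a text node; attributes are never visited.
    if (this.nodeType === 3) {
        this.nodeValue = this.nodeValue.replace(/ZZZ/g, 'AAA');
    }
});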

Or the same without jQuery:

function replaceText(element, from, to) {
    for (var child = element.firstChild; child !== null; child = child.nextSibling) {
        if (child.nodeType === 3) {
            // Text node: replace in its value (child, not this).
            child.nodeValue = child.nodeValue.replace(from, to);
        } else if (child.nodeType === 1) {
            // Element node: recurse into its children.
            replaceText(child, from, to);
        }
    }
}

replaceText(element, /ZZZ/g, 'AAA');
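To see why the walk leaves attributes alone, here is a minimal sketch that runs outside a browser. The `textNode` and `elementNode` helpers are hypothetical stand-ins that mimic only the DOM properties the walk reads (`nodeType`, `nodeValue`, `firstChild`, `nextSibling`), and `walkAndReplace` restates the recursive function above so the sketch runs on its own.

```javascript
// Hypothetical stand-ins for DOM nodes: only the properties the walk reads.
function textNode(value) {
    return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}

function elementNode(attributes, children) {
    var node = { nodeType: 1, attributes: attributes, firstChild: null, nextSibling: null };
    // Link the children the way the DOM does, via firstChild / nextSibling.
    for (var i = children.length - 1; i >= 0; i--) {
        children[i].nextSibling = node.firstChild;
        node.firstChild = children[i];
    }
    return node;
}

// The recursive walk, restated so this sketch is self-contained.
function walkAndReplace(node, from, to) {
    for (var child = node.firstChild; child !== null; child = child.nextSibling) {
        if (child.nodeType === 3) {
            child.nodeValue = child.nodeValue.replace(from, to);
        } else if (child.nodeType === 1) {
            walkAndReplace(child, from, to);
        }
    }
}

// Mock of: <A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>
var span = elementNode({}, [textNode('ZZZ test test')]);
var anchor = elementNode({ href: 'ZZZ' }, [textNode('test test ZZZ'), span]);
var root = elementNode({}, [anchor]);

walkAndReplace(root, /ZZZ/g, 'AAA');

console.log(anchor.firstChild.nodeValue); // test test AAA
console.log(span.firstChild.nodeValue);   // AAA test test
console.log(anchor.attributes.href);      // ZZZ (attributes are not text nodes)
```

In a real page the same call would take an actual element instead of the mock `root`.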


The best idea in this case is almost certainly not to use regular expressions for this, or at least not on their own. JavaScript surely has an HTML parser somewhere?

If you really must use regular expressions, you could try to look for every instance of ZZZ that is followed by a "<" before any ">". That would look like

ZZZ(?=[^>]*<)

This might break horribly if the code contains HTML comments or script blocks, or is not well formed.
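A small sketch of how the lookahead behaves in JavaScript, on the sample from the question and on two inputs where it goes wrong. The variable names and the two failing inputs are made up for illustration.

```javascript
// ZZZ followed by a "<" with no ">" in between.
var pattern = /ZZZ(?=[^>]*<)/g;

// The sample from the question: the attribute value is left alone.
var sample = '<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>';
var replaced = sample.replace(pattern, 'AAA');
console.log(replaced);
// <A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>

// Failure 1: trailing text with no tag after it is missed.
var trailing = '<b>ZZZ</b> ZZZ'.replace(pattern, 'AAA');
console.log(trailing);
// <b>AAA</b> ZZZ

// Failure 2: script content is treated like ordinary text and gets rewritten.
var script = '<script>var ZZZ = 1;</script>'.replace(pattern, 'AAA');
console.log(script);
// <script>var AAA = 1;</script>
```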


Assuming a well-formed HTML document with outer/enclosing tags like <html>, I would think the easiest way would be to look for the > and < signs:

/(\>[^\>\<]*)ZZZ([^\>\<]*\<)/$1AAA$2/

If you're dealing with HTML fragments that may not have enclosing tags, it gets a little more complicated: you'd have to allow for the start and the end of the string as well.

Example JS (sorry, missed the tag):

alert('<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>'.replace(/(\>[^\>\<]*)ZZZ([^\>\<]*\<)/g, "$1AAA$2"));

Explanation: for each match that

  • starts with >: \>
  • follows with any number of characters that are neither > nor <: [^\>\<]*
  • then has "ZZZ"
  • follows with any number of characters that are neither > nor <: [^\>\<]*
  • and ends with <: \<

Replace with

  • everything before the ZZZ, marked with the first capture group (parentheses): $1
  • AAA
  • everything after the ZZZ, marked with the second capture group (parentheses): $2

The "g" (global) flag ensures that every match in the string is replaced, not only the first one.
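One thing worth checking with this pattern: each match swallows a whole run of text between > and <, so if ZZZ occurs twice in the same run, only one of them is replaced per pass. The sketch below shows that, together with one possible variation (not the answer's code) that matches each run and replaces every ZZZ inside it through a callback. The function names are made up for illustration.

```javascript
// The pattern from the answer (without the redundant escapes):
// at most one ZZZ per run of text is replaced in a single pass.
function replaceOnePerRun(html) {
    return html.replace(/(>[^><]*)ZZZ([^><]*<)/g, '$1AAA$2');
}

// Variation: match each run of text between ">" and "<",
// then replace every ZZZ inside that run.
function replaceAllPerRun(html) {
    return html.replace(/>([^<>]*)</g, function (match, text) {
        return '>' + text.replace(/ZZZ/g, 'AAA') + '<';
    });
}

var input = '<p>ZZZ and ZZZ</p>';
console.log(replaceOnePerRun(input)); // <p>ZZZ and AAA</p>
console.log(replaceAllPerRun(input)); // <p>AAA and AAA</p>

var sample = '<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>';
console.log(replaceAllPerRun(sample));
// <A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>
```

Both versions still assume that every run of text sits between a > and a <, so they share the limits described above for fragments and malformed markup.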


Try this:

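var str = '<DIV>ZZZ test test</DIV><A HREF="ZZZ">test test ZZZ</A>';
// Read the word to replace from the first href attribute (assumes one is present).
var rpl = str.match(/href="(\w*)"/i)[1];
// Replace only occurrences followed by "<" with no ">" in between, i.e. text outside tags.
console.log(str.replace(new RegExp(rpl + "(?=[^>]*<)", "gi"), "XXX"));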


Have you tried this?

replace:

>([^<>]*)(ZZZ)([^<>]*)<

with:

>$1AAA$3<

But heed all the savvy warnings in the post linked in the first comment to your question!
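The pattern and replacement above are not tied to a language. As a rough sketch, they could be wrapped in JavaScript like this; `escapeRegExp` and `replaceBetweenTags` are hypothetical helper names, and the escaping step is an addition so that the search word may contain regex metacharacters.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: apply the >(...)(word)(...)< pattern with a global flag.
function replaceBetweenTags(html, from, to) {
    var pattern = new RegExp('>([^<>]*)(' + escapeRegExp(from) + ')([^<>]*)<', 'g');
    // A callback keeps "$" in the replacement text from being interpreted.
    return html.replace(pattern, function (match, before, found, after) {
        return '>' + before + to + after + '<';
    });
}

var html = '<A HREF="ZZZ">test test ZZZ<SPAN>ZZZ test test</SPAN></A>';
console.log(replaceBetweenTags(html, 'ZZZ', 'AAA'));
// <A HREF="ZZZ">test test AAA<SPAN>AAA test test</SPAN></A>
```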
