/regexp?/ on HTML, but not in form [duplicate]

This question already has answers here: Closed 12 years ago.

Possible Duplicate:

RegEx match open tags except XHTML self-contained tags

I need to do some regex replacement on HTML input, but I need to exclude some parts of it from that filtering, using another regexp.

(e.g. remove all <a> tags with specific href="example.com…, except the ones that are inside the <form> tag)

Is there any smart regex technique for this? Or do I have to find all forms using $regex1, split the input into smaller chunks that exclude the matched text blocks, and then run $regex2 on each of the chunks?


The NON-regexp way:

<?php
$html = '<html><body><a href="foo">a <b>bold</b> foz </a> b c <form><a href="foo">l</a></form> <a href="boz">a</a></body></html>';
$d = new DOMDocument();
$d->loadHTML($html);
$x = new DOMXPath($d);
// Select every <a href="foo"> that has no <form> ancestor.
$elements = $x->query('//a[not(ancestor::form) and @href="foo"]');
foreach ($elements as $elm) {
    // Run this loop only if the contents of the <a> should stay visible:
    // it moves each child node in front of the <a>.
    while ($elm->firstChild) {
        $elm->parentNode->insertBefore($elm->firstChild, $elm);
    }
    // Remove the <a> element itself.
    $elm->parentNode->removeChild($elm);
}
var_dump($d->saveXML());
?>


Why can't you just dump the html string you need into a DOM helper, then use getElementsByTagName('a') to grab all anchors and use getAttribute to get the href, removeChild to remove it?
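A minimal sketch of that suggestion, assuming PHP's built-in DOM extension. The function name removeAnchorsOutsideForms and the exact-match comparison on href are my own assumptions, not something from the question; the rule of skipping anchors inside a <form> is. Note that getElementsByTagName() returns a live list, so the sketch copies it into an array before removing anything.

```php
<?php
// Hypothetical helper: removes <a href="$href"> elements unless they sit inside a <form>.
function removeAnchorsOutsideForms(string $html, string $href): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally, since real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The node list is live, so copy it before modifying the tree.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));

    foreach ($anchors as $a) {
        // Assumption: exact match on the href value.
        if ($a->getAttribute('href') !== $href) {
            continue;
        }
        // Walk up the tree; skip this anchor if any ancestor is a <form>.
        for ($p = $a->parentNode; $p !== null; $p = $p->parentNode) {
            if ($p->nodeName === 'form') {
                continue 2;
            }
        }
        // Removes the anchor together with its contents.
        $a->parentNode->removeChild($a);
    }

    return $doc->saveHTML();
}
```

Unlike the XPath version, this drops the anchor's contents as well; move the child nodes out first if they should stay.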


This looks like PHP, right? http://htmlpurifier.org/


Any particular reason you would want to do that with regular expressions? It sounds like it would be fairly straightforward in JavaScript to spin through the DOM and do it that way.

In jQuery, for instance, it seems like you could do this in just a couple lines using its DOM selectors.


  • If forms can be nested, it is technically impossible.
  • If forms cannot be nested, it is practically impossible. There is no function that lets you use the same regex to both
    1. define the area where the matching should be done (i.e. outside a form), and
    2. define the things to be matched (i.e. the <a> elements).
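
For comparison, the two-pass approach described in the question (split on the forms first, then filter the remaining chunks) could look roughly like this. It is only a sketch: the function name stripAnchorsOutsideForms and both patterns are illustrative assumptions, and it only holds up for non-nested <form>...</form> blocks, double-quoted href attributes, and an exact href match. A DOM parser remains the more robust choice.

```php
<?php
// Hypothetical helper; both patterns are illustrative and assume non-nested forms.
function stripAnchorsOutsideForms(string $html, string $href): string
{
    // Pass 1: split on whole <form> blocks; the capture group keeps them in the result.
    $chunks = preg_split('~(<form\b.*?</form>)~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Pass 2 pattern: an <a> element whose double-quoted href equals $href.
    $anchor = '~<a\b[^>]*\bhref="' . preg_quote($href, '~') . '"[^>]*>.*?</a>~is';

    foreach ($chunks as $i => $chunk) {
        // With one capture group, odd indexes hold the <form> blocks: leave them untouched.
        if ($i % 2 === 0) {
            $chunks[$i] = preg_replace($anchor, '', $chunk);
        }
    }

    return implode('', $chunks);
}
```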