/regexp?/ on HTML, but not in form [duplicate]
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I need to do some regex replacement on HTML input, but some parts of the input, identified by another regex, must be excluded from the filtering (e.g. remove all <a> tags with a specific href="example.com…, except the ones that are inside a <form> tag).
Is there any smart regex technique for this? Or do I have to find all forms using $regex1, then split the input into smaller chunks, excluding the matched text blocks, and then run $regex2 on all the chunks?
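For reference, the chunking idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: forms are never nested, the markup is regular enough for regex, and the function name stripLinksOutsideForms and both patterns are made up for this example rather than taken from any library.

```php
<?php
// Hypothetical helper: unwraps <a href="$href"> links that sit outside <form>...</form>.
// Assumes forms are not nested and attributes use double quotes.
function stripLinksOutsideForms(string $html, string $href): string
{
    // "$regex1": split on whole forms; the capture group keeps each form as its own chunk.
    $chunks = preg_split('~(<form\b.*?</form>)~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    // "$regex2": an anchor with the given href; group 1 holds the link contents.
    $regex2 = '~<a\b[^>]*\bhref="' . preg_quote($href, '~') . '"[^>]*>(.*?)</a>~is';

    foreach ($chunks as $i => $chunk) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes are the captured forms; skip them.
        if ($i % 2 === 0) {
            $chunks[$i] = preg_replace($regex2, '$1', $chunk);
        }
    }
    return implode('', $chunks);
}

$html = '<a href="foo">a <b>bold</b> foz </a> b c <form><a href="foo">l</a></form> <a href="boz">a</a>';
echo stripLinksOutsideForms($html, 'foo'), "\n";
```

In this sketch the link inside the form and the link with a different href are left alone, while the first link is replaced by its contents. It breaks as soon as forms nest or the markup gets irregular.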
The NON-regexp way:
<?php
$html = '<html><body><a href="foo">a <b>bold</b> foz </a> b c <form><a href="foo">l</a></form> <a href="boz">a</a></body></html>';
$d = new DOMDocument();
$d->loadHTML($html);
$x = new DOMXPath($d);
// Select every <a href="foo"> that has no <form> ancestor.
$elements = $x->query('//a[not(ancestor::form) and @href="foo"]');
foreach ($elements as $elm) {
    // Run this loop if the contents of <a> should stay visible:
    while ($elm->firstChild) {
        $elm->parentNode->insertBefore($elm->firstChild, $elm);
    }
    // Remove the <a> itself.
    $elm->parentNode->removeChild($elm);
}
var_dump($d->saveXML());
?>
Why can't you just dump the HTML string you need into a DOM helper, then use getElementsByTagName('a') to grab all anchors, getAttribute to get the href, and removeChild to remove the ones you don't want?
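A rough sketch of that suggestion, assuming PHP's DOMDocument as the DOM helper. getElementsByTagName cannot express "not inside a form" on its own, so the ancestor check is done by a small helper, isInsideForm, which is a hypothetical name introduced for this example. The sample markup and the href value "foo" are also just placeholders.

```php
<?php
// Hypothetical helper: walks up the tree and reports whether any ancestor is a <form>.
function isInsideForm(DOMNode $node): bool
{
    for ($p = $node->parentNode; $p !== null; $p = $p->parentNode) {
        if ($p->nodeName === 'form') {
            return true;
        }
    }
    return false;
}

$html = '<html><body><a href="foo">a</a> <form><a href="foo">l</a></form> <a href="boz">b</a></body></html>';
$d = new DOMDocument();
$d->loadHTML($html);

// getElementsByTagName returns a live list, so copy it before removing nodes.
$anchors = iterator_to_array($d->getElementsByTagName('a'));
foreach ($anchors as $a) {
    if ($a->getAttribute('href') === 'foo' && !isInsideForm($a)) {
        // Removes the link together with its contents.
        $a->parentNode->removeChild($a);
    }
}
echo $d->saveHTML();
```

Copying the list first matters: removing nodes while iterating the live list directly can cause elements to be skipped.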
This looks like PHP, right? http://htmlpurifier.org/
Any particular reason you would want to do that with regular expressions? It sounds like it would be fairly straightforward in JavaScript to spin through the DOM and do it that way.
In jQuery, for instance, it seems like you could do this in just a couple lines using its DOM selectors.
- If forms can be nested, it is technically impossible.
- If forms cannot be nested, it is practically impossible. There is no function that lets you use the same regex to both
  - define the area where the matching should be done (i.e. outside a form), and
  - define the things to be matched (i.e. <a> elements).