Regexp to remove an entire paragraph based on its content?
Hey guys, I'm a regexp noob. Is it possible with preg_replace to remove an entire paragraph tag?
<p><div class="vidwrapper"> lot of content with other divs etc. </div></p>
The paragraph should only be removed if the div it contains has the class vidwrapper.
Is that even possible? Any idea what this regexp would look like? Thank you for your help.
If it's a fixed occurrence, then the following might work:
$html = preg_replace('#<p>[^<]*<div[^>]+class="vidwrapper"[^>]*>.*?</p>#is', "", $html);
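As a minimal sketch of how that call might be used in practice (the sample markup and the variable names are made up for illustration), note that preg_replace needs the subject string as its third argument:

```php
<?php
// Hypothetical sample input: only the second paragraph wraps a div.vidwrapper.
$html = '<p>keep me</p>'
      . '<p><div class="vidwrapper"> lot of content <div>inner</div> </div></p>'
      . '<p>keep me too</p>';

// i = case-insensitive, s = let "." match newlines as well.
$pattern = '#<p>[^<]*<div[^>]+class="vidwrapper"[^>]*>.*?</p>#is';

// preg_replace(pattern, replacement, subject) returns the modified string.
$cleaned = preg_replace($pattern, '', $html);

echo $cleaned, PHP_EOL;
```

This only behaves well as long as the wrapper contains no closing paragraph tag of its own, because the lazy `.*?` stops at the first `</p>` it reaches.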
For matching nested HTML you would normally need a recursive regex, which is why something like phpQuery or QueryPath is often simpler:
$html = pq($html)->find("p div.vidwrapper")->parent()->remove()->html();
It's a bad idea to do this using a regex, unless you know that there will be no paragraph (or anything that might superficially be interpreted as a paragraph) inside of the vidwrapper.
If you don't, writing a regex for something like this will be very hard:
<p><div class="vidwrapper"> Hello there. <p>Wee.</p> Yoink. </div></p>
<p><div class="vidwrapper"> Hello there. <!-- <p>Wee.</p> --> Yoink. </div></p>
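To see why, here is a small sketch that runs a simple non-recursive pattern (an assumed example, not necessarily the one you would end up writing) against the first case above:

```php
<?php
// A simple pattern: the lazy .*? stops at the first </p> it can find.
$pattern = '#<p>[^<]*<div[^>]+class="vidwrapper"[^>]*>.*?</p>#is';

// Wrapper that itself contains a paragraph.
$nested = '<p><div class="vidwrapper"> Hello there. <p>Wee.</p> Yoink. </div></p>';

$result = preg_replace($pattern, '', $nested);

// The match ends at the inner </p>, so the tail of the wrapper is left behind.
var_dump($result);
```

The match ends at the inner `</p>`, so ` Yoink. </div></p>` survives and the markup is left broken. The commented-out paragraph in the second case trips the pattern in the same way.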
An easier (and more robust) way would probably be to parse the HTML with an HTML parser, and do a search on the DOM tree instead.
See also:
- Robust and Mature HTML Parser for PHP
- RegEx match open tags except XHTML self-contained tags
If you think embedded <script> blocks will cause problems, you can use this pattern as well.
#
\s*
<p\s*> \s* <div \s+ class \s* = \s* (["']) vidwrapper \1 \s* >
(?:
<script (?:\s+ (?:".*?"|'.*?'|[^>]*?)+)? \s*>
.*?
</script\s*>
| .
)*?
</p\s*>
#xs
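A rough sketch of how a pattern like this could be dropped into PHP, assuming a nowdoc string so that the quotes and backslashes need no extra escaping. The groups are written balanced here, and the sample input is made up:

```php
<?php
// Extended (x) mode ignores whitespace in the pattern; s lets "." match newlines.
$pattern = <<<'REGEX'
#
\s*
<p\s*> \s* <div \s+ class \s* = \s* (["']) vidwrapper \1 \s* >
(?:
<script (?:\s+ (?:".*?"|'.*?'|[^>]*?)+)? \s*>
.*?
</script\s*>
| .
)*?
</p\s*>
#xs
REGEX;

// Hypothetical input: the script body contains a literal "</p>".
$html = '<p><div class="vidwrapper">'
      . '<script>document.write("</p>");</script> video </div></p>tail';

$cleaned = preg_replace($pattern, '', $html);

echo $cleaned, PHP_EOL;
```

Because a whole `<script>…</script>` block is consumed as a single unit, the `</p>` inside the script does not end the match early. A nested paragraph or an HTML comment inside the wrapper would still fool it.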