
preg_replace to remove empty tags but keep the end of blockquotes

I wrote this expression to remove all empty tags in the page, including tags that contain only whitespace.

$content =  preg_replace('/<[^\/>]*>([\s]?)*<\/[^>]*>/', '', $content);

It worked a treat until it had to deal with content like this...

 <blockquote>
<p >foo bar</p>
</blockquote>
<p ><a href="image.jpg" rel="lightbox" title=""><img  title="image" src="image.jpg" /></a><br /></p>

and it outputs it as...

<blockquote>
<p >foo bar</p>
<p ><a href="image.jpg" rel="lightbox" title=""><img  title="image" src="image.jpg" /></a><br /></p>

Thus removing the </blockquote>.

I have been scratching my head on this one and can't get it working. Can anyone see an obvious solution other than specifying which tags it should format? I should also say that it is formatting 'the_content' on a WordPress post.
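
One thing the pattern above does not check is that the closing tag has the same name as the opening tag, so an opening tag followed only by whitespace and an unrelated closing tag would also be matched. The sketch below illustrates this with an assumed input (a stray `<p>` directly before `</blockquote>`, which is not shown in the post and may not be what actually happened) and compares the original pattern with a stricter variant that uses a backreference. The variable names `$loose` and `$strict` are only for illustration.

```php
<?php
// Assumed input, NOT taken from the post: a stray <p> sits right before </blockquote>.
$content = "<blockquote>\n<p>foo bar</p>\n<p>\n</blockquote>\n<p> </p>";

// Original pattern: any opening tag, optional whitespace, then ANY closing tag.
$loose = preg_replace('/<[^\/>]*>([\s]?)*<\/[^>]*>/', '', $content);

// Stricter variant: \1 forces the closing tag name to equal the captured opening name.
$strict = preg_replace('/<([a-z][a-z0-9]*)\b[^>\/]*>\s*<\/\1\s*>/i', '', $content);

var_dump(strpos($loose, '</blockquote>') !== false);  // is </blockquote> still there?
var_dump(strpos($strict, '</blockquote>') !== false);
```

Even the stricter pattern is a single pass, so an empty tag nested inside another empty tag would need the replacement to be repeated until nothing changes.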


Regular expressions and HTML are not a good match: HTML is not a regular language, and there is no end of edge cases and gotchas. You'll be better off using an HTML parser, such as Simple HTML DOM, and inspecting and manipulating the DOM object.
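
As a rough illustration of the parser approach, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath rather than the library linked above. The function name `remove_empty_elements`, the wrapper id `fragment-root`, and the list of void elements to keep are assumptions made for this example, not something from the original post.

```php
<?php
// Hypothetical helper: removes elements that contain nothing but whitespace.
function remove_empty_elements(string $html): string
{
    // Void elements such as <br> and <img> are empty by definition, so keep them.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'];

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    // Wrap the fragment in a known root; the XML prolog hints UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);

    // Repeat until a pass removes nothing, so parents emptied by a removal go too.
    do {
        $removed = 0;
        // Candidates: no child elements and no non-whitespace text.
        foreach ($xpath->query('.//*[not(*) and not(normalize-space())]', $root) as $node) {
            if (!in_array(strtolower($node->nodeName), $void, true)) {
                $node->parentNode->removeChild($node);
                $removed++;
            }
        }
    } while ($removed > 0);

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = "<blockquote>\n<p >foo bar</p>\n</blockquote>\n"
         . '<p ><a href="image.jpg" rel="lightbox" title="">'
         . '<img  title="image" src="image.jpg" /></a><br /></p>' . "\n<p> </p>";
echo remove_empty_elements($content);
```

Because the parser pairs each closing tag with its own opening tag, `</blockquote>` cannot be swallowed by a neighbouring element. The output is re-serialized, so attribute spacing and self-closing slashes may differ from the input. Elements that are legitimately empty, such as an `<iframe>` or a `<script>` with a `src` attribute, would need to be added to the keep list.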


If you find that Simple HTML DOM doesn't catch all the tags, you might also like to take a look at HTML Purifier, which is more advanced.
