Checking and removing empty tags with PHP
What is the fastest way to remove empty HTML tags from a string?
I have programmed something like this to detect empty anchor tags:
$temp = strip_tags($string, "<blockquote><a>");
$cmatch = array();
if (preg_match_all("~<a.*><\/a>~iU", $temp, $cmatch, PREG_SET_ORDER))
{
    foreach ($cmatch as $cm)
    {
        foreach ($cm as $t)
        {
            //echo htmlentities($t)."<br />";
            $temp = trim(str_replace($t, '', $temp));
        }
    }
}
//do not output if empty tags (problem with div margin)
if (!empty($temp))
{
    echo '<div class="c" style="margin-top:20px;">';
    echo $temp;
    echo '</div>';
}
It must be possible to do this more efficiently. Would it be faster to convert the string to an HTML DOM and do the checking there?
Regular expressions are not the right tool for parsing HTML.
As a non-specific answer, I highly recommend using a DOM parsing library to accomplish this. To name a few gotchas that will make regular expressions a nightmare:
- You may catch <a></a> tags, but will you catch <a /> tags?
- Is the following p tag empty? <p><a></a></p> If so, will your code catch it? If it doesn't, how many passes will you need to run on the string before you're confident you have caught them all?
- Will you catch tags which aren't properly closed?
- Will you catch tags which overlap?
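The DOM approach recommended above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes PHP's bundled DOM extension (DOMDocument and DOMXPath), and the function name removeEmptyTags and the wrapper id gr-wrap are hypothetical names invented for the example. It treats an element as empty when it has no child elements and only whitespace text, and it repeats the pass so that nested cases such as <p><a></a></p> are handled.

```php
<?php
// Hypothetical helper, not from the original post: strips elements that
// contain no child elements and no non-whitespace text.
function removeEmptyTags(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a known container so its contents can be read
    // back later; the leading XML declaration hints UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="gr-wrap">' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="gr-wrap"]')->item(0);
    if ($wrap === null) {
        return '';
    }

    // Leaf elements with only whitespace inside; void elements such as
    // <br> or <img> are skipped because they are legitimately empty.
    $emptyLeaves = './/*[not(*) and not(normalize-space())'
        . ' and not(self::br or self::hr or self::img or self::input)]';

    // Removing a leaf can leave its parent empty, so repeat until a pass
    // finds nothing. This covers nesting such as <p><a></a></p>.
    do {
        $nodes = $xpath->query($emptyLeaves, $wrap);
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    } while ($nodes->length > 0);

    // Serialize the container's children, i.e. its inner HTML.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}
```

In the question's code, the preg_match_all block could then be replaced by a single call such as $temp = removeEmptyTags(strip_tags($string, "<blockquote><a>")); followed by the existing empty check. Because the parser repairs unclosed and overlapping tags while building the tree, those cases are handled by the parser rather than by the pattern. A non-breaking space (&nbsp;) is not counted as whitespace in this sketch, so <p>&nbsp;</p> would be kept.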