PHP: Removing only the first few empty <p> tags
I have a custom-developed CMS where users can enter content into a rich text field (CKEditor).
Users simply copy and paste data from another document. Sometimes the data has empty <p> tags at the beginning. Here's a sample of the data:
<p></p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
I don't want to remove all the empty <p> tags, only the ones before the actual data (the top three <p> tags in this case).
How can I do that?
Edit: To clarify, I need a PHP solution. JavaScript won't do.
Is there a way I can gather all <p> tags in an array, then iterate and delete until I encounter one with data?
Please, don't use regular expressions for irregular strings: it stirs the sleeping god. Instead, use XPath:
function strip_opening_lines($html) {
    $dom = new DOMDocument();
    // Note the capital "S": the property is preserveWhiteSpace.
    $dom->preserveWhiteSpace = FALSE;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//p");
    foreach ($nodes as $node) {
        // Remove non-significant whitespace.
        $trimmed_value = trim($node->nodeValue);
        // Check to see if the node is empty (i.e. <p></p>).
        // If so, remove it from the document. Compare against ''
        // rather than using empty(), which would also treat "0" as empty.
        if ($trimmed_value === '') {
            $node->parentNode->removeChild($node);
        }
        // If we found a non-empty node, we're done. Break out.
        else {
            break;
        }
    }
    $parsed_html = $dom->saveHTML();
    // DOMDocument::saveHTML adds a DOCTYPE, <html>, and <body>
    // tags to the parsed HTML. Since this is regular data,
    // we can use regular expressions.
    if (preg_match('#<body>(.*?)</body>#is', $parsed_html, $matches)) {
        return $matches[1];
    }
    // Fall back to the input if no <body> wrapper was found.
    return $html;
}
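A variation on the same idea, as a minimal sketch rather than a definitive implementation: walk only the top-level children of <body> and serialize the remaining nodes one by one with DOMDocument::saveHTML($node), so no regular expression is needed to strip the wrapper. The function name strip_leading_empty_paragraphs is made up for illustration. The sketch assumes the input is an HTML fragment, treats &nbsp; as whitespace, and leaves encoding handling for non-ASCII input out.

```php
<?php
// Hypothetical helper for illustration; not part of the answer above.
function strip_leading_empty_paragraphs(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    // Collect libxml warnings quietly instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Wrap the fragment so a <body> element is always present.
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Drop leading nodes until the first one that carries content.
    while ($body->firstChild !== null) {
        $node = $body->firstChild;
        // Text with ordinary whitespace and non-breaking spaces removed.
        $text = preg_replace('/[\s\x{A0}]+/u', '', $node->textContent);

        $isBlankText = $node->nodeType === XML_TEXT_NODE && $text === '';
        // A <p> counts as empty only if it has no text and no child elements
        // (so <p><img ...></p> is kept).
        $isEmptyP = $node->nodeType === XML_ELEMENT_NODE
            && strtolower($node->nodeName) === 'p'
            && $text === ''
            && $node->getElementsByTagName('*')->length === 0;

        if (!$isBlankText && !$isEmptyP) {
            break;
        }
        $body->removeChild($node);
    }

    // Serialize each remaining child; this avoids the DOCTYPE/<html>/<body> wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Empty paragraphs that appear after the first paragraph with content are left in place, which is what the question asks for. The exact whitespace between the serialized tags may differ from the input.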
Reasons why all the regex solutions presented are bad:
- They won't match empty paragraph elements with attributes (e.g. <p class="foo"></p>).
- They won't match empty paragraph elements that are not literally empty (e.g. <p> </p>).
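The two points above can be checked with a short sketch. It runs the literal pattern '!^(<p></p>\s*)+!' suggested elsewhere in this thread against three sample strings; the sample strings and variable names are made up for illustration.

```php
<?php
// The simple pattern from this thread: it matches the literal text "<p></p>" only.
$pattern = '!^(<p></p>\s*)+!';

// Illustrative inputs (assumptions, not taken from a real CMS).
$samples = [
    'plain'      => "<p></p>\n<p></p>\n<p>Data</p>",
    'attribute'  => '<p class="foo"></p><p>Data</p>',
    'whitespace' => '<p> </p><p>Data</p>',
];

$results = [];
foreach ($samples as $label => $html) {
    $results[$label] = preg_replace($pattern, '', $html);
    // Report whether the pattern removed anything from this sample.
    echo $label, ': ', ($results[$label] === $html ? 'unchanged' : 'stripped'), "\n";
}
```

Only the 'plain' sample is stripped; the other two come back unchanged because their leading paragraph is not literally "<p></p>".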
Normally I would advise against using a regular expression to parse HTML, but this one seems harmless:
$html = preg_replace('!^(<p></p>\s*)+!', '', $html);
Use:
$html = preg_replace('~^(?:<p></p>\s*)+~i', '', $html);
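If a regular expression is still preferred, a somewhat more tolerant pattern might also allow attributes, whitespace and &nbsp; inside the leading paragraphs. This is only a sketch: the function name strip_leading_empty_p and the pattern are illustrative and not from the answers above, and it will still miss cases such as a ">" inside an attribute value or empty inline tags nested in the paragraph.

```php
<?php
// Hypothetical helper for illustration.
function strip_leading_empty_p(string $html): string
{
    // ^\s*              optional whitespace at the very start of the string
    // <p\b[^>]*>        an opening <p> tag, with or without attributes
    // (?:\s|&nbsp;)*    nothing but whitespace or &nbsp; entities inside
    // </p>\s*           the closing tag and any whitespace after it
    // (?: ... )+        repeated for every leading empty paragraph
    $pattern = '~^\s*(?:<p\b[^>]*>(?:\s|&nbsp;)*</p>\s*)+~i';
    $result = preg_replace($pattern, '', $html);
    // preg_replace returns null on error; fall back to the input in that case.
    return $result === null ? $html : $result;
}

$input = "<p class=\"foo\"></p>\n<p>&nbsp;</p><p> </p>\n<p>Data</p>\n<p></p>";
$output = strip_leading_empty_p($input);
```

Because the pattern is anchored with ^ and used without the m modifier, only the run of empty paragraphs at the very start of the string is removed; the trailing <p></p> in $input stays.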
You can do it in JavaScript: as soon as the user performs a paste operation, strip off the unwanted tags using regular expressions.
Your code will look like this:
document.getElementById("id of rich text field").onkeyup = stripData;
document.getElementById("id of rich text field").onmouseup = stripData;
function stripData(){
document.getElementById("id of rich text field").value = document.getElementById("id of rich text field").value.replace(/\<p\>\<\/p\>/g,"");
}
Edit: To remove only the initial empty <p> tags:
function stripData(){
    var field = document.getElementById("id of rich text field");
    var dataStr = field.value;
    // Keep removing an empty <p></p> (and any whitespace after it) from the start of the string.
    while (/^<p><\/p>\s*/.test(dataStr)) {
        dataStr = dataStr.replace(/^<p><\/p>\s*/, "");
    }
    field.value = dataStr;
}