Split HTML text without breaking "open" tags
I'm using a PHP function to split text into blocks of at most N characters. Once each block has been "treated" somehow, the blocks are concatenated back together. The problem is that the text can be HTML, and if the split falls inside an element that is still open, the "treatment" gets spoiled. Can someone give me a hint about how to break the text only between closed tags?
Requirements:
- Max block length: N
- There are NO <body> tags
- There are NO <HTML> tags
- There are NO <head> tags
Here is a sample (max block length = 173):
<div class="myclass">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer dapibus sagittis lacus quis cursus.
</div>
<div class="anotherclass">
Nulla ligula felis, adipiscing ac varius et, sollicitudin eu lorem. Sed laoreet porttitor est, sit amet vestibulum massa pretium et. In interdum auctor nulla, ac elementum ligula aliquam eget
</div>
In the text above, with 173 characters as the limit, the text would break at "adipiscing"; however, that would break the <div class="anotherclass">. In this case, the split should occur at the first closing tag instead, even though that chunk is shorter than the max limit.
The "correct" way would be to parse the HTML and perform the shortening operations on its text nodes. In PHP5 you could use the DOM extension, and specifically DOMDocument::loadHTML().
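A minimal sketch of that DOM approach, assuming the fragment's top-level content is made of elements like the <div>s in the sample. chunk_html_by_top_level_nodes() is a hypothetical helper, not part of PHP. It loads the fragment with DOMDocument::loadHTML(), serializes each top-level node with DOMDocument::saveHTML($node), and packs whole nodes into chunks of at most $maxLen bytes, so a cut can only fall between closed elements. Keep in mind that libxml may normalize the markup (for example, by wrapping bare top-level text in a <p>), and that strlen() counts bytes, not characters.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): split an HTML fragment into
 * chunks of at most $maxLen bytes, cutting only between complete top-level
 * nodes, so no chunk ends inside an open element.
 */
function chunk_html_by_top_level_nodes(string $html, int $maxLen): array
{
    if (trim($html) === '') {
        return [];  // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // libxml wraps the fragment in implied <html><body> elements, so the
    // fragment's own top-level nodes are the children of <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $chunks = [];
    $current = '';
    foreach ($body->childNodes as $node) {
        // Serialize the whole node: start tag, content and matching end tag.
        $piece = (string) $doc->saveHTML($node);
        if ($current !== '' && strlen($current) + strlen($piece) > $maxLen) {
            // Appending this node would exceed the limit: close the chunk here.
            $chunks[] = $current;
            $current = '';
        }
        // A single node longer than $maxLen ends up as its own oversized chunk.
        $current .= $piece;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

A node that is longer than the limit on its own is returned as a single oversized chunk; to split further you would have to recurse into its children, down to the text nodes, as suggested above.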
Hmm, I've used some code where I had to split the copy entered in a WYSIWYG editor and wanted to retrieve the first paragraph from it. It's a little dodgy, but it did the trick for me. If you wanted to show only "n" characters, you could apply substr to the "intro" var. Hope this gets you started :-|
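```php
function break_html_description_to_chunks($description = null)
{
    $description = (string) $description;
    // Tags are case-insensitive, so match </p> and </P> alike
    $firstParaEnd = stripos($description, "</p>");
    if ($firstParaEnd === false) {
        // No closing </p>: the whole text is the intro
        return array("intro" => $description, "body" => "");
    }
    $firstParaEnd += strlen("</p>"); // keep the closing tag in the intro
    $intro = substr($description, 0, $firstParaEnd);
    $body = (string) substr($description, $firstParaEnd); // rest of the text
    return array("intro" => $intro, "body" => $body);
}
```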
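If you would rather stay with plain string functions, as this answer does, the same strpos/substr idea can be stretched to the question's N-character limit: cut each chunk right after the last closing tag that still fits, and fall back to the first closing tag past the limit when none fits, which is the behaviour the question asks for. split_after_closing_tags() and its $closingTag parameter are hypothetical names. This sketch assumes the chosen tag (here </div>) only occurs at the top level; with nested <div>s a cut could land inside the outer one, which is where the DOM approach is safer. Lengths are bytes (strlen), not characters.

```php
<?php
/**
 * Hypothetical helper: cut $html into chunks of at most $maxLen bytes,
 * always cutting right after an occurrence of $closingTag.
 * Assumes $closingTag is never nested inside an element of the same type.
 */
function split_after_closing_tags(string $html, int $maxLen, string $closingTag = '</div>'): array
{
    $chunks = [];
    $tagLen = strlen($closingTag);
    while ($html !== '') {
        if (strlen($html) <= $maxLen) {
            $chunks[] = $html;  // the remainder fits: done
            break;
        }
        // Last closing tag lying entirely within the first $maxLen bytes
        // (case-insensitive, since HTML tag names are).
        $pos = strripos(substr($html, 0, $maxLen), $closingTag);
        if ($pos === false) {
            // Nothing fits: cut after the first closing tag past the limit,
            // giving an oversized chunk rather than cutting inside an element.
            $pos = stripos($html, $closingTag);
            if ($pos === false) {
                $chunks[] = $html;  // no closing tag left: keep the rest whole
                break;
            }
        }
        $cut = $pos + $tagLen;
        $chunks[] = substr($html, 0, $cut);
        $html = (string) substr($html, $cut);
    }
    return $chunks;
}
```

With the sample from the question and N = 173, the first chunk ends right after the first </div>. Because the cuts are made on the raw string, implode('', $chunks) gives back exactly the original text.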