How to clip HTML fragments without breaking up tags?

Say I have a 200-character string that contains HTML markup. I want to show a preview of just the first 50 characters without splitting up the tags. In other words, the fragment should not contain a <b> without a matching </b>. Any server-side processing should be in PHP.


You should check out HTML Tidy. Cut the string after the first 50 non-markup characters, then run the result through Tidy to repair any tags left unclosed.
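
A rough sketch of that idea, assuming the tidy extension is installed. The helper names cut_visible_chars and tidy_preview are made up for illustration, and entities such as &amp; are counted character by character, so a cut can land inside one:

```php
<?php
// Hypothetical helper: copy tags through untouched and stop once
// $limit characters of text (everything outside tags) have been kept.
function cut_visible_chars(string $html, int $limit): string
{
    // Split into alternating text runs and tags; tags are kept as tokens.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($tokens as $token) {
        if (preg_match('/^<[^>]*>$/', $token)) {
            $out .= $token;               // a tag: does not count
            continue;
        }
        $len = mb_strlen($token, 'UTF-8');
        if ($len >= $limit) {             // budget used up inside this run
            $out .= mb_substr($token, 0, $limit, 'UTF-8');
            break;
        }
        $out .= $token;
        $limit -= $len;
    }
    return $out;                          // may still contain unclosed tags
}

// Hypothetical wrapper: let Tidy close whatever the cut left open.
function tidy_preview(string $html, int $limit): string
{
    $cut = cut_visible_chars($html, $limit);
    if (!function_exists('tidy_repair_string')) {
        return $cut;                      // tidy missing: markup is NOT repaired
    }
    // 'show-body-only' asks Tidy for the fragment without <html>/<body>.
    $fixed = tidy_repair_string($cut, ['show-body-only' => true], 'utf8');
    return $fixed === false ? $cut : $fixed;
}
```

Calling tidy_preview($html, 50) would then give the preview. Tidy may also reformat whitespace or rewrite other markup it considers invalid, so the output is not guaranteed to match the input byte for byte.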


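Here's a fast and reliable solution using DOMDocument, which is part of standard PHP:

function cut_html ($html, $limit) {
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element. The
    // HTML-ENTITIES conversion lets libxml read UTF-8 input correctly
    // (note: this mbstring encoding is deprecated as of PHP 8.2).
    $dom->loadHTML(mb_convert_encoding("<div>{$html}</div>", "HTML-ENTITIES", "UTF-8"), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    cut_html_recursive($dom->documentElement, $limit);
    // Strip the wrapping "<div>" (5 characters) and "</div>" (6 characters).
    return substr($dom->saveHTML($dom->documentElement), 5, -6);
}

// Trims the text below $element to at most $limit characters and returns
// the remaining budget (zero or negative once the limit has been reached).
function cut_html_recursive ($element, $limit) {
    if($limit > 0) {
        if($element->nodeType == XML_TEXT_NODE) {
            // Count characters, not bytes, so multibyte text is never split.
            $length = mb_strlen($element->nodeValue, 'UTF-8');
            $limit -= $length;
            if($limit < 0) {
                $element->nodeValue = mb_substr($element->nodeValue, 0, $length + $limit, 'UTF-8');
            }
        }
        else {
            for($i = 0; $i < $element->childNodes->length; $i++) {
                if($limit > 0) {
                    $limit = cut_html_recursive($element->childNodes->item($i), $limit);
                }
                else {
                    // Limit reached: drop this child and re-check the same index.
                    $element->removeChild($element->childNodes->item($i));
                    $i--;
                }
            }
        }
    }
    return $limit;
}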


A simple approach might be to strip_tags() first and then capture the excerpt.
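
For illustration, a minimal sketch of that approach. The function name plain_excerpt is made up here, and the preview loses all formatting, which may or may not be acceptable:

```php
<?php
// Hypothetical helper: plain-text preview of an HTML fragment.
function plain_excerpt(string $html, int $limit): string
{
    // Drop the tags, then decode entities so "&amp;" counts as one character.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs left behind by the removed markup.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    // Cut by characters rather than bytes.
    $excerpt = mb_substr($text, 0, $limit, 'UTF-8');
    // Escape again so the result can be echoed into a page.
    return htmlspecialchars($excerpt, ENT_QUOTES, 'UTF-8');
}
```

Since no tags survive, there is nothing left to balance.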


Short answer: convert it to DOM with DOMDocument::loadHTML($string) then walk the tree counting the characters in the text nodes. When you hit your limit, replace the rest of that node with '...' or the empty string, and simply call $node->parentNode->removeChild($node) on all subsequent nodes.
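
One way the walk described above could look, as a sketch rather than a finished implementation. The function name dom_preview, the wrapping <div>, and the use of mb_encode_numericentity to keep non-ASCII input intact are choices made for this example:

```php
<?php
// Hypothetical helper: DOM-based preview with an ellipsis at the cut point.
function dom_preview(string $html, int $limit, string $ellipsis = '...'): string
{
    // Turn non-ASCII characters into numeric entities so libxml does not
    // misread the encoding (assumes the input is UTF-8).
    $safe = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // silence parser warnings
    $dom->loadHTML('<div>' . $safe . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->getElementsByTagName('div')->item(0);  // our wrapper
    if ($root === null) {
        return '';
    }

    $remaining = $limit;
    $xpath = new DOMXPath($dom);
    // Text nodes come back in document order.
    foreach ($xpath->query('.//text()', $root) as $text) {
        $len = mb_strlen($text->data, 'UTF-8');
        if ($len <= $remaining) {
            $remaining -= $len;
            continue;
        }
        // Limit hit inside this node: keep the allowed part plus the ellipsis.
        $text->data = mb_substr($text->data, 0, $remaining, 'UTF-8') . $ellipsis;
        // Remove everything that follows, at every level up to the wrapper.
        for ($node = $text; !$node->isSameNode($root); $node = $node->parentNode) {
            while ($node->nextSibling !== null) {
                $node->parentNode->removeChild($node->nextSibling);
            }
        }
        break;
    }

    // Serialize the wrapper's children, leaving the wrapper itself out.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Because whole nodes are removed and the serializer writes the closing tags, the result stays balanced. Whitespace-only text nodes count toward the limit in this sketch.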
