
How to determine if a text string appears as a child of a named HTML tag

In the doReplace function below, how would I determine whether the matched instance of $keyword is not a child of any of an array of named HTML tags (h1, h2, h3, h4, h5, h6, b, u, i, etc.) at the point in the content where the keyword appears? I don't need to check for nested tags at this point.

I'm thinking that some recursion would be involved inside the doReplace function.

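function doReplace($matches)
{
    // preg_replace_callback passes an array of matches; index 1 is the captured keyword
    $keyword = $matches[1];
    //if(!is_keyword_in_named_tag())
    return ' <b>'.trim($keyword).'</b>';
}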

function init()
{
    $content = "This will be some xhtml formatted 
    content that will be resident on the page in memory";
    $theContent = 
      preg_replace_callback("/\b(my test string)\b/i","doReplace", $content);
    return $theContent;
}

So if the $content variable contains...

<h1>This is my test string</h1>

Then the string "my test string" would not be replaced.

But if the $content variable contains...

<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>

Then the replaced content would be...

<h1>This is my test string</h1>
<div>This is <b>my test string</b> too <b>my test string 3</b></div>


Try this with DOMDocument and DOMXPath:

<?php

function doReplace($html)
{
    $dom = new DOMDocument();
    // loadHTML() does not assume UTF-8 input, so convert to HTML entities first
    // (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
    $dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));

    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//text()[
        not(ancestor::h1) and
        not(ancestor::h2) and
        not(ancestor::h3) and
        not(ancestor::h4) and
        not(ancestor::h5) and
        not(ancestor::h6) and
        not(ancestor::b) and
        not(ancestor::u) and
        not(ancestor::i)
        ]') as $node)
    {
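        // escape &, < and > first: appendXML() needs well-formed XML, and wholeText is already entity-decoded
        $replaced = str_ireplace('my test string', '<b>my test string</b>', htmlspecialchars($node->wholeText, ENT_NOQUOTES));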
        $newNode = $dom->createDocumentFragment();
        $newNode->appendXML($replaced);
        $node->parentNode->replaceChild($newNode, $node);
    }

    // get only the body tag with its contents, then trim the body tag itself to get only the original content;
    // return the result (rather than echo it) so that the caller's echo doReplace($html) prints it
    return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
}

$html = '<h1>This is my test string</h1>
<h2><span>Nested my test string</span></h2>
<div>This is my test string too <b>my test string 3</b></div>';

echo doReplace($html);
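
Since the question mentions an array of tag names, the XPath predicate could also be generated from that array instead of being written out by hand. The following is only a minimal sketch of that idea: the helper names buildTextNodeQuery and highlightKeyword and the wrapper id "wrap" are made up for illustration, and it assumes plain ASCII input so that no encoding conversion is needed. It uses preg_replace with a capture group so the original casing of each match is kept.

```php
<?php

// Hypothetical helper: builds an XPath query selecting every text node
// that has none of the given tags among its ancestors.
function buildTextNodeQuery(array $excludedTags)
{
    $conditions = array();
    foreach ($excludedTags as $tag) {
        $conditions[] = 'not(ancestor::' . $tag . ')';
    }
    return '//text()[' . implode(' and ', $conditions) . ']';
}

// Hypothetical helper: wraps $keyword in <b> wherever it occurs outside the excluded tags.
function highlightKeyword($html, $keyword, array $excludedTags)
{
    $dom = new DOMDocument();
    // Wrap the fragment in a known container so it can be extracted again afterwards.
    // Assumption: $html is ASCII, so loadHTML() needs no encoding hint.
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    $xpath = new DOMXPath($dom);

    // Text is escaped before matching (see below), so escape the keyword the same way.
    $pattern = '/\b(' . preg_quote(htmlspecialchars($keyword, ENT_NOQUOTES), '/') . ')\b/i';

    foreach ($xpath->query(buildTextNodeQuery($excludedTags)) as $node) {
        // nodeValue is entity-decoded; re-escape it so appendXML() receives valid XML.
        $escaped = htmlspecialchars($node->nodeValue, ENT_NOQUOTES);
        $replaced = preg_replace($pattern, '<b>$1</b>', $escaped, -1, $count);
        if ($count === 0) {
            continue; // nothing to replace in this text node
        }
        $fragment = $dom->createDocumentFragment();
        $fragment->appendXML($replaced);
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $out = '';
    $wrapper = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$tags = array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i');
$html = '<h1>This is my test string</h1>
<h2><span>Nested my test string</span></h2>
<div>This is my test string too <b>my test string 3</b></div>';

echo highlightKeyword($html, 'my test string', $tags);
```

Because the predicate tests ancestor:: rather than the direct parent, the keyword nested inside the span within the h2 should be left alone as well; only the occurrence directly inside the div should end up wrapped.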


You can use something like PHP Simple HTML DOM Parser.

Update: DOMDocument is the better solution (it is not only much faster, it also handles nested tags correctly), so use that instead of this one.

Example:

require_once('simple_html_dom.php');

$html = str_get_html('<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>');

foreach ($html->find('text') as $element)
{
    if (!in_array($element->parent()->tag, array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i')))
        $element->innertext = str_replace('my test string', '<b>my test string</b>', $element->innertext);
}

echo $html;