How to determine if a text string appears as a child of a named HTML tag

Asked 2023-01-24 01:35

In the doReplace function below, how would I determine if the instance of $keyword is not a child of any of an array of named html tags (h1, h2, h3, h4, h5, h6, b, u, i, etc) from the replacement point where the keyword appears in the content? I don't care to check for nested tags at this point.

I'm thinking that some recursion would be involved, inside the doReplace function.

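function doReplace($matches)
{
    // preg_replace_callback() passes an array of matches; index 0 is the full match
    $keyword = $matches[0];
    //if(!is_keyword_in_named_tag())
    return '<b>'.trim($keyword).'</b>';
}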

function init()
{
    $content = "This will be some xhtml formatted 
    content that will be resident on the page in memory";
    // no quotes inside the pattern: they would be matched as literal characters
    $theContent = 
      preg_replace_callback("/\b(my test string)\b/i", "doReplace", $content);
    return $theContent;
}

So if the $content variable contains...

<h1>This is my test string</h1>


Then the string "my test string" would not be replaced.

But if the $content variable contains...

<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>

Then the replaced content would be...

<h1>This is my test string</h1>
<div>This is <b>my test string</b> too <b>my test string 3</b></div>

Try this with DOMDocument and DOMXPath:

<?php

function doReplace($html)
{
    $dom = new DOMDocument();
    // loadHTML() assumes ISO-8859-1 unless told otherwise, so convert UTF-8 input to entities first
    // (note: the 'HTML-ENTITIES' mbstring encoding is deprecated as of PHP 8.2)
    $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//text()[
        not(ancestor::h1) and
        not(ancestor::h2) and
        not(ancestor::h3) and
        not(ancestor::h4) and
        not(ancestor::h5) and
        not(ancestor::h6) and
        not(ancestor::b) and
        not(ancestor::u) and
        not(ancestor::i)
        ]') as $node)
    {
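        // escape the text first: appendXML() needs well-formed XML, so a bare & or < would make it fail
        $text = htmlspecialchars($node->wholeText, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
        $replaced = str_ireplace('my test string', '<b>my test string</b>', $text);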
        $newNode = $dom->createDocumentFragment();
        $newNode->appendXML($replaced);
        $node->parentNode->replaceChild($newNode, $node);
    }

    // get only the body tag with its contents, then trim the body tag itself to get only the original content
    // (return the result rather than echoing it, since the caller below echoes the return value)
    return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
}

$html = '<h1>This is my test string</h1>
<h2><span>Nested my test string</span></h2>
<div>This is my test string too <b>my test string 3</b></div>';

echo doReplace($html);
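
As a variation on the code above, the following sketch builds the XPath filter from an array of tag names (which is what the question asks for) and inserts real DOM nodes instead of calling appendXML(), so the original casing of each match is kept and text containing & or < needs no escaping. The function name boldKeywordOutsideTags and its parameters are hypothetical. It assumes PHP 7.4+ with the DOM extension, lowercase and trusted tag names, and a non-empty keyword.

```php
<?php

// Hypothetical helper: wrap each occurrence of $keyword in <b>, skipping any
// text node that has an ancestor whose tag name is listed in $skipTags.
function boldKeywordOutsideTags(string $html, string $keyword, array $skipTags): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // The XML encoding hint makes loadHTML() read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    // Build "not(ancestor::h1) and not(ancestor::h2) ..." from the array.
    // Assumption: tag names are trusted, since they are interpolated into the query.
    $conditions = array_map(fn(string $t): string => "not(ancestor::$t)", $skipTags);
    $filter = $skipTags ? '[' . implode(' and ', $conditions) . ']' : '';

    $xpath = new DOMXPath($dom);
    $pattern = '/(' . preg_quote($keyword, '/') . ')/iu';

    foreach ($xpath->query('//body//text()' . $filter) as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched keyword.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $bold = $dom->createElement('b');
                $bold->appendChild($dom->createTextNode($part));
                $fragment->appendChild($bold);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$skip = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i'];
echo boldKeywordOutsideTags(
    '<h1>This is my test string</h1><div>This is My Test String too <b>my test string 3</b></div>',
    'my test string',
    $skip
);
```

Because the replacement is built from text nodes, the matched text is copied as it appeared in the source, whereas str_ireplace() would substitute the lowercase literal.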

You can use something like PHP Simple HTML DOM Parser.

Update: DOMDocument is the better solution (it is much faster and also handles nested tags), so use it instead of this one.

Example:

require_once('simple_html_dom.php');

$html = str_get_html('<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>');

foreach ($html->find('text') as $element)
{
    if (!in_array($element->parent()->tag, array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i')))
        $element->innertext = str_replace('my test string', '<b>my test string</b>', $element->innertext);
}

echo $html;
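
The loop above inspects only the immediate parent, so a keyword inside something like <h2><span>...</span></h2> would still be replaced. A minimal sketch of a check that walks every ancestor, written against PHP's built-in DOM classes rather than Simple HTML DOM; the function name hasAncestorTag is hypothetical, and it plays the role of the is_keyword_in_named_tag() placeholder from the question.

```php
<?php

// Hypothetical helper: true if $node sits inside any element named in $tags,
// at any nesting depth. $tags is assumed to hold lowercase tag names.
function hasAncestorTag(DOMNode $node, array $tags): bool
{
    for ($parent = $node->parentNode; $parent !== null; $parent = $parent->parentNode) {
        if ($parent instanceof DOMElement
            && in_array(strtolower($parent->nodeName), $tags, true)) {
            return true;
        }
    }
    return false;
}

$dom = new DOMDocument();
$dom->loadHTML('<h2><span>Nested my test string</span></h2><div>my test string</div>');
$xpath = new DOMXPath($dom);

$nested = $xpath->query('//span/text()')->item(0); // text inside h2 > span
$plain  = $xpath->query('//div/text()')->item(0);  // text directly inside div

var_dump(hasAncestorTag($nested, ['h1', 'h2'])); // bool(true)
var_dump(hasAncestorTag($plain, ['h1', 'h2']));  // bool(false)
```

A plain loop over parentNode is enough here, so no recursion is needed.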
