How to determine if a text string appears as a child of a named HTML tag
In the doReplace function below, how would I determine whether the matched instance of $keyword is not a child of any tag in an array of named HTML tags (h1, h2, h3, h4, h5, h6, b, u, i, etc.) at the point where the keyword appears in the content? I don't need to check for nested tags at this point.
I'm thinking that some recursion would be involved inside the doReplace function.
function doReplace($keyword)
{
// preg_replace_callback() passes an array of matches; $keyword[0] is the full match
//if(!is_keyword_in_named_tag())
return ' <b>'.trim($keyword[0]).'</b>';
}
function init()
{
$content = "This will be some xhtml formatted
content that will be resident on the page in memory";
$theContent =
preg_replace_callback("/\b(my test string)\b/i","doReplace", $content);
return $theContent;
}
So if the $content variable contains...
<h1>This is my test string</h1>
Then the string "my test string" would not be replaced.
But if the $content variable contains...
<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>
Then the replaced content would be...
<h1>This is my test string</h1>
<div>This is <b>my test string</b> too <b>my test string 3</b></div>
Try this with DOMDocument and DOMXPath:
<?php
function doReplace($html)
{
$dom = new DOMDocument();
// loadHtml() needs mb_convert_encoding() to work well with UTF-8 encoding
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//text()[
not(ancestor::h1) and
not(ancestor::h2) and
not(ancestor::h3) and
not(ancestor::h4) and
not(ancestor::h5) and
not(ancestor::h6) and
not(ancestor::b) and
not(ancestor::u) and
not(ancestor::i)
]') as $node)
{
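// escape the decoded text first, so characters such as & or < do not break appendXML()
$replaced = str_ireplace('my test string', '<b>my test string</b>', htmlspecialchars($node->nodeValue));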
$newNode = $dom->createDocumentFragment();
$newNode->appendXML($replaced);
$node->parentNode->replaceChild($newNode, $node);
}
// get only the body tag with its contents, then trim the body tag itself to get only the original content
return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
}
$html = '<h1>This is my test string</h1>
<h2><span>Nested my test string</span></h2>
<div>This is my test string too <b>my test string 3</b></div>';
echo doReplace($html);
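Since the question mentions an array of tag names, the XPath predicate above could also be generated from that array rather than written out by hand. The following is only a sketch under some assumptions: buildTextQuery() and boldKeyword() are hypothetical names that do not appear in the original answer, the tag names are assumed to be trusted (they are not validated), and the input is assumed to be plain ASCII or already entity-encoded so no encoding conversion is done. It also builds the <b> elements through the DOM instead of appendXML(), which keeps the original casing of each match.

```php
<?php
// Hypothetical helper: turns ['h1', 'b'] into
// '//text()[not(ancestor::h1) and not(ancestor::b)]'.
// Tag names are assumed to be trusted; they are not validated here.
function buildTextQuery(array $excludedTags): string
{
    $conditions = array_map(function ($tag) {
        return 'not(ancestor::' . $tag . ')';
    }, $excludedTags);
    return '//text()[' . implode(' and ', $conditions) . ']';
}

// Hypothetical wrapper: wraps each case-insensitive occurrence of $keyword in <b>,
// skipping text nodes that sit anywhere inside one of $excludedTags.
function boldKeyword(string $html, string $keyword, array $excludedTags): string
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);            // assumes ASCII or entity-encoded input
    $xpath = new DOMXPath($dom);
    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';

    foreach ($xpath->query(buildTextQuery($excludedTags)) as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE the parts alternate: text, match, text, ...
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // keyword not present in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                // odd indexes hold the captured keyword matches
                $bold = $dom->createElement('b');
                $bold->appendChild($dom->createTextNode($part));
                $fragment->appendChild($bold);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of <body>, i.e. the original content
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveXML($child);
    }
    return $out;
}

$tags = array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i');
echo boldKeyword('<h1>This is my test string</h1><div>This is my test string too <b>my test string 3</b></div>', 'my test string', $tags);
```

With this input, only the occurrence directly inside the div should end up wrapped in <b>; the ones inside h1 and the existing b are left alone because the ancestor check excludes them.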
You can use something like PHP Simple HTML DOM Parser.
Update: DOMDocument is the better solution. It is much faster and also handles nested tags, whereas the example below only checks each text node's immediate parent, so prefer it over this approach.
Example:
require_once('simple_html_dom.php');
$html = str_get_html('<h1>This is my test string</h1>
<div>This is my test string too <b>my test string 3</b></div>');
foreach ($html->find('text') as $element)
{
if (!in_array($element->parent()->tag, array('h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'u', 'i')))
$element->innertext = str_replace('my test string', '<b>my test string</b>', $element->innertext);
}
echo $html;