How can I replace strings NOT within a link tag?
I am working on this PHP function. The idea is to wrap certain words occurring in a string in certain tags (both words and tags are given in an array). It works OK, but when those words occur in linked text or in the link's 'src' attribute, the link is of course broken and stuffed with tags, or tags that should not be inside a link are generated. This is what I have now:
function replace() {
$terminos = array (
"beneficios" => "h3",
"valoracion" => "h2",
"empresarios" => "h2",
"tecnologias" => "h2",
"...and so on..." => "etc",
);
foreach ($terminos as $key => $value)
{
$body = "string where the word empresarios should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute.";
$tagged = "<".$value.">".$key."</".$value.">";
$result = str_replace($key, $tagged, $body);
}
}
The function, in this example, should return "string where the word <h2>empresarios</h2> should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute."
I'd like this replacement function to work throughout the string, but not inside tags or their attributes!
(I'd like to do what is described in the following thread, except that I need it in PHP rather than JavaScript: /questions/1666790/how-to-replace-text-not-within-a-specific-tag-in-javascript)
Use the DOM and only modify text nodes:
$s = "foo <a href='http://test.com'>foo</a> lorem bar ipsum foo. <a>bar</a> not a test";
echo htmlentities($s) . '<hr>';
$d = new DOMDocument;
$d->loadHTML($s);
$x = new DOMXPath($d);
$t = $x->evaluate("//text()");
$wrap = array(
'foo' => 'h1',
'bar' => 'h2'
);
$preg_find = '/\b(' . implode('|', array_keys($wrap)) . ')\b/';
foreach($t as $textNode) {
if( $textNode->parentNode->tagName == "a" ) {
continue;
}
// -1 means "no limit"; passing null here is deprecated as of PHP 8.1
$sections = preg_split( $preg_find, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
$parentNode = $textNode->parentNode;
foreach($sections as $section) {
if( !isset($wrap[$section]) ) {
$parentNode->insertBefore( $d->createTextNode($section), $textNode );
continue;
}
$tagName = $wrap[$section];
$parentNode->insertBefore( $d->createElement( $tagName, $section ), $textNode );
}
$parentNode->removeChild( $textNode );
}
echo htmlentities($d->saveHTML());
Edited to replace DOMText with DOMText and DOMElement as necessary.
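The same idea can be packaged as a reusable function. What follows is only a sketch under a few assumptions: the name wrapTerms and the wrapper id wrap-root are made up for illustration, the input is assumed to be ASCII (non-ASCII text needs explicit encoding handling with loadHTML), and the XPath uses ancestor::a so that text nested deeper inside a link, for example inside a <b> within an <a>, is skipped as well.

```php
<?php
// Hypothetical helper (not part of the answer above): wraps whole-word
// matches of the array keys in the tags given as values, skipping any text
// that sits inside an <a> element. Assumes ASCII input.
function wrapTerms(string $html, array $terms): string
{
    libxml_use_internal_errors(true); // tolerate sloppy markup
    $dom = new DOMDocument();
    // A wrapper <div> gives us one known container to read the result from.
    $dom->loadHTML('<div id="wrap-root">' . $html . '</div>');
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $words = array_map(fn($w) => preg_quote($w, '/'), array_keys($terms));
    $pattern = '/\b(' . implode('|', $words) . ')\b/';

    // Take a snapshot of the text nodes before editing the tree.
    $textNodes = iterator_to_array($xpath->query('//text()[not(ancestor::a)]'));
    foreach ($textNodes as $textNode) {
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                // Odd indexes hold the captured delimiters, i.e. the matched words.
                $el = $dom->createElement($terms[$part]);
                $el->appendChild($dom->createTextNode($part));
                $fragment->appendChild($el);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $root = $xpath->query('//div[@id="wrap-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$terminos = ['empresarios' => 'h2', 'beneficios' => 'h3'];
echo wrapTerms("The word empresarios is wrapped, but <a href='http://www.empresarios.com'>empresarios</a> is left alone.", $terminos);
```

Note that the serializer rewrites attribute quotes as double quotes, so the output is equivalent to the input markup rather than byte-for-byte identical.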
As for the JavaScript answer you pointed to, it works basically the same way in PHP. You just have to write the pattern as a string and drop the g flag, which PHP's preg functions do not accept (preg_replace already replaces every match):
$regexp = "/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(\:\-\))/i";
Also note that you may need another preg_replace call, or the i modifier, to match the word 'empresarios' when it is capitalized (Empresarios) or written in mixed case (EmPreSAriOS).
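For the capitalization point, a single pattern with the i modifier is usually enough. Here is a minimal sketch, assuming plain text input: the helper name wrapWord is made up for illustration, and it does nothing to protect links or attributes.

```php
<?php
// Hypothetical helper: wraps whole-word, case-insensitive matches of $word
// in <$tag>. Plain text only; it does not protect existing markup.
function wrapWord(string $text, string $word, string $tag): string
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    // $0 is the text that actually matched, so its capitalization is kept.
    return preg_replace($pattern, '<' . $tag . '>$0</' . $tag . '>', $text);
}

echo wrapWord('Empresarios, EmPreSAriOS and empresarios', 'empresarios', 'h2');
// <h2>Empresarios</h2>, <h2>EmPreSAriOS</h2> and <h2>empresarios</h2>
```

Using $0 in the replacement, rather than the search word itself, keeps the original spelling in the output.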
Also take care with your HTML. <h2> is a block-level element, so this text:
string where the word empresarios should be replaced;
may be rendered like this once replaced:
string where the word
empresarios
should be replaced;
Maybe what you need is an inline tag such as <big> (which is obsolete in HTML5; <strong> or a styled <span> are inline alternatives).
Definitely use a DOM parser to isolate the qualifying text nodes before attempting to replace with a regex pattern that respects word boundaries, case-insensitivity, and Unicode characters. If you plan to specifically target words with Unicode characters, then you will need to add the mb_ prefix to some of the string functions.
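To illustrate the mb_ remark: when the lookup keys themselves contain accented letters, the lowercasing step has to be multibyte-aware, or a match such as FÓO will not find its key. This is a small sketch, assuming the mbstring extension is loaded and all strings are UTF-8; the helper name lookupTag is made up for illustration.

```php
<?php
// Hypothetical helper: finds the tag for a matched fragment by lowercasing
// it with mb_strtolower() instead of strtolower().
// Assumes the mbstring extension is loaded and strings are UTF-8.
function lookupTag(array $lookup, string $fragment): ?string
{
    $key = mb_strtolower($fragment, 'UTF-8');
    return $lookup[$key] ?? null;
}

$lookup = ['fóo' => 'h3', 'bar' => 'h2'];

var_dump(lookupTag($lookup, 'FÓO')); // string(2) "h3"
var_dump(lookupTag($lookup, 'Bar')); // string(2) "h2"
var_dump(lookupTag($lookup, 'baz')); // NULL
```

On recent PHP versions strtolower() only lowercases ASCII letters, so it would leave the Ó untouched and the array lookup would miss, even though a pattern with the iu modifiers matched the word.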
After leveraging the following insights, I tailored a solution for your scenario.
- https://stackoverflow.com/a/64077957/2943403
- https://stackoverflow.com/a/20675396/2943403
Code: (Demo)
$html = <<<HTML
foo <a href='http://test.com'>fóo</a> lórem
bár ipsum bar food foo bark. <a>bar</a> not á test
HTML;
$lookup = [
'foo' => 'h3',
'bar' => 'h2'
];
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$regexNeedles = [];
foreach ($lookup as $word => $tagName) {
$regexNeedles[] = preg_quote($word, '~');
}
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu' ;
foreach($xpath->query('//*[not(self::a)]/text()') as $textNode) {
$newNodes = [];
$hasReplacement = false;
foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
$fragmentLower = strtolower($fragment);
if (isset($lookup[$fragmentLower])) {
$hasReplacement = true;
$a = $dom->createElement($lookup[$fragmentLower]);
$a->nodeValue = $fragment;
$newNodes[] = $a;
} else {
$newNodes[] = $dom->createTextNode($fragment);
}
}
if ($hasReplacement) {
$newFragment = $dom->createDocumentFragment();
foreach ($newNodes as $newNode) {
$newFragment->appendChild($newNode);
}
$textNode->parentNode->replaceChild($newFragment, $textNode);
}
}
echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);
Output:
<h3>foo</h3> <a href="http://test.com">fóo</a> lórem
bár ipsum <h2>bar</h2> food <h3>foo</h3> bark. <a>bar</a> not á test