How can I replace strings NOT within a link tag?

2022-12-18 07:52 问答作者：

I am working on this PHP function. The idea is to wrap certain words occuring in a string into certain tags (both, words and tags, given in an array). It works OK!, but when those words occur into a linked text or its 'src' attribute, then of course the link is broken and stuffed with tags, or tags that should not be inside a link are generated. This is what I have now:

开发者_如何学C

function replace() {
  $terminos = array (
  "beneficios" => "h3",
  "valoracion" => "h2",
  "empresarios" => "h2",
  "tecnologias" => "h2",
  "...and so on..." => "etc",
  );

  foreach ($terminos as $key => $value)
  {
  $body = "string where the word empresarios should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute.";
  $tagged = "<".$value.">".$key."</".$value.">";
  $result = str_replace($key, $tagged, $body);
  }
}

The function, in this example, should return "string where the word <h2>empresarios</h2> should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute."

I'd like this replacement function to work all throught the string, but not inside tags nor in its attributes!

(I'd like to do what is mentioned in the following thread, it's just that it's not in javascript what I need, but in PHP: /questions/1666790/how-to-replace-text-not-within-a-specific-tag-in-javascript)

Use the DOM and only modify text nodes:

$s = "foo <a href='http://test.com'>foo</a> lorem bar ipsum foo. <a>bar</a> not a test";
echo htmlentities($s) . '<hr>';

$d = new DOMDocument;
$d->loadHTML($s);

$x = new DOMXPath($d);
$t = $x->evaluate("//text()");

$wrap = array(
    'foo' => 'h1',
    'bar' => 'h2'
);

$preg_find = '/\b(' . implode('|', array_keys($wrap)) . ')\b/';

foreach($t as $textNode) {
    if( $textNode->parentNode->tagName == "a" ) {
        continue;
    }

    $sections = preg_split( $preg_find, $textNode->nodeValue, null, PREG_SPLIT_DELIM_CAPTURE);

    $parentNode = $textNode->parentNode;

    foreach($sections as $section) {  
        if( !isset($wrap[$section]) ) {
            $parentNode->insertBefore( $d->createTextNode($section), $textNode );
            continue;
        }

        $tagName = $wrap[$section];
        $parentNode->insertBefore( $d->createElement( $tagName, $section ), $textNode );
    }

    $parentNode->removeChild( $textNode );
}

echo htmlentities($d->saveHTML());

Edited to replace DOMText with DOMText and DOMElement as necessary.

To the answer you pointed, in JS, it's basically the same. You just have to specify it's a string.

$regexp = "/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(\:\-\))/gi";

Also note that you may be need another preg_replace function to replace the word 'empresarios' in case it's capitalized (Empresarios) or like weird stuff (EmPreSAriOS).

Also take care of your HTML. <h2> are block elements and may be interpretated this way:

string where the word empresarios should be replaced;

And replaced

string where the word

empresarios

should be replaced;

Maybe what you'll need to use is a <big> tag.

Definitely use a dom parser to isolate the qualifying text nodes before attempting to replace with a regex pattern that respects: word boundries, case-insensitivity, and unicode characters. If you are planning to specifically target words with unicode characters, then you will need to add mb_ to some of the string functions.

After leveraging the following insights, I tailored a solution for your scenario.

https://stackoverflow.com/a/64077957/2943403
https://stackoverflow.com/a/20675396/2943403

Code: (Demo)

$html = <<<HTML
foo <a href='http://test.com'>fóo</a> lórem
bár ipsum bar food foo bark. <a>bar</a> not á test
HTML;

$lookup = [
    'foo' => 'h3',
    'bar' => 'h2'
];

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

$regexNeedles = [];
foreach ($lookup as $word => $tagName) {
    $regexNeedles[] = preg_quote($word, '~');
}
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu' ;

foreach($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $newNodes = [];
    $hasReplacement = false;
    foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
        $fragmentLower = strtolower($fragment);
        if (isset($lookup[$fragmentLower])) {
            $hasReplacement = true;
            $a = $dom->createElement($lookup[$fragmentLower]);
            $a->nodeValue = $fragment;
            $newNodes[] = $a;
        } else {
            $newNodes[] = $dom->createTextNode($fragment);
        }
    }
    if ($hasReplacement) {
        $newFragment = $dom->createDocumentFragment();
        foreach ($newNodes as $newNode) {
            $newFragment->appendChild($newNode);
        }
        $textNode->parentNode->replaceChild($newFragment, $textNode);
    }
}
echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);

Output:

<h3>foo</h3> <a href="http://test.com">fóo</a> lórem
bár ipsum <h2>bar</h2> food <h3>foo</h3> bark. <a>bar</a> not á test

继续阅读：php string

How can I replace strings NOT within a link tag?

empresarios

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？