PHP: Bolding of overlapping keywords in string

I have figured out how to solve this problem, but I want to solve it in a simpler way; I'm trying to improve as a programmer.

I have done my research but failed to find an elegant solution to the following problem:

I have a hypothetical array of keywords to search for:

$keyword_array = array('he','heather');

and a hypothetical string:

$text = "What did he say to heather?";

And, finally, a hypothetical function:

function bold_keywords($text, $keyword_array)
{
    $pattern = array();
    $replace = array();

    foreach($keyword_array as $keyword)
    {
        $pattern[] = "/($keyword)/is";
        $replace[] = "<b>$1</b>";
    }

    $text = preg_replace($pattern, $replace, $text);

    return $text;
}

The function (not too surprisingly) is returning something like this:

"What did <b>he</b> say to <b>he</b>ather?"

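This happens because preg_replace applies the patterns in order: by the time the "heather" pattern runs, the text already contains "<b>he</b>ather", so "heather" is no longer recognized with a bold tag in the middle of it.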

What I want the final solution to do is, as simply as possible, return one of the two following strings:

"What did <b>he</b> say to <b>heather</b>?"
"What did <b>he</b> say to <b><b>he</b>ather</b>?"

Some final conditions:

--I would like the final solution to deal with a very large number of possible keywords

--I would like it to deal with the following two situations (lines represent overlapping strings):

One string engulfs the other, like the following two examples:

-- he, heather

-- sanding, and

Or one string does not engulf the other:

-- entrain, training

Possible ways to solve:

-A regex that ignores tags in keywords

-Long way (that I am trying to avoid):

*Search string for all occurrences of each keyword, store an array of positions (start and end) of keywords to be bolded

*Process this array recursively to combine overlapping keywords, so there is no redundancy

*Add the bold tags, working from the end of the string so that the inserted characters do not shift the positions still to be processed
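
For reference, the "long way" described above can be sketched fairly compactly. This is only a minimal sketch, not a definitive implementation: the function name bold_keywords_merged is hypothetical, it assumes PHP 7.4 or later (for arrow functions), it works on byte offsets (so multibyte text would need the mb_* equivalents), and it merges ranges with a simple sorted loop rather than recursion.

```php
<?php
// Hypothetical helper: bold every keyword occurrence, merging overlapping matches.
function bold_keywords_merged(string $text, array $keywords): string
{
    // 1. Collect [start, end) byte offsets for every occurrence of every keyword.
    $ranges = [];
    foreach ($keywords as $keyword) {
        if ($keyword === '') {
            continue; // an empty keyword cannot be bolded
        }
        $offset = 0;
        while (($pos = stripos($text, $keyword, $offset)) !== false) {
            $ranges[] = [$pos, $pos + strlen($keyword)];
            $offset = $pos + 1; // advance by one so overlapping hits are also found
        }
    }

    // 2. Sort by start offset, then merge ranges that overlap or touch.
    usort($ranges, fn($a, $b) => $a[0] <=> $b[0]);
    $merged = [];
    foreach ($ranges as $range) {
        $last = count($merged) - 1;
        if ($last >= 0 && $range[0] <= $merged[$last][1]) {
            $merged[$last][1] = max($merged[$last][1], $range[1]);
        } else {
            $merged[] = $range;
        }
    }

    // 3. Insert the tags from the end so the earlier offsets stay valid.
    foreach (array_reverse($merged) as [$start, $end]) {
        $text = substr($text, 0, $start)
              . '<b>' . substr($text, $start, $end - $start) . '</b>'
              . substr($text, $end);
    }

    return $text;
}

echo bold_keywords_merged("What did he say to heather?", ['he', 'heather']);
```

Because overlapping ranges are merged before any tag is inserted, both the engulfing case (he, heather) and the partially overlapping case (entrain, training) end up inside a single pair of bold tags.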

Many thanks in advance!


Example

$keyword_array = array('he','heather');
$text = "What did he say to heather?";
$pattern = array();
$replace = array();
foreach($keyword_array as $keyword)
{
    // \b anchors the match to word boundaries, so "he" no longer matches inside "heather";
    // preg_quote() escapes any regex metacharacters in the keyword
    $pattern[] = "/\\b(" . preg_quote($keyword, '/') . ")\\b/i";
    $replace[] = "<b>$1</b>";
}

$text = preg_replace($pattern, $replace, $text);

echo $text; // What did <b>he</b> say to <b>heather</b>?


You need to change your regex pattern so that each term you search for must be followed by whitespace or punctuation; that way the pattern does not match when the term is followed by an alphanumeric character.


A simplistic, lazy-ish approach off the top of my head:

Sort your initial array by item length, descending. No more "not recognized because there's already a tag in the middle" issues!

Edit: The nested tags issue is then easily fixed by extending your regex so that >foo and foo< are no longer matched.
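
One variation on the longest-first idea is to combine all keywords into a single alternation, so the text is scanned once and already-bolded text is never matched again. This is a sketch under assumptions, not the method described above verbatim: the function name bold_keywords_single_pass is hypothetical, and PHP 7.4 or later is assumed for the arrow functions.

```php
<?php
// Hypothetical helper: bold keywords in one pass using a single alternation.
function bold_keywords_single_pass(string $text, array $keywords): string
{
    // Drop empty keywords; an empty alternative would match everywhere.
    $keywords = array_filter($keywords, fn($k) => $k !== '');
    if (count($keywords) === 0) {
        return $text;
    }

    // Longest first, so "heather" is tried before "he" at the same position.
    usort($keywords, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape regex metacharacters, then join everything into one pattern.
    $alternatives = array_map(fn($k) => preg_quote($k, '/'), $keywords);
    $pattern = '/(' . implode('|', $alternatives) . ')/i';

    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo bold_keywords_single_pass("What did he say to heather?", ['he', 'heather']);
```

This covers the case where one keyword engulfs another, but not partial overlaps: with the keywords entrain and training, the text "entraining" comes out as "<b>entrain</b>ing", because the regex engine resumes scanning after the end of the first match.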
