Text parser gives false negative on needle when haystack contains extra markup
The code below takes a keyword and a string of text (stripped of HTML tags) and determines whether the keyword appears in the last sentence of the sanitized content.
There's one glitch that I can't figure out. When the end of the content contains so much as an empty space or a paragraph tag with a non-breaking space, i.e.
This is the last sentence.<p> </p>
I get a false negative (no match), despite the fact that (1) the keyword is definitely in the last sentence and (2) the strip_tags() function should make the tag at the end a non-issue.
Anyone see why that might be happening?
    function plugin_get_kw_last_sentence($post) {
        $theContent = strip_tags(strtolower($post->post_content));
        $theKeyword = 'test';
        $thePiecesByKeyword = plugin_get_chunk_keyword($theKeyword,$theContent);
        if (count($thePiecesByKeyword)>0) {
            $theCount = $thePiecesByKeyword[count($thePiecesByKeyword)-1];
            $theCount = trim($theCount,'.');
            if (substr_count($theCount,'.')>0) {
                return FALSE;
            } else {
                return TRUE;
            }
        }
        return FALSE;
    }

    function plugin_get_chunk_keyword($theKeyword, $theContent) {
        if (!plugin_get_kw_in_content($theKeyword,$theContent)) {
            return array();
        }
        $myPieceReturn = preg_split('/\b' . $theKeyword . '\b/i', $theContent);
        return $myPieceReturn;
    }
You've got a lot going on there that I think can be covered by the regex alone, if I understand your logic correctly. Couldn't the whole thing be reduced to this:
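    function plugin_get_kw_last_sentence($post) {
        $theKeyword = 'test'; // same hard-coded keyword as in the question
        // Keyword, the rest of its sentence, one terminator, then no further terminators.
        $pattern = '/' . preg_quote($theKeyword, '/') . '[^.!?]*[.!?][^.!?]*$/';
        $subject = strip_tags(strtolower($post->post_content));
        return preg_match($pattern, $subject) === 1;
    }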
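The regex matches when it finds your keyword followed by the final sentence-ending punctuation mark, with no other sentence-ending punctuation marks between them. Anything after that final mark (whitespace, a leftover entity) is allowed, as long as it contains no further sentence-ending punctuation.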
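To make that behaviour concrete, here is a small sketch that runs the same pattern against a few strings. The helper name `kw_in_last_sentence` and the sample strings are made up for illustration and are not part of the original plugin; it takes the keyword and raw text directly instead of a `$post` object.

```php
<?php
// Hypothetical helper (not from the original plugin): applies the same
// pattern, but takes the keyword and the raw HTML as plain arguments.
function kw_in_last_sentence(string $keyword, string $html): bool
{
    $pattern = '/' . preg_quote($keyword, '/') . '[^.!?]*[.!?][^.!?]*$/';
    $subject = strip_tags(strtolower($html));
    return preg_match($pattern, $subject) === 1;
}

// Keyword in the last sentence, followed by trailing markup and an entity.
var_dump(kw_in_last_sentence('test', 'First one. This is the test sentence.<p>&nbsp;</p>')); // bool(true)

// Keyword only in an earlier sentence.
var_dump(kw_in_last_sentence('test', 'A test comes first. This is the last sentence.')); // bool(false)
```

This also hints at what is probably going wrong in the original version: `trim($theCount, '.')` strips only periods, and `strip_tags()` removes tags but not entities such as `&nbsp;`. If anything at all follows the final period, that period survives the trim and `substr_count()` reports it as a second sentence.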
Now, this is obviously not bulletproof: titles (e.g., Mr., Mrs.), "etc.", and anything else containing these sentence-ending punctuation marks will throw it off. Still, it should get you what you're asking for, since your given code does not account for those situations either.
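A quick illustration of that limitation, using made-up sample sentences and the keyword `test` written straight into the pattern: the period in "Mr." is treated as a sentence end, so the keyword appears to sit in an earlier sentence.

```php
<?php
// Same pattern as above with the keyword 'test' inlined.
$pattern = '/test[^.!?]*[.!?][^.!?]*$/';

// Illustrative inputs only; both are single sentences containing the keyword.
$plain = strtolower('The test was graded by Smith.');
$title = strtolower('The test was graded by Mr. Smith.');

var_dump(preg_match($pattern, $plain)); // int(1)
var_dump(preg_match($pattern, $title)); // int(0), because "mr." looks like a sentence end
```

Handling abbreviations properly would need a list of known exceptions or a real sentence tokenizer, which is beyond what a single pattern like this can do.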