Why does PHP's substr() show strange characters when cutting text?
I'm using the substr() function to limit the number of characters in strings, but sometimes the output contains strange characters, question marks, and so on.
The text being cut is already UTF-8 encoded, and it does NOT contain HTML entities that could cause this kind of problem.
Thanks
Because you are cutting characters in half.
Use mb_substr for multibyte character encodings like UTF-8. substr just counts bytes, while mb_substr counts characters.
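To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{...}" escapes) and that the mbstring extension is enabled; the sample string is only an illustration.

```php
<?php
// "naïve café" written with escapes so the file encoding does not matter.
$text = "na\u{EF}ve caf\u{E9}";

// strlen() counts bytes, mb_strlen() counts characters.
$bytes = strlen($text);             // 12, because ï and é take 2 bytes each
$chars = mb_strlen($text, 'UTF-8'); // 10

// substr() cuts after the 3rd byte, which is the first half of "ï".
$bad = substr($text, 0, 3);

// mb_substr() cuts after the 3rd character, so "ï" stays whole.
$good = mb_substr($text, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($bad, 'UTF-8'));  // bool(false): broken sequence
var_dump(mb_check_encoding($good, 'UTF-8')); // bool(true)
```

The invalid byte sequence in $bad is what a browser typically renders as a question mark or replacement character.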
The reason is that you are using UTF-8, which is a multibyte encoding, and substr() works on single bytes only. htmlentities() has nothing to do with it.
You should use mb_substr() (http://php.net/manual/en/function.mb-substr.php) and the other multibyte functions.
Just to extend Gurmbo's answer: using mb_substr will solve your problem, but if the cut point falls inside an HTML entity, a fragment of that entity is still left at the end of the trimmed string. When I did some research, I found that WordPress has a function, wp_html_excerpt, that solves this problem.
wp_html_excerpt removes such a partial entity from the end of the string.
Here is the source code from WordPress.
/**
 * Safely extracts not more than the first $count characters from html string.
 *
 * UTF-8, tags and entities safe prefix extraction. Entities inside will *NOT*
 * be counted as one character. For example &amp; will be counted as 4, &lt; as
 * 3, etc.
 *
 * @since 2.5.0
 *
 * @param string $str   String to get the excerpt from.
 * @param int    $count Maximum number of characters to take.
 * @param string $more  Optional. What to append if $str needs to be trimmed. Defaults to empty string.
 * @return string The excerpt.
 */
function wp_html_excerpt( $str, $count, $more = null ) {
    if ( null === $more )
        $more = '';
    $str = wp_strip_all_tags( $str, true );
    $excerpt = mb_substr( $str, 0, $count );
    // remove part of an entity at the end
    $excerpt = preg_replace( '/&[^;\s]{0,6}$/', '', $excerpt );
    if ( $str != $excerpt )
        $excerpt = trim( $excerpt ) . $more;
    return $excerpt;
}
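If you are not running inside WordPress, the same idea can be sketched with plain PHP. This is only an approximation under stated assumptions: plain_excerpt is a hypothetical name, and strip_tags() stands in for wp_strip_all_tags(), which does more (for example, it also removes the contents of script and style tags). The mbstring extension is assumed to be enabled.

```php
<?php
// Hypothetical standalone variant of the WordPress function above.
function plain_excerpt(string $str, int $count, string $more = ''): string
{
    // Simplified tag stripping (assumption: good enough for simple markup).
    $str = trim(strip_tags($str));

    // Cut by characters, not bytes.
    $excerpt = mb_substr($str, 0, $count, 'UTF-8');

    // Drop an entity that was cut in half at the end, e.g. "&am".
    $excerpt = preg_replace('/&[^;\s]{0,6}$/', '', $excerpt);

    // Append $more only if something was actually trimmed.
    if ($str !== $excerpt) {
        $excerpt = trim($excerpt) . $more;
    }
    return $excerpt;
}

echo plain_excerpt('<p>Fish &amp; chips</p>', 8, '...'); // prints "Fish..."
```

In the usage line, the first 8 characters are "Fish &am"; the regular expression removes the dangling "&am", and the trailing space is trimmed before "..." is appended.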
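If the string contains HTML entities, you can also apply html_entity_decode() first, which converts all HTML entities to their corresponding characters, so that no entity can be cut in half. Note that the cut itself should still use mb_substr(), because the decoded UTF-8 string is multibyte. For example:
echo mb_substr(html_entity_decode($string_to_cut, ENT_QUOTES, 'UTF-8'), 0, 28, 'UTF-8') . "...";
That should also work.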