� appears using character_limiter() with strip_tags() and utf-8 charset
I'm getting � characters when I combine CodeIgniter's character_limiter() with PHP's native strip_tags(). Here is the code I'm using:
<?php echo character_limiter(strip_tags($block->body), 60); ?>
$block->body is an HTML string stored in the database. I do not get this unexpected output if I use only one of the functions. The output looks like this:
[screenshot of the rendered output]
This is what the HTML looks like:
[screenshot of the HTML source editor]
I didn't paste the actual HTML because the string would be modified by posting it here (see below).
Here is the CodeIgniter function character_limiter:
function character_limiter($str, $n = 500, $end_char = '…')
{
    if (strlen($str) < $n)
    {
        return $str;
    }
    $str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
    if (strlen($str) <= $n)
    {
        return $str;
    }
    $out = "";
    foreach (explode(' ', trim($str)) as $val)
    {
        $out .= $val.' ';
        if (strlen($out) >= $n)
        {
            $out = trim($out);
            return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
        }
    }
}
I figured out that there was some invisible character or something that may have been causing this, because when I pasted the HTML into a text editor, then back into the "HTML source editor" in the image (which is just TinyMCE), then saved it, the weird characters disappeared.
I am using the utf-8 character set across the board (everywhere possible). The original data did come from a dump of an unknown database, and was imported with an SQL client. However, when I saved the existing string (in the CMS), nothing changed.
I can't connect the dots between these two functions causing this output when used together, and I do not get the � characters normally. I only see this output when I use:
character_limiter(strip_tags($html))
What could be causing this, and how can I prevent it?
Note: I definitely want to use the character_limiter function, or a variation of it. It appends an ellipsis to the string if its length is longer than the second param. Using it alone (without strip_tags) works perfectly fine (no weird characters).
Update: For anyone that can't reproduce this, I put an SQL file online that demos the issue. I am importing this with MySQL Query Browser. It seems I only get this output when the HTML comes from the database. Here is the link (ignore the content, it's the client's fault): http://wesleymurch.com/test/test1.sql
� is the replacement character, displayed in place of an unknown or unprintable character. In PHP this kind of issue is usually solved with the multibyte string functions. Use mb_substr() together with strip_tags(), like this:
mb_substr(strip_tags($text), 0, 300, 'UTF-8'); // or whatever your charset is
Or you could modify the CodeIgniter function to use the multibyte string functions.
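To illustrate why byte-oriented functions can produce �, here is a minimal sketch. The sample string is my own, not taken from the question's data, and it assumes the mbstring extension is enabled. Cutting a UTF-8 string with substr() can split a multibyte character in half, while mb_substr() counts whole characters:

```php
<?php
// Hypothetical sample: "é" is two bytes in UTF-8 (0xC3 0xA9).
$text = "caf\xC3\xA9 au lait";

// Byte-based cut: 4 bytes ends in the middle of "é", leaving a lone 0xC3.
$byteCut = substr($text, 0, 4);

// Character-based cut: 4 characters is the whole word "café" (5 bytes).
$charCut = mb_substr($text, 0, 4, 'UTF-8');

// The byte-based result is no longer valid UTF-8; a browser would
// typically show the stray byte as the replacement character.
var_dump(mb_check_encoding($byteCut, 'UTF-8'));
var_dump(mb_check_encoding($charCut, 'UTF-8'));
```

Whether this is exactly what happens with your data is an assumption on my part, but it is the usual way a � ends up in truncated UTF-8 text.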
UPDATE
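function character_limiter($str, $n = 500, $end_char = '…')
{
    // mb_* functions count characters, not bytes. They use
    // mb_internal_encoding() unless an encoding is passed explicitly.
    if (mb_strlen($str) < $n)
    {
        return $str;
    }
    // Collapse runs of whitespace (including line breaks) into single spaces.
    $str = mb_ereg_replace("\s+", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
    if (mb_strlen($str) <= $n)
    {
        return $str;
    }
    $out = "";
    // Add whole words until the limit is reached, so no word is cut in half.
    foreach (explode(' ', trim($str)) as $val)
    {
        $out .= $val.' ';
        if (mb_strlen($out) >= $n)
        {
            $out = trim($out);
            return (mb_strlen($out) == mb_strlen($str)) ? $out : $out.$end_char;
        }
    }
}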
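Since you mention an invisible character that disappeared after re-pasting the HTML, it may help to look at the raw bytes of the stored string. The following is only a sketch: list_non_ascii() is a hypothetical helper of mine (not part of CodeIgniter), and the sample string with a non-breaking space is an assumption, not your actual data.

```php
<?php
// Hypothetical helper: lists each non-ASCII character in a string
// as the hex of its UTF-8 bytes, or reports that the string is invalid.
function list_non_ascii($str)
{
    if (!mb_check_encoding($str, 'UTF-8')) {
        return array('invalid UTF-8: ' . bin2hex($str));
    }
    $found = array();
    // The /u modifier makes PCRE treat the subject as UTF-8 characters.
    preg_match_all('/[^\x00-\x7F]/u', $str, $matches);
    foreach ($matches[0] as $char) {
        $found[] = bin2hex($char);
    }
    return $found;
}

// Assumed sample: a non-breaking space (0xC2 0xA0) looks like a normal space.
$sample = "Hello\xC2\xA0world";
print_r(list_non_ascii($sample));
```

Running this on strip_tags($block->body) should show whether the database string contains characters you cannot see in the editor.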