chunk_split() corrupts multibyte characters
When I use the chunk_split() function, it ruins my accented characters and special characters. How can I correct this problem?
Here is my PHP code.
if (count($text) > 0) {
    $text = implode(' ', $text);
    echo chunk_split($text, 8, '<br />');
}
Here is the output with the ruined accented characters.
&a
mp; Post
er ÀÁ�
�ÃÄÅ�
�áâã�
�åÒÓ�
�ÕÖØ�
�óôõ�
�øÈÉ�
�Ëéè�
�ëÇç�
�ÍÎÏ�
�íîï�
�ÚÛÜ�
�úûü�
�Ññ
chunk_split() isn't multibyte-safe: it counts bytes, not characters, and there is no native mb_chunk_split().
http://php.net/manual/en/function.chunk-split.php
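To see why the split lands in the middle of a character, here is a minimal sketch. It assumes the script and the string are UTF-8, where each of these accented letters takes two bytes; the sample string and the chunk length of 3 are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so each letter below is 2 bytes.
$text = 'ÀÁÂÃÄÅ';

$bytes = strlen($text);             // counts bytes
$chars = mb_strlen($text, 'UTF-8'); // counts characters
echo $bytes, ' bytes, ', $chars, " characters\n";

// chunk_split() cuts after every 3 bytes, i.e. halfway through the 2nd letter.
$chunks = explode("\n", chunk_split($text, 3, "\n"));

// The first chunk ends with half a character, so it is no longer valid UTF-8.
$firstChunkIsValid = mb_check_encoding($chunks[0], 'UTF-8');
var_dump($firstChunkIsValid);
```

A multibyte-aware split has to count characters instead of bytes.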
Here is a multibyte-aware replacement from a commenter in the PHP docs:
<?php
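//from Peter from dezzignz.com 05-Apr-2010 11:30 @ php.net

// Split a string into an array of single (multibyte) characters.
// Note: empty() also treats the string '0' as empty.
function mbStringToArray ($str) {
    if (empty($str)) return false;
    $len = mb_strlen($str);
    $array = array();
    for ($i = 0; $i < $len; $i++) {
        $array[] = mb_substr($str, $i, 1);
    }
    return $array;
}

// Insert $glue after every $len characters (not bytes) of $str.
// Unlike chunk_split(), no trailing $glue is appended.
function mb_chunk_split($str, $len, $glue) {
    if (empty($str)) return false;
    $array = mbStringToArray ($str);
    $n = 0;
    $new = '';
    foreach ($array as $char) {
        if ($n < $len) $new .= $char;
        elseif ($n == $len) {
            // The current chunk is full: start the next one after the glue.
            $new .= $glue . $char;
            $n = 0;
        }
        $n++;
    }
    return $new;
}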
?>
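On PHP 7.4 or later, the same idea can probably be written more compactly with the built-in mb_str_split(). The sketch below is not from the PHP docs comment; the helper name chunk_split_utf8 is hypothetical, and it assumes UTF-8 input.

```php
<?php
// Hypothetical helper (not a PHP built-in); requires PHP 7.4+ for mb_str_split().
function chunk_split_utf8(string $str, int $len = 76, string $glue = "\r\n"): string
{
    if ($str === '') {
        return '';
    }
    // mb_str_split() cuts on character boundaries, not byte boundaries.
    $chunks = mb_str_split($str, $len, 'UTF-8');
    // Unlike chunk_split(), no glue is appended after the final chunk.
    return implode($glue, $chunks);
}

echo chunk_split_utf8('ÀÁÂÃÄÅáâãäå', 8, '<br />');
```

If you need chunk_split()'s trailing separator as well, append $glue to the returned string.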
Try converting the character set before and after calling chunk_split(), as seen here:
http://us3.php.net/manual/en/function.chunk-split.php#99316
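One way to read that suggestion, as a rough sketch: convert to a fixed-width encoding, let chunk_split() work on bytes, then convert back. The choice of UTF-32BE and the helper name chunk_split_via_utf32 are my assumptions; the linked comment may use a different encoding.

```php
<?php
// Hypothetical helper; the UTF-32BE detour is an assumption, not the linked comment's code.
function chunk_split_via_utf32(string $str, int $len, string $glue): string
{
    // In UTF-32BE every code point is exactly 4 bytes wide.
    $wide     = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    $wideGlue = mb_convert_encoding($glue, 'UTF-32BE', 'UTF-8');

    // $len characters are now $len * 4 bytes, so byte-based splitting is safe.
    $chunked = chunk_split($wide, $len * 4, $wideGlue);

    return mb_convert_encoding($chunked, 'UTF-8', 'UTF-32BE');
}

echo chunk_split_via_utf32('áâãäąăāæåß', 8, "\n");
```

This keeps chunk_split()'s behavior of appending the separator after the last chunk too. It counts code points, so a letter followed by a separate combining accent could still be split apart.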
Regex offers a very succinct and direct replacement for chunk_split() when dealing with multibyte characters.
Pattern breakdown:
~ #start pattern delimiter
.{8} #match 8 of any non-newline character
\K #forget previously matched characters
(?!$) #not followed by the end of the string
~ #end pattern delimiter
u #evaluate in multibyte mode
Replace with <br>, but I'll demonstrate with \n.
Code:
$str = 'áâãäąăāæåߧśšşçćčźżžýųűů';
var_export(
    preg_replace(
        '~.{8}\K(?!$)~u',
        "\n",
        $str
    )
);
Output:
'áâãäąăāæ
åߧśšşçć
čźżžýųűů'