Charset problems with PHP
I have a problem with some PHP code that converts accented characters to their unaccented equivalents. It worked a year ago, but now I can't get it to work: the conversion is not done correctly.
Here is the code:
<?php
echo accentdestroyer('azeméis');
/**
 * Transforms accented characters into their unaccented equivalents.
 *
 * @param string $string
 */
function accentdestroyer($string) {
$string=strtr($string,
"()!$?: ,&+-/.ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ"
,
"-------------SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy");
return $string;
}
?>
I tried saving the document as UTF-8, but the output looks something like this: "azemy�is"
Any clues on what I can do to get this working correctly?
Best Regards,
A better solution may be to transliterate those characters automatically using iconv().
As for why your function doesn't work, it probably comes down to the fact that echo strlen('Š'); outputs 2: in a UTF-8 file, Š is stored as two bytes. The documentation for strtr() explicitly refers to single-byte characters.
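To make the byte count concrete, here is a small sketch (it assumes the script itself is saved as UTF-8) that prints the length and the raw bytes of that one character:

```php
<?php
// Assumes this file is saved as UTF-8.
$char = 'Š'; // U+0160

// strlen() counts bytes, not characters.
echo strlen($char), "\n";  // 2

// bin2hex() shows the two bytes that make up the character.
echo bin2hex($char), "\n"; // c5a0
```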
Also,
$a = 'Š';
var_dump(strtr('Š', 'Š', '!')); // string(2) "!�"
So the first byte was matched and replaced, but the leftover second byte is not a valid UTF-8 sequence on its own.
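If you would rather keep a hand-written mapping, one possible workaround (a sketch, not the original code) is the array form of strtr(), which replaces whole substrings and therefore treats each multi-byte character as a unit. The function name remove_accents_map and the shortened map below are illustrative assumptions, and the file must be saved as UTF-8:

```php
<?php
// Hypothetical helper: the name and the deliberately short map are
// illustrative assumptions, not part of the original code.
function remove_accents_map(string $string): string {
    // With an array argument, strtr() replaces whole substrings,
    // so each multi-byte UTF-8 character is matched as a unit.
    $map = [
        'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'á' => 'a', 'à' => 'a', 'ã' => 'a',
        'ç' => 'c', 'Š' => 'S', 'š' => 's',
    ];
    return strtr($string, $map);
}

echo remove_accents_map('azeméis'), "\n"; // azemeis
```

Characters missing from the map pass through unchanged, so a real map would need to list every character you care about.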
Update
Here is a working example using iconv().
$str = 'ŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚ';
$str = iconv("utf-8", "us-ascii//TRANSLIT", $str);
var_dump($str); // string(37) "OEZsoezY?uAAAAAAAECEEEEIIII?NOOOOO?UU"
Some characters didn't transliterate, such as ¥ and Ø (they came out as ?), but most did. You can append //IGNORE to the output character set to silently discard the ones which don't transliterate.
You could also drop all non-word characters using a Unicode regex with \pL.
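A rough sketch of that \pL idea follows; the pattern and the sample string are assumptions picked for illustration. The /u modifier makes PCRE treat the pattern and subject as UTF-8, and \pL matches any Unicode letter, so accented letters are kept while digits, spaces, and punctuation are removed:

```php
<?php
// Sample input chosen for illustration only; file saved as UTF-8.
$str = 'azeméis, 2024!';

// Remove every run of characters that are not Unicode letters.
// The /u modifier tells PCRE to treat pattern and subject as UTF-8.
$lettersOnly = preg_replace('/[^\pL]+/u', '', $str);

var_dump($lettersOnly === 'azeméis'); // bool(true)
```

Because \pL keeps accented letters, this would normally be combined with a transliteration step rather than used in place of one.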