
Accented and other non-ASCII characters do not display on web pages

I need some help: my website is not displaying some characters, such as ë, -, etc.

I have tried this function to fix them:

function UTFeer($v) {

    //reject overly long 2 byte sequences, as well as characters above U+10000 and replace with ?
    $v = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.'|[\x00-\x7F][\x80-\xBF]+'. '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'. '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'. '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', '?', $v);

    //reject overly long 3 byte sequences and UTF-16 surrogates and replace with ?
    $v = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'. '|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $v );

    return $v;
}

The database originally came from WordPress. I no longer use WordPress; a custom system now reads the data from the database. Can someone please help me display all characters correctly on the website? Thank you.

EDIT: I am now using the code below and it seems to work, but is it too "heavy" for the website?

function normalize_special_characters( $str )
{
    # Quotes cleanup
    $str = ereg_replace( chr(ord("`")), "'", $str );        # `
    $str = ereg_replace( chr(ord("´")), "'", $str );        # ´
    $str = ereg_replace( chr(ord("„")), ",", $str );        # „
    $str = ereg_replace( chr(ord("`")), "'", $str );        # `
    $str = ereg_replace( chr(ord("´")), "'", $str );        # ´
    $str = ereg_replace( chr(ord("“")), "\"", $str );       # “
    $str = ereg_replace( chr(ord("”")), "\"", $str );       # ”
    $str = ereg_replace( chr(ord("´")), "'", $str );        # ´

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
                            'Ê'=>'E', 'Ë'=>'Ë', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
                            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
                            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'ë', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
                            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );

# Bullets, dashes, and trademarks
$str = ereg_replace( chr(149), "•", $str );    # bullet •
$str = ereg_replace( chr(150), "–", $str );    # en dash
$str = ereg_replace( chr(151), "—", $str );    # em dash
$str = ereg_replace( chr(153), "™", $str );    # trademark
$str = ereg_replace( chr(169), "©", $str );    # copyright mark
$str = ereg_replace( chr(174), "®", $str );        # registration mark

    return $str;
}


It sounds like your data might be getting saved using the wrong character encoding. For example, the database might be storing text as Latin-1, but it is not converting user input to Latin-1 before storing it (MySQL can't make the distinction because Latin-1 is a single-byte character set, so whatever it gets could be valid).

By the time the application pulls data back out of the database for display, there's no way of knowing how the characters are actually encoded. Usually, this is combined with naïvely declaring the UTF-8 character encoding in the content-type header, which results in what you might call "WTF-8 encoding".
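The effect can be reproduced in a few lines of PHP. This is only a sketch of the failure mode, assuming the mbstring extension is available; the variable names are made up for illustration. It takes the UTF-8 bytes for "ë", treats them as Latin-1, and converts them to UTF-8 a second time, which is roughly what happens when the connection and the stored data disagree about the encoding.

```php
<?php
// Requires the mbstring extension.
$original = "\xC3\xAB";   // "ë" encoded as UTF-8 (two bytes)

// Misread those two bytes as Latin-1 and convert them "to UTF-8" again.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n";  // c3ab
echo bin2hex($garbled), "\n";   // c383c2ab, which a browser renders as "Ã«"

// The damage is reversible as long as no bytes were lost along the way.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original);  // bool(true)
```

If your pages show pairs like "Ã«" where a single accented letter should be, this kind of double encoding is the likely cause.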

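If you have filesystem access to the MySQL server, add the following to /etc/my.cnf (on MySQL 5.5 and later, leave out the default-character-set line, which the server no longer accepts under [mysqld]):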

[mysqld]
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake

Once you make this change, you will need to restart the mysqld service on your server.

You can verify this worked by connecting to the MySQL server manually and issuing the following command:

SHOW VARIABLES WHERE `Variable_name` LIKE 'character_set%' OR `Variable_name` LIKE 'collation_%';

You should see something that looks like this:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | utf8_general_ci            |
| collation_database       | utf8_general_ci            |
| collation_server         | utf8_general_ci            |
+--------------------------+----------------------------+

You are not quite done, though; this only sets the default charset/collation for any future-created data. The existing data are not converted.

Fixing the existing data is not a particularly easy task, since you might have different rows in each table that were saved using different character encodings.

There are a couple of ways to accomplish it, though. One method that might work here is to convert each text column into a blob, and then convert it back to a text (or varchar, etc.; convert it back to the type it was before you made it a blob), which will force MySQL to try to fix the character encoding:

ALTER TABLE `(table name)` MODIFY `(column name)` BLOB;
ALTER TABLE `(table name)` MODIFY `(column name)` TEXT CHARACTER SET utf8;
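If many columns need this treatment, the two statements can be generated rather than typed by hand. The following is a minimal sketch, not a finished migration tool: the function name blob_roundtrip_sql and the articles/body table and column are placeholders, and the original column type has to be supplied by you.

```php
<?php
// Hypothetical helper: returns the two ALTER TABLE statements for one column.
// $type is the column's original type (TEXT, VARCHAR(255), ...), supplied by the caller.
function blob_roundtrip_sql(string $table, string $column, string $type): array
{
    // Quote identifiers with backticks, doubling any backtick inside the name.
    $q = function (string $name): string {
        return '`' . str_replace('`', '``', $name) . '`';
    };

    // Accept only simple type names such as TEXT or VARCHAR(255).
    if (!preg_match('/^[A-Za-z]+(\(\d+\))?$/', $type)) {
        throw new InvalidArgumentException("Unexpected column type: $type");
    }

    return [
        sprintf('ALTER TABLE %s MODIFY %s BLOB;', $q($table), $q($column)),
        sprintf('ALTER TABLE %s MODIFY %s %s CHARACTER SET utf8;', $q($table), $q($column), $type),
    ];
}

// Example with placeholder names; prints the statements instead of running them.
foreach (blob_roundtrip_sql('articles', 'body', 'TEXT') as $sql) {
    echo $sql, "\n";
}
```

Printing the statements first lets you review them before running anything. Take a backup beforehand, and note that a plain BLOB holds no more than a TEXT column does, so MEDIUMTEXT and LONGTEXT columns need MEDIUMBLOB and LONGBLOB as the intermediate type to avoid truncation.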

See this article for more information.
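Because different rows may have been saved in different encodings, as mentioned above, another option is to check each value in PHP and convert only the ones that are not valid UTF-8. This is a rough sketch under a few assumptions: the mbstring extension is installed, the function name to_utf8 is made up, and the legacy encoding is guessed to be Windows-1252, which matches the chr(149) to chr(153) bytes in the question.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
function to_utf8(string $value, string $fallback = 'Windows-1252'): string
{
    // Leave strings that are already valid UTF-8 untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise assume the legacy single-byte encoding and convert.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

echo bin2hex(to_utf8("\xC3\xAB")), "\n";  // c3ab   (already UTF-8, unchanged)
echo bin2hex(to_utf8("\x96")), "\n";      // e28093 (Windows-1252 en dash converted)
```

Rows that were already double encoded are valid UTF-8 and pass through unchanged, so this sketch does not repair them; it only handles rows that still contain raw single-byte text.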
