How do I transform "ТеÑ" (it is a Russian word) into something readable?
I have a MySQL DB which contains a UTF-8 column with records such as "ТеÑ". PHP's mb_detect_encoding() tells me that this is UTF-8. How can I transform this "horror" into something readable?
Thank you
I'm guessing you've got the byte string "\xd0\xa2\xd0\xb5\xd1", then, which would be the UTF-8 encoded form of the characters Те (plus one following byte which is half a character).
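To make that guess concrete, here is a minimal sketch, assuming PHP 7 or later with the mbstring extension loaded. It builds the five-byte string guessed at above and checks that the first four bytes are valid UTF-8 for Те while the full string is not. The variable names are only illustrative.

```php
<?php
// The guessed byte string: two complete 2-byte sequences plus one stray lead byte.
$bytes = "\xd0\xa2\xd0\xb5\xd1";

// The first four bytes are valid UTF-8 on their own: U+0422 (Т) and U+0435 (е).
$complete = substr($bytes, 0, 4);
var_dump(mb_check_encoding($complete, 'UTF-8')); // bool(true)
var_dump($complete === "\u{0422}\u{0435}");      // bool(true)

// The full five bytes are not valid UTF-8, because the lead byte 0xD1
// must be followed by a continuation byte that is missing here.
var_dump(mb_check_encoding($bytes, 'UTF-8'));    // bool(false)
```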
If you merely echo() that on a page that you have declared as being UTF-8, it should display correctly in the browser:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
...
something: <?php echo htmlspecialchars($something); ?>
This naturally also means you will need to save the .php file itself using the UTF-8 encoding if it contains any non-ASCII characters. (Many Windows text editors tend not to save as UTF-8 by default, sadly.)
If you must have a non-UTF-8 page, you would have to use iconv() to convert the string to whatever encoding you were using, presumably Windows code page 1251 for Russian ('cp1251'). But I would strongly recommend using UTF-8 for everything all the way through.
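As a rough sketch of that conversion, assuming the iconv extension is available and that its underlying library accepts the name 'CP1251' (the spelling of encoding names can vary between iconv implementations):

```php
<?php
// Valid UTF-8 for "Те" (U+0422, U+0435).
$utf8 = "\xd0\xa2\xd0\xb5";

// Convert to Windows-1251 for output on a page declared as charset=windows-1251.
$cp1251 = iconv('UTF-8', 'CP1251', $utf8);

// In code page 1251 each of these Cyrillic letters is a single byte.
echo bin2hex($cp1251), "\n"; // d2e5
```

Note that iconv() returns false on failure, so real code should check the result before echoing it.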
edit re comment:
if I'm doing mysql_set_charset("utf8", $db) before selecting row - I'm getting this "horror"
mysql_set_charset('utf8') is indeed the right thing to do. Check that you are including the meta as above, and that the browser is seeing it (check View->Encoding is UTF-8).
If you are getting ТеÑ even with UTF-8 correctly being sent, then I'm afraid the current contents of your database are messed up. Perhaps data was inserted previously without the correct mysql_set_charset call, or maybe you did an SQL import that used the wrong charset.
If this is the case, you're likely going to have to go through each row of the database and 'fix' it by using iconv() to convert UTF-8 to ISO-8859-1. This should undo the double-UTF-8-encoding.
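A minimal sketch of that repair on a single value, assuming the iconv and mbstring extensions. The sample bytes are a hypothetical double-encoded form of Те, not data taken from the question:

```php
<?php
// Hypothetical double-encoded value: the UTF-8 bytes of "Те" (d0 a2 d0 b5)
// were treated as ISO-8859-1 characters and encoded to UTF-8 a second time.
$double = "\xc3\x90\xc2\xa2\xc3\x90\xc2\xb5";

// Converting UTF-8 -> ISO-8859-1 strips one layer of encoding.
$fixed = iconv('UTF-8', 'ISO-8859-1', $double);

// Only accept the result if it is itself valid UTF-8; otherwise leave the row alone.
if ($fixed !== false && mb_check_encoding($fixed, 'UTF-8')) {
    echo bin2hex($fixed), "\n"; // d0a2d0b5
}
```

If the damage went through Windows-1252 rather than true ISO-8859-1, the target encoding may need to be 'CP1252' instead; that depends on how the data was corrupted and is worth checking on a copy of the table first.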
[edit:2]
iconv("UTF-8", "ISO-8859-1", $row['name']) saying Notice: iconv(): Detected an illegal character in input string.
OK, so the input isn't a valid UTF-8 sequence. That could be either because you're not getting UTF-8 out of the database after all, or because a UTF-8 sequence has been truncated. For example, your string "\xd0\xa2\xd0\xb5\xd1" (which, read as ISO-8859-1, looks like "ТеÑ") is not valid, as the final "Ñ" is only half of a two-byte UTF-8 sequence. As UTF-8 in a browser it would render as Те�.
If that's what you have in your database you'll need to fix the data in there before you can proceed.
it's ok if I echo $row['name'] without doing mysql_set_charset("utf8", $db)
You haven't confirmed that you are correctly sending UTF-8 and that the browser knows this (by checking View->Encoding), so what you see on-screen when you echo() is not really meaningful; we can't work out what the original byte string was from that.
Tell us what you see when you echo bin2hex($row['name']);. This will convert each byte in the string into hex digits, so "\xd0\xa2\xd0\xb5\xd1" would come out as d0a2d0b5d1, if that's what you've got.
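For comparison, here is a small sketch of what such a hex dump might look like for three hypothetical states of the same text Те. The array keys and sample values are assumptions for illustration only; the real values would come from $row['name'].

```php
<?php
// Three hypothetical values a column might hold for the text "Те".
$samples = [
    'utf8'   => "\xd0\xa2\xd0\xb5",                 // correct UTF-8
    'double' => "\xc3\x90\xc2\xa2\xc3\x90\xc2\xb5", // UTF-8 encoded twice
    'cp1251' => "\xd2\xe5",                         // Windows-1251
];

foreach ($samples as $label => $value) {
    // chunk_split() adds a space after every two hex digits; trim() drops the last one.
    echo $label, ': ', trim(chunk_split(bin2hex($value), 2, ' ')), "\n";
}
```

Comparing the real output against patterns like these gives a hint as to whether the stored bytes are fine, double-encoded, or in a legacy code page.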
Output it to a page with the UTF-8 encoding specified; the browser will then show it in readable form.