How do I get rid of unrecognized characters in UTF-8? (MySQL/PHP)
I have a mysql database that's set to utf-8.
I have set my php header to: header("Content-Type: text/html; charset=utf-8");
and in my html: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
When I return anything that has round quotes or apostrophes, they show up as unrecognized characters (black diamond with a ? inside).
If I run utf8_encode() on the string I'm echoing out, it looks fine in Chrome, but shows a different weird character in Firefox. Is there something else I can do site-wide to make this work better?
(I've accessed the db with sequel pro and phpmyadmin)
full utf-8 settings:
1) .htaccess
AddDefaultCharset utf-8
PHP_VALUE default_charset utf-8
2) after mysqli_connect() in php call this:
mysqli_query($this->link, 'SET character_set_client="utf8",character_set_connection="utf8",character_set_results="utf8"; ');
3) your DB should be created with the utf8 character set and a utf8 collation (for example utf8_general_ci); all text fields in the tables should use a utf8 collation as well
4) your PHP files also should be created with utf8 charset
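For step 2, the three SET statements can also be replaced by a single mysqli_set_charset() call, which additionally tells the client library which character set is in use. The following is only a minimal sketch: connect_utf8() is a hypothetical helper name, the credentials are placeholders, and the 'utf8' default simply mirrors the settings above (on newer MySQL versions 'utf8mb4' may be the better choice).

```php
<?php
// Hypothetical helper (not part of the original answer): open a mysqli
// connection and switch it to UTF-8 right away.
function connect_utf8(string $host, string $user, string $pass, string $db, string $charset = 'utf8'): mysqli
{
    $link = mysqli_connect($host, $user, $pass, $db);
    // On PHP versions where mysqli reports errors by return value,
    // a failed connect yields false instead of throwing.
    if ($link === false) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // Sets the client, connection and results character sets in one call.
    if (!mysqli_set_charset($link, $charset)) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($link));
    }
    return $link;
}

// Usage (placeholder credentials):
// $link = connect_utf8('localhost', 'user', 'secret', 'mydb');
// echo mysqli_character_set_name($link); // reports the connection charset
```

Calling mysqli_character_set_name() afterwards is a quick way to confirm what the connection is actually using.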
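Make sure the connection between PHP and MySQL is in UTF-8 as well. Otherwise, the data will be converted to whatever character set the connection uses.
See mysql_client_encoding and mysql_set_charset (or mysqli_character_set_name and mysqli_set_charset if you use the mysqli extension).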
Have you tried using htmlentities? I know that this doesn't affect the character encoding, but it might get rid of the black diamond with the question mark. It often does for me...
$output = htmlentities($db_output);
echo $output;
How exactly are you getting these "round quotes and apostrophes"? If their ultimate source is a Word or Outlook document, they will be encoded in Windows-1252. If you copy and paste directly from a Word document into a UTF-8 Web page, the UTF-8 version of the clipboard should be used, and these characters come over as multibyte UTF-8 characters. If these characters went through other files or non-UTF-8 Web pages first, it's possible that they remained in Word "Smart Quote" single-byte encoding, which is invalid in UTF-8 (and thus the ?-in-black-diamond glyph). Note that Web pages claiming to be Latin-1 (ISO-8859-1) are frequently rendered as Windows-1252, as 1) the control codes 0x80-0x9F that Smart Quotes overlay are very rarely used, and 2) it's so common for Smart Quotes to be mixed in with text.
For a UTF-8 page that gives quotes and apostrophes as "invalid characters", tell the browser to use Windows-1252 encoding instead for the page (View > Character Encoding or something similar). If these characters show up correctly now, untranslated Smart Quotes were the problem. Unfortunately, once they're in the database, only manual editing will fix them.
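To keep untranslated Smart Quotes from reaching the database in the first place, incoming text can be checked before it is stored. This is a rough sketch, not a definitive fix: to_utf8() is a hypothetical helper, it needs the mbstring extension, and it assumes that anything that is not valid UTF-8 is Windows-1252.

```php
<?php
// Hypothetical helper (not from the original answers): return $text as valid UTF-8.
// Assumption: bytes that are not valid UTF-8 came from a Windows-1252 source
// such as Word or Outlook.
function to_utf8(string $text): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave it untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise reinterpret the bytes as Windows-1252 and transcode them.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// "\x93" and "\x94" are the Windows-1252 curly double quotes.
$raw   = "\x93quoted\x94";
$clean = to_utf8($raw);
echo bin2hex($clean), "\n"; // e2809c71756f746564e2809d
```

A string that mixes valid multibyte UTF-8 with stray Windows-1252 bytes fails the check as a whole, so its valid UTF-8 parts would be transcoded a second time and garbled; such rows still need a closer look by hand.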