Character Encodings in PHP and MySQL
Our website was developed with a meta tag set to...
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
This works fine for em dashes, curly quotes, etc. However, I have an issue when data has been entered into a CMS component that stores data in MySQL. The MySQL collation is set to utf8_swedish_ci (I read this is OK and must have been a default when it was set up in phpMyAdmin).
The problem I now get is that when I output info from the DB to the page, the characters are UTF-8 encoded, so I run them through PHP's utf8_decode() function. I thought this would fix the incompatibility, but what I'm getting isn't what I expect.
When I look at the data in the DB in a text field (again through phpMyAdmin) it looks like this...
This – That
When I view it on the screen it looks like...
This ? That
I know I can try to find/replace a bunch of these in the DB or the text, but I'm hoping there's an easier way to do this programmatically.
Thanks, Don
Update:
Still have an issue that htmlentities() unfortunately doesn't fix.
I have text in a file like this: we’ve (special '). My MySQL collation is "latin1_swedish_ci" (the default). If I change the header or meta tag to either ISO-8859-1 or UTF-8, one or the other breaks. With UTF-8, the (’) shows as a black diamond, but the DB content is fine. With ISO-8859-1, the inline content is OK, but the content from the DB has all kinds of stray characters. I tried changing the MySQL collation to a UTF-8 one but didn't see a difference.
I'm about resolved to changing the items manually. Thanks for any other suggestions.
If your data in the database is UTF8, you'll need to run this query after you connect to MySQL:
SET NAMES UTF8
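In PHP, that could look something like the sketch below. It assumes the mysqli extension; connectUtf8() and its parameters are hypothetical names, not anything from the question. mysqli's set_charset() has the same effect as the SET NAMES query, and it also tells the client library which charset is in use, which matters for real_escape_string().

```php
<?php
// Hypothetical helper: open a connection and declare the client charset.
// The host, user, password and database name are supplied by the caller.
function connectUtf8(string $host, string $user, string $pass, string $db): mysqli
{
    $mysqli = new mysqli($host, $user, $pass, $db);

    // 'utf8' matches the SET NAMES query above. On newer MySQL versions,
    // 'utf8mb4' is the charset that covers all of UTF-8.
    if (!$mysqli->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $mysqli->error);
    }

    return $mysqli;
}
```

After this, strings read from the connection arrive as UTF-8 bytes, so the page either has to be served as UTF-8 or the strings have to be converted before output.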
Assuming that you were able to set the encoding properly in your database, my recommended approach here is to:
Make sure that the Content-Type header has been set properly by the server. This can be done in PHP by using the header() function.
header('Content-Type: text/html; charset=iso-8859-1');
Note that the HTTP header takes precedence over the meta tag, and it is the easiest information for user agents to get since they do not have to parse the document to find it.
Set the meta tag in the HTML file.
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
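One way to keep the header and the meta tag from drifting apart is to build both from a single value. A rough sketch follows; PAGE_CHARSET, contentTypeValue(), sendContentType() and metaTag() are made-up names for illustration.

```php
<?php
// Assumed single source of truth for the page charset.
const PAGE_CHARSET = 'iso-8859-1';

// Build the shared "text/html; charset=..." value.
function contentTypeValue(string $charset): string
{
    return 'text/html; charset=' . $charset;
}

// Send the HTTP header, but only if output has not started yet.
function sendContentType(string $charset): void
{
    if (!headers_sent()) {
        header('Content-Type: ' . contentTypeValue($charset));
    }
}

// Build the matching meta tag for the document head.
function metaTag(string $charset): string
{
    return '<meta http-equiv="Content-Type" content="'
        . htmlspecialchars(contentTypeValue($charset), ENT_QUOTES)
        . '" />';
}

sendContentType(PAGE_CHARSET);
echo metaTag(PAGE_CHARSET), "\n";
```

sendContentType() has to run before anything is echoed, since header() cannot change headers once the body has started.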
For further reading, refer to:
http://www.joelonsoftware.com/articles/Unicode.html
http://www.webstandards.org/learn/articles/askw3c/dec2002/
My guess would be that, despite your meta tag, the web server sends a header which sets the charset to UTF-8. However, the easiest way to fix these kinds of problems is usually to escape non-ASCII characters to HTML entities.
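A minimal sketch of the entity approach, assuming the text coming out of the database is UTF-8 and is plain text rather than HTML markup (escapeForPage() is a hypothetical name). The input encoding is passed explicitly because before PHP 5.4 htmlentities() assumed ISO-8859-1 by default, which may be why it appeared not to help.

```php
<?php
// Hypothetical helper: escape UTF-8 text so it is safe on a page in any charset.
function escapeForPage(string $utf8Text): string
{
    // ENT_QUOTES also escapes single quotes; ENT_SUBSTITUTE replaces invalid
    // byte sequences instead of returning an empty string.
    return htmlentities($utf8Text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escapeForPage("This \u{2013} That"), "\n"; // en dash, U+2013
echo escapeForPage("we\u{2019}ve"), "\n";       // right single quote, U+2019
```

This prints "This &ndash; That" and "we&rsquo;ve", which are plain ASCII and so display the same whether the page is served as ISO-8859-1 or UTF-8. Two caveats: htmlentities() also escapes <, > and &, so it is not suitable for content that is meant to contain markup, and characters without a named entity are passed through unchanged.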