Problem converting ISO8859-1 to UTF-8 in PHP
I am attempting to convert an ISO-8859-1 string taken from a MySQL database to UTF-8 using PHP. However, when I use the utf8_encode function, it removes almost all of the apostrophes from the string (the exceptions seem to be within HTML fields).
Thanks
Your ‘ISO-8859-1’ content is probably not actually ISO-8859-1.
When you say Content-Type: text/html; charset=iso-8859-1, browsers don't actually use ISO-8859-1, for annoying historical reasons. They really use Windows code page 1252 (Western European), which is very similar to ISO-8859-1, but not the same.
In particular, the bytes in the range 0x80-0x9F represent invisible and seldom-used control codes in ISO-8859-1. But cp1252 adds some typographical niceties and other extensions in this range, including the ‘smart quotes’. When you write an apostrophe in MS Word, it changes it to a curly right single quotation mark ’ (byte 0x92 in cp1252), so it's common to have encoding problems with text that was originally typed in Word and other Office apps.
To convert cp1252 to UTF-8 you would have to use iconv('cp1252', 'utf-8', $somestring) rather than utf8_encode, which is tied to ‘real’ ISO-8859-1.
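As a rough illustration of the difference, here is a minimal sketch that converts the same bytes both ways. The sample string and variable names are made up for the example, and it assumes the iconv and mbstring extensions are available. mb_convert_encoding from ISO-8859-1 stands in for utf8_encode here, since it performs the same strict ISO-8859-1 mapping.

```php
<?php
// Hypothetical sample: "It's" as Word would store it in cp1252,
// where byte 0x92 is the curly apostrophe.
$fromDb = "It\x92s";

// Read as real ISO-8859-1, byte 0x92 becomes the invisible control
// character U+0092 (UTF-8 bytes C2 92), so the apostrophe seems to vanish.
$asLatin1 = mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');

// Read as cp1252, byte 0x92 becomes U+2019, the curly apostrophe
// (UTF-8 bytes E2 80 99).
$asCp1252 = iconv('CP1252', 'UTF-8', $fromDb);

echo bin2hex($asLatin1), "\n"; // 4974c29273
echo bin2hex($asCp1252), "\n"; // 4974e2809973
```

If the second hex dump matches what you expect for your own data, the column most likely holds cp1252 rather than true ISO-8859-1.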
One possibility is to use iconv. I have used it before and it works well.
http://php.net/manual/en/function.iconv.php
It has a //TRANSLIT option, appended to the target charset, which approximates characters that cannot be represented in the target encoding.
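A small sketch of the transliteration option, going in the direction where it matters (from UTF-8 to a narrower charset). The sample string is made up, and the exact substitute character depends on the iconv implementation and locale PHP was built against, so no particular output is claimed.

```php
<?php
// Hypothetical sample: "It's" with a curly apostrophe (U+2019),
// which has no equivalent in real ISO-8859-1.
$utf8 = "It\u{2019}s";

// Appending //TRANSLIT to the target charset asks iconv to substitute
// a similar-looking character instead of failing on U+2019.
$approx = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

// The result is implementation-dependent; inspect it rather than assume it.
var_dump($approx);
```

Note that transliteration is lossy, so it is no substitute for picking the correct source charset in the first place.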