Problem converting ISO8859-1 to UTF-8 in PHP

I am attempting to convert an ISO8859-1 string taken from a MySQL database to UTF-8 using PHP. However, when I use the utf8_encode function it removes almost all of the apostrophes from the string (the exceptions seem to be within HTML fields).

Thanks


Your ‘ISO-8859-1’ content is probably not actually ISO-8859-1.

When you say Content-Type: text/html; charset=iso-8859-1, browsers don't actually use ISO-8859-1, for annoying historical reasons. They really use Windows code page 1252 (Western European), which is very similar to ISO-8859-1, but not the same.

In particular, the bytes in the range 0x80-0x9F represent invisible and seldom-used control codes in ISO-8859-1. But cp1252 adds some typographical niceties and other extensions in this range, including the ‘smart quotes’. When you type an apostrophe in MS Word, it changes it to a single smart quote (’, byte 0x92 in cp1252), so it's common to have encoding problems with text that was originally typed in Word and other Office apps.

To convert cp1252 to UTF-8 you would have to use iconv('cp1252', 'utf-8', $somestring) rather than utf8_encode which is tied to ‘real’ ISO-8859-1.
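
To make the difference concrete, here is a minimal sketch, assuming the iconv extension is available. The sample string and variable names are made up for illustration. It decodes the same bytes once as ISO-8859-1 (which is what utf8_encode does) and once as cp1252, then prints the resulting UTF-8 bytes in hex.

```php
<?php
// "It’s" as stored in cp1252: the curly apostrophe is the single byte 0x92.
$raw = "It\x92s";

// Decode as real ISO-8859-1: 0x92 becomes the invisible control character U+0092.
// This mirrors utf8_encode(), which is deprecated as of PHP 8.2.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

// Decode as cp1252: 0x92 becomes U+2019, the right single quotation mark.
$asCp1252 = iconv('CP1252', 'UTF-8', $raw);

echo bin2hex($asLatin1), "\n"; // 4974c29273   (c2 92 = U+0092)
echo bin2hex($asCp1252), "\n"; // 4974e2809973 (e2 80 99 = U+2019)
```

The first result contains a control character that browsers do not render, which would explain why the apostrophes appear to vanish.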


One possibility is to use iconv. I have used it before and it works well.

http://php.net/manual/en/function.iconv.php

It has a //TRANSLIT option, which approximates a character that cannot be represented in the target character set with one or more similar-looking characters.
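
As a rough sketch of how that option is used (the sample string is an assumption, and the exact substitution depends on the system's iconv implementation), //TRANSLIT is appended to the target character set name. It only matters when the target cannot represent a character, so this example converts down to ASCII rather than up to UTF-8.

```php
<?php
// UTF-8 input containing U+2019 (right single quotation mark).
$utf8 = "It\u{2019}s";

// Ask iconv to approximate characters that ASCII cannot represent.
// Returns false if the conversion fails on this platform.
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

var_dump($ascii);
```

When converting to UTF-8, transliteration should not be needed, since UTF-8 can represent every cp1252 character directly.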
