How do you get the glyph for a character encoded as 'ō' from a UTF-8 encoded database field using PHP?
I have a MySQL database table with a collation of 'utf8_general_ci' and the value in the field is:
x & #299; bán yá wén (without the spaces).
When this is converted (for example by StackOverflow's editor) it looks like this:
xī bán yá wén
where the second character looks like a lower case i with a bar over the top.
In PHP, what function converts the & #299 ; entity into the ī character?
I've tried using html_entity_decode($str,ENT_COMPAT,'UTF-8'), however I get characters like the following:
yÄ«n wén or zhÅ•ng wén
I'm pretty sure there's something I don't understand about the decoding, which is why I'm using the wrong function. Can anyone shed some light on how to get the single character glyph that's represented by the entity & #299 and similar high-number characters above 255?
Many thanks, AE
UTF-8 is a multibyte encoding, so if you view its bytes through a single-byte encoding such as Latin-1, you'll see something much like the results you're getting. Set the document encoding to UTF-8 to see the actual character.
As for your first question, it's actually the browser that's decoding the character reference and printing the character, not PHP.
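To make the Latin-1 point concrete, here is a minimal sketch. It assumes the mbstring extension is loaded (only needed for the simulated misreading) and that the page is served by PHP itself; the variable names are just for illustration.

```php
<?php
// Decode the numeric character reference from the question into UTF-8.
$char = html_entity_decode('&#299;', ENT_COMPAT, 'UTF-8');

// U+012B takes two bytes in UTF-8.
$hex = bin2hex($char); // "c4ab"

// Simulate a viewer that assumes Latin-1: each byte is treated as one
// Latin-1 character, which reproduces the "Ä«" from the question.
$misread = mb_convert_encoding($char, 'UTF-8', 'ISO-8859-1');

// Declare the encoding so the browser reads the bytes as UTF-8.
// Guarded because header() only works before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo $char, "\n";
```

So `html_entity_decode()` was already doing the right thing; the garbage only shows up when the two bytes are displayed as Latin-1. If the text is going into HTML anyway, you can also leave the `&#299;` reference untouched and let the browser decode it.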
I suggest you read through this page: Unicode for the working PHP programmer. It is not long and it should get you over the hump and into confident Unicode and UTF-8.
Once you're OK with that stuff, check out the mbstring and intl PHP extensions, which are very handy. Also learn which PHP string functions are and are not safe to use on multibyte strings. Here are the notes I made when I was transitioning a site to UTF-8, which include a list of naughty string functions.
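As a rough illustration of what "not safe on multibyte strings" means, here is a small sketch comparing two byte-oriented functions with their mbstring counterparts. It assumes PHP 7.0 or later (for the `\u{...}` escape) and the mbstring extension; it is not the list from the notes above, just two common cases.

```php
<?php
// "xī" written with a code point escape, so the encoding of this
// source file does not matter.
$s = "x\u{012B}";

$bytes = strlen($s);             // 3: strlen() counts bytes
$chars = mb_strlen($s, 'UTF-8'); // 2: mb_strlen() counts characters

// substr() cuts by byte offset and can split a character in half.
$broken = substr($s, 0, 2);             // "x" plus a dangling lead byte
$whole  = mb_substr($s, 0, 2, 'UTF-8'); // "xī", both characters intact

// The byte-cut string is no longer valid UTF-8.
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');

var_dump($bytes, $chars, $brokenIsValid, $whole === $s);
```

The `mb_*` calls take the encoding explicitly here so the result does not depend on the `mbstring.internal_encoding` / `default_charset` settings on your server.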