UTF-8 and HTML entities
I am trying to extract text from a Word .DOC file with PHP. Everything seems to work, except that instead of readable Cyrillic text I get HTML character references for something like
СУДОВА БУХГАЛТЕРІЯ
I've tried html_entity_decode and utf8_encode, but neither helped. Is there a simple solution?
html_entity_decode should work if you pass the proper parameters (on PHP 5.4.0 and later, the default charset is normally already UTF-8):
$text = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
This converts the character references into UTF-8. Before PHP 5.4.0, the charset parameter’s default value was ISO-8859-1, and the Cyrillic characters cannot be decoded into that charset because ISO 8859-1 does not contain them.
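A minimal sketch of the call above, assuming the extracted text arrives as numeric character references. The sample strings are hypothetical inputs made up for illustration (references for the first letters of the phrase in the question); both decimal and hexadecimal forms are decoded by the same call:

```php
<?php
// Hypothetical sample input: decimal numeric character references
// for the word "СУДОВА" (U+0421 U+0423 U+0414 U+041E U+0412 U+0410).
$str = '&#1057;&#1059;&#1044;&#1054;&#1042;&#1040;';

// Passing 'UTF-8' explicitly makes the result independent of the
// default charset of the PHP version in use.
$text = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

// Hexadecimal references are decoded the same way (here "БУХ").
$hex = html_entity_decode('&#x411;&#x423;&#x425;', ENT_QUOTES, 'UTF-8');

echo $text, PHP_EOL;
echo $hex, PHP_EOL;
```

Each decoded Cyrillic letter takes two bytes in UTF-8, so the resulting string must also be displayed or stored as UTF-8 to read correctly.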