UTF-8 and HTML entities

I'm trying to extract text from a Word .DOC file with PHP. Everything seems to work, except that I get something like

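&#1057;&#1059;&#1044;&#1054;&#1042;&#1040; &#1041;&#1059;&#1061;&#1043;&#1040;&#1051;&#1058;&#1045;&#1056;&#1030;&#1071;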

instead of Russian text. I've tried html_entity_decode and utf8_encode, but neither helped. Is there a simple solution?


html_entity_decode should work if you pass the charset explicitly (on PHP 5.4.0 and later, UTF-8 is already the default):

html_entity_decode($str, ENT_QUOTES, 'UTF-8')

This converts the character references into UTF-8. Before PHP 5.4.0, the charset parameter's default value was ISO-8859-1. In that case the Cyrillic characters can't be converted, because the ISO 8859-1 character set doesn't contain them. utf8_encode doesn't help here either: it only converts ISO-8859-1 bytes to UTF-8 and leaves character references untouched.
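As a rough illustration of the difference the charset argument makes, here is a minimal sketch. The sample string is an assumption (decimal character references for the first word of the question's text), and the source file is assumed to be saved as UTF-8.

```php
<?php
// Sample input (assumed for illustration): decimal character references
// for the Cyrillic word "СУДОВА".
$encoded = '&#1057;&#1059;&#1044;&#1054;&#1042;&#1040;';

// Pass the charset explicitly so the result does not depend on the
// default of whichever PHP version runs the script.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// With ISO-8859-1 as the target, these code points have no representation,
// so the references are left in the string unchanged.
$latin1 = html_entity_decode($encoded, ENT_QUOTES, 'ISO-8859-1');

echo $decoded, "\n";
echo $latin1, "\n";
```

If the decoded text still looks wrong in a browser, it may be worth checking that the page itself is served as UTF-8, for example with header('Content-Type: text/html; charset=UTF-8').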
