How to parse unicode format (e.g. \u201c, \u2014) using PHP

I am pulling data from the Facebook Graph API, which returns characters encoded like so: \u201c and \u2014

Is there a function to convert those characters into HTML? e.g. \u2014 -> &mdash;

If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I don't know what to call these codes... I guess Unicode, but Unicode seems to mean a whole lot of things.


That's not entirely true, bobince. How do you handle JSON containing Spanish accents? There are two problems. I call FB.api(url, function(response) ... var s = JSON.stringify(response);

and pass it to a PHP script via $.post.

First, I get a truncated string, so I need escape(JSON.stringify(response)). Then I get a full JSON-encoded string with Spanish accents. As a test, I put it in a text file, loaded it with file_get_contents, applied PHP's json_decode, and got nothing. You first need utf8_encode.

Then you get the object you were waiting for. After a full day of testing and googling without finding out how to decode Unicode properly, I found your post. So many thanks to you.
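
To illustrate why json_decode returns nothing here, below is a minimal sketch. It assumes the text file was saved as ISO-8859-1, which is the case utf8_encode is meant for. The sample payload and variable names are hypothetical. Since utf8_encode is deprecated as of PHP 8.2, the sketch uses mb_convert_encoding for the same conversion.

```php
<?php
// Hypothetical input: JSON text whose "ñ" is the single ISO-8859-1 byte 0xF1,
// which is not valid UTF-8.
$latin1Json = "{\"name\":\"Espa\xF1a\"}";

// json_decode() requires UTF-8 input, so this fails and returns null.
$failed = json_decode($latin1Json, true);
$error  = json_last_error();   // JSON_ERROR_UTF8

// Re-encode the bytes as UTF-8 first (the same job utf8_encode() does).
$utf8Json = mb_convert_encoding($latin1Json, 'UTF-8', 'ISO-8859-1');
$data     = json_decode($utf8Json, true);

echo $data['name'], "\n";
```

If the input is already UTF-8, this conversion would double-encode it, so it only makes sense when the source encoding is known.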


Someone asked me to solve the problem of Arabic text in the Facebook JSON archive. Maybe this code helps someone who is trying to read Arabic text from Facebook (or Instagram) JSON:

    // The archive writes each UTF-8 byte as its own \u00XX escape.
    $str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';

    // Turn every \uXXXX escape into the UTF-8 encoding of that code point.
    function decode_encoded_utf8($string) {
        return preg_replace_callback('#\\\\u([0-9a-f]{4})#i', function ($matches) {
            return mb_convert_encoding(pack('H*', $matches[1]), 'UTF-8', 'UCS-2BE');
        }, $string);
    }

    // Converting to ISO-8859-1 collapses each code point back into one byte,
    // which restores the original UTF-8 byte sequence.
    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', decode_encoded_utf8($str));
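
The same repair can be sketched without a regex by letting json_decode() resolve the escapes. This is only an illustration: the JSON value below is a hypothetical stand-in for one string from the archive, and it assumes every code point in it is below U+0100, as in the sample above.

```php
<?php
// Hypothetical JSON string value taken from a Facebook export.
$json = '"\u00d8\u00ae\u00d9\u0084\u00d8\u00b5"';

// Step 1: json_decode() resolves the escapes into code points U+00D8, U+00AE, ...
$mojibake = json_decode($json);

// Step 2: mapping each code point to a single ISO-8859-1 byte yields the
// bytes D8 AE D9 84 D8 B5, which are the UTF-8 encoding of the Arabic text.
$text = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');

echo bin2hex($text), "\n";
```

This round trip is only appropriate for data that was double-encoded in this way. Applied to correctly encoded JSON, it would corrupt the text.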


Facebook Graph API returns JSON objects. Use json_decode() to read them into PHP and you do not have to worry about handling string literal escapes like \uNNNN. Don't try to decode JSON/JavaScript string literals by yourself, or extract chosen properties using regex.
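
As a minimal sketch of that, assuming a hypothetical payload in place of a real Graph API response:

```php
<?php
// Hypothetical payload standing in for a Graph API response body.
$json = '{"message":"Wait \u2014 she said \u201chello\u201d"}';

// json_decode() resolves the \uNNNN escapes while parsing.
$data = json_decode($json, true);
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// $message is now a UTF-8 string containing the real U+2014, U+201C and U+201D.
$message = $data['message'];
```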

Having read the string value, you'll have a UTF-8-encoded string. If your target HTML is also UTF-8-encoded, you don't need to replace the em dash (U+2014) with any entity reference. Just use htmlspecialchars() on the string when outputting it, so that any < or & characters in the string are properly encoded.
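
For example, a short sketch with a made-up message string. The explicit ENT_QUOTES flag and 'UTF-8' charset are choices made here for illustration:

```php
<?php
// Hypothetical decoded value; "\u{2014}" is PHP 7+ syntax for the em dash.
$message = "Tom & Jerry \u{2014} <b>live</b>";

// Only the HTML-special characters are escaped. The U+2014 character
// passes through unchanged as UTF-8.
$html = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

echo $html, "\n";
```

The page itself then needs to declare UTF-8, for instance through a Content-Type header or a meta charset tag.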

If you do for some reason need to produce ASCII-safe HTML, use htmlentities() with the charset arg set to 'utf-8'.
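
A sketch of that variant, again with a made-up input string:

```php
<?php
// Hypothetical decoded value containing curly quotes and an em dash.
$message = "She said \u{201C}hi\u{201D} \u{2014} twice";

// Characters that have a named HTML entity are replaced by it.
$html = htmlentities($message, ENT_QUOTES, 'UTF-8');

echo $html, "\n";   // She said &ldquo;hi&rdquo; &mdash; twice
```

Note that htmlentities() only replaces characters that have a named entity. Other non-ASCII characters, such as Arabic letters, are left as UTF-8 bytes.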
