How to parse unicode format (e.g. \u201c, \u2014) using PHP

2023-01-12 16:21

I am pulling data from the Facebook Graph API, which has characters encoded like so: \u2014 and \u201c

Is there a function to convert those characters into HTML? e.g. \u2014 -> —

If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I don't know what to call these codes... I guess Unicode, but Unicode seems to mean a whole lot of things.

That's not entirely true, bobince. How do you handle JSON containing Spanish accents? There are two problems. I call FB.api(url, function(response) ... var s=JSON.stringify(response);

and pass it to a PHP script via $.post.

First, I get a truncated string, so I need escape(JSON.stringify(response)). Then I get the full JSON-encoded string with Spanish accents. As a test, I put it in a text file, loaded it with file_get_contents, and applied PHP's json_decode: I got nothing. You need to apply utf8_encode first.

Then you get the object you were waiting for. After a full day of testing and googling without managing to decode the Unicode properly, I found your post. Many thanks to you.
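A minimal sketch of the situation described above, assuming the text file was saved as ISO-8859-1 rather than UTF-8 (the sample payload and the field name "name" are made up for illustration). json_decode() only accepts UTF-8 input, so it returns null until the bytes are converted. utf8_encode() did this conversion but is deprecated as of PHP 8.2; mb_convert_encoding() is used here instead and requires the mbstring extension.

```php
<?php
// Hypothetical payload: "ñ" stored as the single Latin-1 byte 0xF1.
$raw = "{\"name\":\"Espa\xF1a\"}";

// Invalid UTF-8 makes json_decode() fail and return null.
$before = json_decode($raw, true);

if (!mb_check_encoding($raw, 'UTF-8')) {
    // Assumption: any non-UTF-8 input is ISO-8859-1, as utf8_encode() assumed.
    $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

$data = json_decode($raw, true);
echo $data['name'], PHP_EOL;
```

If the input is already valid UTF-8, the conversion is skipped, which avoids double-encoding the accents.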

Someone asked me to solve the problem of Arabic text in the Facebook JSON archive. Maybe this code will help someone who is trying to read Arabic text from Facebook (or Instagram) JSON:

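    $str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';

    // Replace each \uXXXX escape with the UTF-8 encoding of that code point.
    function decode_encoded_utf8($string) {
        return preg_replace_callback('#\\\\u([0-9a-f]{4})#i', function ($matches) {
            return mb_convert_encoding(pack('H*', $matches[1]), 'UTF-8', 'UCS-2BE');
        }, $string);
    }

    // Each escape here is really one UTF-8 byte, so mapping the code points
    // back to single bytes (ISO-8859-1) recovers the Arabic text.
    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', decode_encoded_utf8($str));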

Facebook Graph API returns JSON objects. Use json_decode() to read them into PHP and you do not have to worry about handling string literal escapes like \uNNNN. Don't try to decode JSON/JavaScript string literals by yourself, or extract chosen properties using regex.
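As a rough illustration of that advice, here is a small sketch that decodes a JSON string containing \uNNNN escapes. The payload and the field name "message" are invented for the example and are not taken from a real Graph API response.

```php
<?php
// Hypothetical sample payload; single quotes keep the \uNNNN escapes literal.
$json = '{"message":"Caf\u00e9 \u2014 \u201cquoted\u201d"}';

$data = json_decode($json, true);
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// The escapes are now real characters in a UTF-8 string.
$message = $data['message'];
echo $message, PHP_EOL;
```

No manual handling of the escapes is needed; json_decode() resolves them while parsing.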

Having read the string value, you'll have a UTF-8-encoded string. If your target HTML is also UTF-8-encoded, you don't need to replace — (U+2014) with any entity reference. Just use htmlspecialchars() on the string when outputting it, so that any < or & characters in the string are properly encoded.
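A short sketch of that output step, assuming the page itself is served as UTF-8 (the sample text is made up). Only the HTML-significant characters are escaped; the dash and the curly quotes pass through unchanged.

```php
<?php
// Hypothetical decoded text containing U+2014 and curly quotes.
$text = "Tom & Jerry \u{2014} <b>\u{201c}hi\u{201d}</b>";

// Escape &, <, >, and quotes; leave other UTF-8 characters as they are.
$html = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $html, PHP_EOL;
```

ENT_SUBSTITUTE is an optional choice here: it replaces invalid byte sequences with U+FFFD instead of returning an empty string.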

If you do for some reason need to produce ASCII-safe HTML, use htmlentities() with the charset arg set to 'utf-8'.
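For comparison, a sketch of the htmlentities() variant with the same kind of made-up sample text. Characters that have a named entity in the default HTML 4.01 table, such as U+2014, are replaced by that entity.

```php
<?php
// Hypothetical decoded text containing curly quotes and U+2014.
$text = "\u{201c}A \u{2014} B\u{201d} & <c>";

// Passing 'UTF-8' tells htmlentities() how to interpret the input bytes.
$html = htmlentities($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo $html, PHP_EOL;
```

Note that htmlentities() only converts characters that have a named entity, so text in scripts such as Arabic would still be emitted as UTF-8 bytes.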
