&#x9ce0;, &#xee839; and others

How to decode those?


These are XML numeric character references (not entities): each one names a Unicode code point in hexadecimal.

You can decode them along with the rest of your XML with an XML parser.
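
As a minimal sketch of the parser approach, assuming PHP with the DOM extension (libxml) available: the <item> wrapper element and the variable names are invented for illustration. The parser resolves the reference while reading the document, so the text comes back as ordinary UTF-8.

```php
<?php
// Hypothetical XML fragment; the <item> element is made up for this example.
$xml = '<item>&#x9ce0;</item>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The parser has already resolved the reference; textContent is UTF-8.
$text = $doc->documentElement->textContent;

echo bin2hex($text), "\n"; // e9b3a0, the UTF-8 bytes of U+9CE0
```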


What do you mean when you say “decode”? They stand for the Unicode characters whose code points are those two hex numbers. That’s pretty simple, so you must be asking for something else.
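
To make the "hex number to character" mapping concrete, here is a small sketch in PHP. It assumes PHP 7.2 or later with the mbstring extension, which provides mb_chr() and mb_ord(); the variable names are only illustrative.

```php
<?php
// Assumes PHP 7.2+ with mbstring (mb_chr/mb_ord).
$codePoint = hexdec('9ce0');          // the hex digits from the reference
$char = mb_chr($codePoint, 'UTF-8');  // the character for that code point

echo bin2hex($char), "\n";                 // e9b3a0 (UTF-8 bytes of U+9CE0)
echo dechex(mb_ord($char, 'UTF-8')), "\n"; // 9ce0 (round trip back to hex)
```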

Are you asking what sorts of things are known about those code points? This might help:

$ uniprops 9ce0 ee839
U+9CE0 ‹鳠› \N{ U+9CE0 }:
    \w \pL \p{L_} \p{Lo}
    All Any Alnum Alpha Alphabetic Assigned
       InCJKUnifiedIdeographs L Lo Gr_Base Grapheme_Base Graph
       GrBase Han Hani ID_Continue IDC ID_Start IDS Ideo
       Ideographic Letter L_ Other_Letter Print UIdeo
       Unified_Ideograph Word XID_Continue XIDC XID_Start XIDS
U+EE839 ‹U+EE839› \N{ U+EE839 }:
    \pC \p{Cn}
    All Any InNoBlock C Other Cn Unassigned Zzzz Unknown

So the second code point is currently unassigned, but the first one is up in the CJK Unified Ideographs block.
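
If you would rather check this from PHP than from uniprops, a rough equivalent is to test the same properties with PCRE. This is only a sketch: it assumes PHP 7+ (for the "\u{...}" escape), and the answers depend on the Unicode tables bundled with your PCRE build.

```php
<?php
// "\u{...}" escapes need PHP 7+; the /u modifier puts PCRE in UTF-8 mode.
$first  = "\u{9CE0}";
$second = "\u{EE839}";

// preg_match() returns 1 when the character has the named property.
$isHan        = preg_match('/^\p{Han}$/u', $first);  // script: Han
$isLetter     = preg_match('/^\p{Lo}$/u', $first);   // category: Other_Letter
$isUnassigned = preg_match('/^\p{Cn}$/u', $second);  // category: Unassigned

var_dump($isHan, $isLetter, $isUnassigned);
```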

Was there something else you were curious about?


As others note, you've probably got some Unicode character references there. Assuming the other characters in your input are already valid UTF-8, you could convert the references to UTF-8 with something like this (I'm assuming PHP, as you asked about it in a related question):

//take 1-4 byte hex string, convert to UTF-8 byte sequence
function hex2utf8($hex)
{
    //pad to 8 hex digits (32 bits)
    $hex=str_pad($hex, 8, '0',STR_PAD_LEFT);

    //build binary data octet by octet
    $decode='';
    while (!empty($hex))
    {
        $byte=substr($hex,0,2);
        $hex=substr($hex,2);
        $decode.=chr(hexdec($byte));
    }

    //binary data is UTF-32, convert it to UTF-8
    return iconv('UTF-32BE', 'UTF-8', $decode);
}

function convertReferencesToUTF8($str)
{
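    //the /e modifier was removed in PHP 7, so use a callback instead
    return preg_replace_callback(
        '~&#x([0-9a-f]+);~i',
        function ($m) { return hex2utf8($m[1]); },
        $str
    );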
}


$encoded="Unicode references &#x9ce0;, &#xee839;";
$utf8=convertReferencesToUTF8($encoded);

This seems like a lot of code; maybe someone else will suggest something simpler.
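
One possibly simpler route, offered as a sketch rather than a drop-in replacement: PHP's built-in html_entity_decode() also understands numeric character references. The flags and the 'UTF-8' charset below are choices made for this example, and note that it decodes named entities such as &amp; as well, which may or may not be what you want.

```php
<?php
$encoded = 'Unicode references &#x9ce0;, &#xee839;';

// ENT_QUOTES | ENT_HTML5 and 'UTF-8' are assumptions for this sketch.
$utf8 = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $utf8, "\n";
```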
