PHP construct a Unicode string?

2023-01-15 20:06 Q&A author:

Given the Unicode code point of a character, in decimal or hex, how can a CLI PHP script output that character? The chr() function does not seem to generate the proper output. Here's my test script, using the section sign U+00A7 (A7 in hex, 167 in decimal, which should be encoded as C2 A7 in UTF-8) as a test:

<?php
echo "Section sign: ".chr(167)."\n"; // Using CHR function
echo "Section sign: ".chr(0xA7)."\n";
echo "Section sign: ".pack("c", 0xA7)."\n"; // Using pack function?
echo "Section sign: §\n"; // Copy and paste of the symbol into source code

The output I get (via an SSH session to the server) is:

Section sign: ?
Section sign: ?
Section sign: ?
Section sign: §

So that proves that the terminal font I'm using has the section sign in it and that the SSH connection is passing it through successfully, but chr() isn't producing it properly when I construct it from the code number.

If all I've got is the code number and not a copy/paste option, what options do I have?

Assuming you have iconv, here's a simple way that doesn't involve implementing UTF-8 yourself:

function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
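
A quick way to check the function above is to dump the bytes it returns. This is a minimal sketch, assuming the iconv extension is loaded; the helper is repeated so the snippet runs on its own, and the sample code points are chosen here purely for illustration.

```php
<?php
// Same helper as above, repeated so this snippet is self-contained.
function unichr($i) {
    // pack('V', ...) yields the code point as 4 little-endian bytes (UCS-4LE).
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

// Sample code points (illustrative only): "A", the section sign, the euro sign.
foreach (array(0x41, 0xA7, 0x20AC) as $cp) {
    printf("U+%04X => %s\n", $cp, bin2hex(unichr($cp)));
}
// U+0041 => 41
// U+00A7 => c2a7
// U+20AC => e282ac
```

The c2a7 line matches the C2 A7 the question expects for the section sign.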

Apart from the mb_ functions and iconv, PHP has no knowledge of Unicode. You'll have to UTF-8 encode the character yourself.

For that, Wikipedia has an excellent overview on how UTF-8 is structured. Here's a quick, dirty and untested function based on that article:

function codepointToUtf8($codepoint)
{
    if ($codepoint <= 0x7F) // U+0000-U+007F - 1 byte
        return chr($codepoint);
    if ($codepoint <= 0x7FF) // U+0080-U+07FF - 2 bytes
        return chr(0xC0 | ($codepoint >> 6)).chr(0x80 | ($codepoint & 0x3F));
    if ($codepoint <= 0xFFFF) // U+0800-U+FFFF - 3 bytes
        return chr(0xE0 | ($codepoint >> 12)).chr(0x80 | (($codepoint >> 6) & 0x3F)).chr(0x80 | ($codepoint & 0x3F));
    else // U+010000-U+10FFFF - 4 bytes
        return chr(0xF0 | ($codepoint >> 18)).chr(0x80 | (($codepoint >> 12) & 0x3F)).chr(0x80 | (($codepoint >> 6) & 0x3F)).chr(0x80 | ($codepoint & 0x3F));
}
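
To make the bit arithmetic concrete, here is the two-byte case worked through for U+00A7. It is only an illustrative sketch; the variable names are mine and not part of the function above.

```php
<?php
$codepoint = 0xA7;                      // binary 1010 0111

// Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
$lead = 0xC0 | ($codepoint >> 6);       // upper bits:  0xC0 | 0x02 = 0xC2
$tail = 0x80 | ($codepoint & 0x3F);     // low 6 bits:  0x80 | 0x27 = 0xA7

$utf8 = chr($lead) . chr($tail);
echo bin2hex($utf8), "\n";              // c2a7
```

The three- and four-byte branches follow the same pattern, peeling off six bits per continuation byte.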

Don't forget that UTF-8 is a variable-length encoding.

§ is not among the first 128 (ASCII) characters, which are the only ones UTF-8 encodes in a single byte. In UTF-8, § is a multi-byte character: it is preceded by a 0xC2 byte, which marks the first byte of a two-byte sequence. This should work:

echo "Section sign: ".chr(0xC2).chr(0xA7)."\n";

chr

(PHP 4, PHP 5)

chr — Return a specific character

 Description

string chr ( int $ascii )
Returns a one-character string containing the character specified by ascii.

This function complements ord().

The important word is ascii :) Try this one:

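function uchr($codes) {
    // Accept either one array of code points or a list of scalar arguments.
    if (is_scalar($codes)) $codes = func_get_args();
    $str = '';
    // html_entity_decode() turns each numeric entity into its UTF-8 bytes.
    foreach ($codes as $code) $str .= html_entity_decode('&#'.$code.';', ENT_NOQUOTES, 'UTF-8');
    return $str;
}
echo "Section sign: ".uchr(167)."\n"; // Using uchr() instead of chr()
echo "Section sign: ".uchr(0xA7)."\n";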
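
The point about the manual entry can be checked directly: chr() always returns exactly one byte, and a lone 0xA7 byte is not valid UTF-8, which is why the terminal shows "?". A small sketch to confirm this, assuming the mbstring extension is available for the validity check:

```php
<?php
$byte = chr(167);

echo strlen($byte), "\n";                          // 1
echo bin2hex($byte), "\n";                         // a7

// A lone 0xA7 is a stray continuation byte, so it is not valid UTF-8.
var_dump(mb_check_encoding($byte, 'UTF-8'));       // bool(false)

// The two-byte sequence C2 A7 is the valid UTF-8 form of U+00A7.
var_dump(mb_check_encoding("\xC2\xA7", 'UTF-8'));  // bool(true)
```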

I know I am reopening an old, solved question, but since I found this topic while searching for help, I thought I would share the solution I ended up with. The original asker might be interested in refactoring their code.

Manually reimplementing the conversion from code numbers to UTF-8 is reinventing the wheel, not to mention the potential for bugs and poor performance.

The best solution I found was to use:

pack, to build raw bytes from the input data, using a format code that consumes the right amount of data; usually pack("H*", <input data>) to read hexadecimal values.
mb_convert_encoding, to convert the resulting byte string to UTF-8, using mb_convert_encoding(<byte string>, "UTF-8"). If the input encoding is not recognized properly, the third parameter of this function lets you specify it explicitly.
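
As a hedged sketch of how the two functions might be combined for the section sign: this assumes the mbstring extension is loaded and that the input is a 4-digit hex string such as "00A7" read as one UTF-16BE code unit, so it only covers code points up to U+FFFF. The helper name hexToUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: converts a 4-hex-digit UTF-16BE code unit to UTF-8.
function hexToUtf8($hex) {
    $utf16 = pack('H*', $hex);  // "00A7" becomes the two bytes 00 A7
    // The third argument names the input encoding explicitly.
    return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
}

echo "Section sign: " . hexToUtf8('00A7') . "\n";
echo bin2hex(hexToUtf8('00A7')), "\n";  // c2a7
```

Passing the input encoding explicitly avoids relying on mbstring's detection, which cannot reliably guess the encoding of a two-byte string.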
