Question about the "utf-8"-behavior

2023-02-14 00:44 问答作者：

#!/usr/bin/env perl
use warnings;
use 5.012;
use Encode qw(encode);

no warnings qw(utf8);

my $c = "\x{ffff}";

my $utf_8 = encode( 'utf-8', $c );
my $utf8 = encode( 'utf8', $c );

say "utf-8 :  @{[ unpack '(B8)*', $utf_8 ]}";
say "utf8  :  @{[ unpack '(B8)*', $utf8 ]}";

# utf-8 :  11101111 10111111 10111101
# utf8  :  11101111 10111111 101111开发者_Python百科11

Does the "utf-8" encode this way, to fix my codepoint automaticaly to the last interchangeable codepoint (of the first plane)?

See the UTF-8 vs. utf8 vs. UTF8 section of the Encode docs.

To summarize, Perl has two different UTF-8 encodings. Its native encoding is called utf8, and basically allows any codepoint, regardless of what the Unicode standard says about that codepoint.

The other encoding is called utf-8 (a.k.a. utf-8-strict). This allows only codepoints that are assigned by the Unicode standard.

\x{FFFF} is not a valid codepoint according to Unicode. But Perl's utf8 encoding doesn't care about that.

By default, the encode function replaces any character that does not exist in the destination charset with a substitution character (see the Handling Malformed Data section). For utf-8, that substitution character is U+FFFD (REPLACEMENT CHARACTER), which is encoded in UTF-8 as 11101111 10111111 10111101 (binary).

继续阅读：encode perl unicode utf-8

Question about the "utf-8"-behavior

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？