How do I encode characters into numeric character reference format in Perl?
I found this sample script in the answer to "How can I guess the encoding of a string in Perl?":
#!C:\perl\bin
use utf8;
use Encode qw(encode PERLQQ XMLCREF);
my $string = 'This year I went to 北京 Perl workshop.';
#print encode('ascii', $string, PERLQQ);
# This year I went to \x{5317}\x{4eac} Perl workshop.
print encode('ascii', $string, XMLCREF); # This year I went to &#x5317;&#x4eac; Perl workshop.
When I tested it, I got this output:
This year I went to \x{71fa9} Perl workshop.
This year I went to &#x71fa9; Perl workshop.
This differs from the output the author shows in the sample code above.
How can I encode a character string so that the output is in numeric character reference format (&#xHHHH;)? For example, when:
my $string = 'This year I went to 北京 Perl workshop.';
the encoded output would be:
This year I went to &#x5317;&#x4eac; Perl workshop.
I am the author of the answer linked in the question.
You made a simple mistake: you saved the Perl program in GB18030. A program that contains use utf8; must be saved in UTF-8 instead.
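To illustrate why the saved encoding matters, here is a small sketch (not part of the original answer) that prints the bytes an editor writes for 北京 in GBK and in UTF-8. The use utf8 pragma tells Perl that the source file is UTF-8, so GBK/GB18030 bytes inside a string literal are misread. The sketch assumes cp936 (GBK) as a stand-in for GB18030, because core Encode does not ship gb18030 (it lives in Encode::HanExtra) and these two characters use the same bytes in both.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The two characters of the sample string, written as escapes so this
# file itself is pure ASCII.
my $chars = "\x{5317}\x{4eac}";

# cp936 (GBK) stands in for GB18030 here; both use the same two bytes
# per character for these code points.
my $gbk  = encode('cp936', $chars);
my $utf8 = encode('UTF-8', $chars);

# %vX prints each byte as uppercase hex, joined with dots.
printf "GBK:   %vX\n", $gbk;    # GBK:   B1.B1.BE.A9
printf "UTF-8: %vX\n", $utf8;   # UTF-8: E5.8C.97.E4.BA.AC
```

Under use utf8, Perl expects the second byte sequence inside literals; a file saved as GB18030 contains the first, which is why the literal in the question did not come out as 北京.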
$string =~ s/[^\0-\377]/ sprintf '&#x%04x;', ord($&) /ge
Find each character in $string not in the range 0-255 (i.e., any wide characters), and replace it with the value of the expression sprintf '&#x%04x;', ord($&), where $& is the wide character that was matched.
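The same substitution can be wrapped in a reusable function. This is a minimal sketch with a hypothetical helper name, to_ncr; it captures the match in $1 instead of using $& (on Perls before 5.20, mentioning $& anywhere slows down every regex in the program) and uses the /r modifier (Perl 5.14+) to return a modified copy rather than changing the argument.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): returns a copy of
# $text with every character above U+00FF turned into &#xHHHH;.
sub to_ncr {
    my ($text) = @_;
    # [^\0-\377] matches any code point above 255 and captures it in $1;
    # /e evaluates the replacement as Perl code, /g handles every match,
    # /r returns the new string and leaves $text unchanged.
    return $text =~ s/([^\0-\377])/sprintf '&#x%04x;', ord $1/ger;
}

my $input = "This year I went to \x{5317}\x{4eac} Perl workshop.";
print to_ncr($input), "\n";   # This year I went to &#x5317;&#x4eac; Perl workshop.
```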
use strict;
use warnings;
use utf8;
my $string = "This year I went to \x{5317}\x{4eac} Perl workshop.";
# Replace every character above U+00FF with a &#xHHHH; reference
$string =~ s/[^\0-\377]/ sprintf '&#x%04x;', ord($&) /ge;
print "$string\n";
Produces:
This year I went to &#x5317;&#x4eac; Perl workshop.
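One detail of the regex: because the class is [^\0-\377], characters from U+0080 to U+00FF (such as é) are left as-is, whereas encode('ascii', ..., FB_XMLCREF) escapes everything outside ASCII. The sketch below (with a made-up sample string) compares the two and shows that narrowing the class to [^\0-\177] should give the same result as Encode. It uses FB_XMLCREF rather than the bare XMLCREF flag because FB_XMLCREF also includes LEAVE_SRC, which keeps encode from modifying $string in place.

```perl
use strict;
use warnings;
use Encode qw(encode FB_XMLCREF);

# Made-up sample: e-acute is U+00E9, the other two are U+5317 U+4EAC.
my $string = "Caf\x{e9} in \x{5317}\x{4eac}";

# The answer's class [^\0-\377] skips U+0080..U+00FF, so e-acute survives.
(my $latin1_kept = $string) =~ s/([^\0-\377])/sprintf '&#x%04x;', ord $1/ge;

# Narrowing the class to [^\0-\177] escapes every non-ASCII character.
# %x without padding is meant to match the &#x...; form Encode emits.
(my $ascii_only = $string) =~ s/([^\0-\177])/sprintf '&#x%x;', ord $1/ge;

# FB_XMLCREF = XMLCREF plus LEAVE_SRC, so $string is left intact.
my $via_encode = encode('ascii', $string, FB_XMLCREF);

print "$ascii_only\n";
print "$via_encode\n";
```

Which class to use depends on the output: [^\0-\377] is enough if the rest of the text is written as Latin-1, while [^\0-\177] (or Encode with 'ascii') yields output that is safe in any ASCII-compatible encoding.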