Problem with decoding Unicode JSON in Perl
I am seeing strange behavior in Perl when decoding a Unicode JSON string produced by a PHP script's json_encode function. I reduced the problem to the following code:
#!/usr/bin/perl
use CGI;
use JSON;
print CGI::header(-type=>'text/html', -charset=>'UTF-8');
print %{ decode_json('{"test_1" : "= \u00F9 ="}') }->{'test_1'};
print '<br>';
print %{ decode_json('{"test_2" : "= \u00F9 \u0121 ="}') }->{'test_2'};
When I run this script in a browser, I see the following:
= � =
= ù ġ =
The first line contains a "broken character"; the second is correct. What I think is happening is that, for some reason, Perl outputs the first string in ISO-8859-1 encoding. If I change the page encoding to ISO-8859-1, the first line is correct but the second is broken.
My Perl version is 5.10.1 and the JSON version is 2.51.
Question: how can I force Perl's decode_json to return UTF-8 characters in the first print?
Note: I can fix the problem by manually converting first output to UTF-8, but this requires the installation of an additional "Encoder" module, which I want to avoid.
I tried your code and it generated several warnings under "use warnings;".
If you want to be sure the output is UTF-8, I believe you have to tell Perl so explicitly. Use "binmode(STDOUT, ":utf8");" or similar.
This works on the command line:
use strict;
use warnings;
use JSON;
binmode(STDOUT, ":utf8");
print decode_json('{"test_1" : "= \u00F9 ="}')->{test_1};
print '<br>';
print decode_json('{"test_2" : "= \u00F9 \u0121 ="}')->{'test_2'};
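EDIT: As far as I know, this does not affect decode_json() itself, only the output of the Perl script. decode_json() returns Perl character strings. Without an encoding layer on STDOUT, Perl writes a string whose characters all fit in one byte as ISO-8859-1, while a string containing any character above U+00FF is written as UTF-8 (with a "Wide character" warning). That is why only the first line breaks. Unicode tutorials often tell you to state explicitly which encoding you want on your input and output filehandles.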
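To make the difference between the two strings visible without a browser, here is a minimal sketch. It assumes JSON::PP as a stand-in for JSON (both export decode_json; swap in "use JSON;" if you prefer), and the variable names are only for illustration. It also uses the built-in utf8::encode, which needs no extra module, to show what the UTF-8 bytes look like.

```perl
use strict;
use warnings;
use JSON::PP;              # assumption: stand-in for JSON, same decode_json
use List::Util qw(max);    # core module, used only for the report below

my $s1 = decode_json('{"test_1" : "= \u00F9 ="}')->{test_1};
my $s2 = decode_json('{"test_2" : "= \u00F9 \u0121 ="}')->{test_2};

# decode_json returns character strings, so length counts characters.
my ($chars1, $chars2) = (length $s1, length $s2);

# Highest code point in each string: only $s2 goes above U+00FF.
my $max1 = max(map { ord($_) } split //, $s1);
my $max2 = max(map { ord($_) } split //, $s2);

# utf8::encode is built in; it converts a copy in place to UTF-8 bytes.
my ($bytes1, $bytes2) = ($s1, $s2);
utf8::encode($bytes1);
utf8::encode($bytes2);

printf "test_1: %d chars, max U+%04X, %d UTF-8 bytes\n",
    $chars1, $max1, length $bytes1;
printf "test_2: %d chars, max U+%04X, %d UTF-8 bytes\n",
    $chars2, $max2, length $bytes2;
# prints:
# test_1: 5 chars, max U+00F9, 6 UTF-8 bytes
# test_2: 7 chars, max U+0121, 9 UTF-8 bytes
```

Every character of the first string fits in a single byte, so a STDOUT without an encoding layer can emit it as ISO-8859-1. The second string cannot be written that way, so it comes out as UTF-8. Setting the layer with binmode makes both cases behave the same.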