XML::Simple encoding problem

I have an XML file I want to parse:

<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>

It's parsed perfectly by Firefox, but XML::Simple corrupts some data. I have a Perl program like this:

use XML::Simple;
use Data::Dumper;

# The body of <tag> is the two UTF-8 bytes for "û"
my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
$content .= "<tag>\x{c3}\x{bb}</tag>\n";

print "input:\n$content\n";

my $xml = XML::Simple->new;
my $data = $xml->XMLin($content, KeepRoot => 1);

print "data:\n";
print Dumper $data;

and get:

input:
<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>

data:
$VAR1 = {
          'tag' => "\x{fb}"
        };

It doesn't seem to be what I expected. I think there are some encoding issues. Am I doing something wrong?

UPD: I thought that XMLin returned text in UTF-8 (like the input). I just added

encode_utf8($data->{'tag'});

and it worked.


XML::Simple is fickle.

It's effectively calling Encode::decode('UTF-8', $content), which turns your UTF-8 bytes into Perl's native character strings.

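Do this, encoding each value back to UTF-8 after parsing:

use Encode ();
use XML::Simple;

my $content_utf8 = "<root><item>whatev\x{c3}\x{a9}r</item></root>";  # UTF-8 bytes
my $xml = XMLin($content_utf8);
my $item_utf8 = Encode::encode('UTF-8', $xml->{'item'});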

This sort of works too, but it is risky because it relies on double encoding the input:

my $content_utf8 = "<root><item>whatev\x{c3}\x{a9}r</item></root>";  # UTF-8 bytes
my $double_encoded_utf8 = Encode::encode('UTF-8', $content_utf8);
my $xml = XMLin($double_encoded_utf8);
my $item_utf8 = $xml->{'item'};
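
To see what decoding and double encoding do to the bytes from the question, here is a minimal sketch that uses only the core Encode module. XML::Simple is not involved, and the variable names are just for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes   = "\x{c3}\x{bb}";           # the two UTF-8 bytes for "û"
my $chars   = decode('UTF-8', $bytes);  # one character, code point U+00FB
my $again   = encode('UTF-8', $chars);  # back to the original two bytes
my $doubled = encode('UTF-8', $bytes);  # bytes treated as characters: double encoded

printf "decoded:    %d character, U+%04X\n", length($chars), ord($chars);
printf "round trip: %s\n", unpack('H*', $again);
printf "doubled:    %s\n", unpack('H*', $doubled);
# decoded:    1 character, U+00FB
# round trip: c3bb
# doubled:    c383c2bb
```

Decoding once and encoding once is a clean round trip. Encoding something that is already UTF-8 produces four bytes, which only comes out right if exactly one decode happens later.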


Hexadecimal FB (decimal 251) is the Latin-1 code, and also the Unicode code point (U+00FB), of the "û" character. It is not ASCII, which stops at 127. Could you please elaborate on what you expected to get in the data structure that leads you to conclude that what you got was "corrupt"?
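
For what it's worth, here is a small sketch of how that single character turns back into the two input bytes when it is written through a UTF-8 output layer. The helper name utf8_output_of is made up for illustration, and an in-memory filehandle stands in for STDOUT, where one would more typically call binmode(STDOUT, ':encoding(UTF-8)'):

```perl
use strict;
use warnings;

# Hypothetical helper: write a character string through a UTF-8
# encoding layer and return the raw bytes that were produced.
sub utf8_output_of {
    my ($text) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "cannot close in-memory handle: $!";
    return $buffer;
}

my $written = utf8_output_of("\x{fb}");   # the value Dumper showed
printf "%s\n", unpack('H*', $written);    # prints c3bb
```

So the parsed value is not corrupt under this reading; it is the decoded character, and it becomes the original bytes again once it is encoded on output.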
