How do I write character ALT-0146 to an XML file using Perl?
’
That is the character, and I cannot find a way to detect, replace, or write it properly to an XML file. At first I was using string concatenation, then I wised up and switched to XML::Writer, but it still doesn't work: the XML is still broken afterward. (I need it in UTF-8.)
This is a test I wrote that still breaks:
my $output = new IO::File(">$foundFilePath");
my $writer = new XML::Writer(OUTPUT => $output);
$writer->xmlDecl("UTF-8");
$writer->startTag("xml");
$writer->startTag("test");
$writer->characters("’");
$writer->endTag("test");
$writer->endTag("xml");
$writer->end();
$output->close();
To be more specific, I am trying to get the data from this page: http://investing.businessweek.com/businessweek/research/stocks/private/snapshot.asp?privcapId=4439466
And Mr. William O’Keefe is messing everything up.
There are two things you need to do. If you want to write UTF-8 to a file, you need to say so:
my $output = IO::File->new($foundFilePath, ">:utf8");
And if you want to use literal UTF-8 strings in your source code, you need to say
use utf8;
at the beginning of your program. Otherwise, Perl assumes your source code is Latin-1.
Here's a complete example script:
use utf8;
use strict;
use warnings;
use IO::File;
use XML::Writer;
my $foundFilePath = 'test.xml';
my $output = IO::File->new($foundFilePath, ">:utf8");
my $writer = XML::Writer->new(OUTPUT => $output);
$writer->xmlDecl("UTF-8");
$writer->startTag("xml");
$writer->startTag("test");
$writer->characters("’");
$writer->endTag("test");
$writer->endTag("xml");
$writer->end();
$output->close();
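If the text comes from a downloaded page rather than from a literal in your source, the bytes also have to be decoded into characters before you hand them to XML::Writer. Here is a minimal sketch of that step. It assumes the page is served as Windows-1252, which is where ALT-0146 (byte 0x92) comes from; check the page's actual charset before relying on this. The helper name cp1252_to_chars and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw Windows-1252 bytes into a Perl character string.
# Assumption: the input really is Windows-1252; use the page's declared charset.
sub cp1252_to_chars {
    my ($bytes) = @_;
    return decode('cp1252', $bytes);
}

# Byte 0x92 is the right single quotation mark in Windows-1252 (ALT-0146).
my $raw  = "O\x92Keefe";
my $name = cp1252_to_chars($raw);

# After decoding, the quote is one character, U+2019.
printf "U+%04X\n", ord(substr($name, 1, 1));

# Encoding the character string as UTF-8 gives three bytes for the quote.
my $utf8 = encode('UTF-8', $name);
print join(' ', map { sprintf '%02X', ord($_) } split //, $utf8), "\n";
```

In the script above you would pass the decoded $name to $writer->characters() and let the ">:utf8" layer on the file handle do the encoding, rather than calling encode() yourself.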