
Does Perl's Net::Cassandra module support UTF-8?

I've run into a really strange UTF-8 problem with Net::Cassandra::Easy (which is built upon Net::Cassandra): UTF-8 strings written to Cassandra are garbled upon retrieval.

The following code shows the problem:

use strict;
use utf8;
use warnings;
use Net::Cassandra::Easy;

binmode(STDOUT, ":utf8");

my $key = "some_key";
my $column = "some_column";
my $set_value = "\x{2603}"; # U+2603 is ☃ (SNOWMAN)
my $cassandra = Net::Cassandra::Easy->new(keyspace => "Keyspace1", server => "localhost");
$cassandra->connect();
$cassandra->mutate([$key], family => "Standard1", insertions => { $column => $set_value });
my $result = $cassandra->get([$key], family => "Standard1", standard => 1);
my $get_value = $result->{$key}->{"Standard1"}->{$column};
if ($set_value eq $get_value) {
    # this is the path I want.
    print "OK: $set_value == $get_value\n";
} else {
    # this is the path I get.
    print "ERR: $set_value != $get_value\n";
}

When running the code above $set_value eq $get_value evaluates to false. What am I doing wrong?


Add use Encode; to the beginning of your script, and pass the values you read back from Cassandra through Encode::decode_utf8. For example:

my $get_value = $result->{$key}->{"Standard1"}->{$column};
$get_value = Encode::decode_utf8($get_value);

Outputs:

OK: ☃ == ☃

When you set $set_value to "\x{2603}", Perl sees a code point above 255, stores the value as a character string, and turns on its internal UTF8 flag for you. To confirm this, print the return value of Encode::is_utf8($set_value).
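A minimal sketch of that check, using only the core Encode module. The variable names $chars and $bytes are illustrative and do not appear in the question:

```perl
use strict;
use warnings;
use Encode ();

my $chars = "\x{2603}";                    # character string: one code point, U+2603
my $bytes = Encode::encode_utf8($chars);   # byte string: the UTF-8 octets E2 98 83

# length() counts characters for $chars and octets for $bytes.
printf "chars: length=%d is_utf8=%d\n", length($chars), Encode::is_utf8($chars) ? 1 : 0;
printf "bytes: length=%d is_utf8=%d\n", length($bytes), Encode::is_utf8($bytes) ? 1 : 0;
```

The first line reports a length of 1 with the flag on, the second a length of 3 with the flag off. Both hold the "same" snowman, but Perl treats them as different strings.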

Unfortunately, once this string goes into Cassandra and back out again, the encoding information is lost: Cassandra appears to be encoding-agnostic and hands back plain bytes. Calling Encode::decode_utf8 tells Perl that the string holds a UTF-8 byte sequence and that it should be converted into Perl's internal representation for Unicode. As jrockway points out, you should also call Encode::encode_utf8 on any strings before they are sent to Cassandra, although in most cases Perl already knows they are UTF-8, for example if you read them from a file opened with the :utf8 encoding layer.
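The encode-on-write, decode-on-read round trip can be sketched without a running Cassandra. Here the hash %fake_store is a hypothetical stand-in for a byte-oriented store. It is an assumption made for illustration and not how Net::Cassandra::Easy works internally:

```perl
use strict;
use warnings;
use Encode ();

my %fake_store;   # hypothetical stand-in for Cassandra: it only ever holds raw bytes

my $set_value = "\x{2603}";

# Write path: turn characters into UTF-8 bytes before storing.
$fake_store{some_column} = Encode::encode_utf8($set_value);

# Read path: the store returns bytes, so decode them back into characters.
my $raw       = $fake_store{some_column};
my $get_value = Encode::decode_utf8($raw);

print $set_value eq $raw       ? "raw matches\n"     : "raw differs\n";
print $set_value eq $get_value ? "decoded matches\n" : "decoded differs\n";
```

This prints "raw differs" followed by "decoded matches", which is the same mismatch the question runs into and the same fix.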

If you use UTF-8 often, you might want to write a wrapper over Net::Cassandra::Easy to do this automatically.
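One possible shape for such a wrapper is sketched below. The package names My::Utf8Cassandra and My::FakeClient are hypothetical, and the result layout ($result->{$key}->{$family}->{$column}) is assumed from the question's code. The fake client exists only so the sketch runs without a Cassandra server. Only column values are converted here, not keys or column names.

```perl
use strict;
use warnings;
use Encode ();

package My::Utf8Cassandra;   # hypothetical wrapper, not part of any CPAN module

sub new {
    my ($class, $client) = @_;   # $client: any object with mutate() and get()
    return bless { client => $client }, $class;
}

# Encode column values to UTF-8 bytes, then delegate to the wrapped client.
sub mutate {
    my ($self, $keys, %args) = @_;
    if (my $ins = $args{insertions}) {
        my %encoded;
        $encoded{$_} = Encode::encode_utf8($ins->{$_}) for keys %$ins;
        $args{insertions} = \%encoded;
    }
    return $self->{client}->mutate($keys, %args);
}

# Delegate to the wrapped client, then decode column values back to characters.
sub get {
    my ($self, $keys, %args) = @_;
    my $result = $self->{client}->get($keys, %args);
    for my $families (values %$result) {
        for my $columns (values %$families) {
            $_ = Encode::decode_utf8($_) for values %$columns;   # values() aliases the hash slots
        }
    }
    return $result;
}

package My::FakeClient;      # hypothetical in-memory stand-in for the real client

sub new {
    my ($class) = @_;
    return bless { data => {} }, $class;
}

sub mutate {
    my ($self, $keys, %args) = @_;
    for my $key (@$keys) {
        my $row = $self->{data}{$key}{ $args{family} } ||= {};
        $row->{$_} = $args{insertions}{$_} for keys %{ $args{insertions} };
    }
    return;
}

sub get {
    my ($self, $keys, %args) = @_;
    my %result;
    for my $key (@$keys) {
        my $row = $self->{data}{$key}{ $args{family} } || {};
        $result{$key}{ $args{family} } = { %$row };   # copy so callers cannot alter the store
    }
    return \%result;
}

package main;

my $fake   = My::FakeClient->new;
my $client = My::Utf8Cassandra->new($fake);
$client->mutate(["some_key"], family => "Standard1", insertions => { some_column => "\x{2603}" });
my $result = $client->get(["some_key"], family => "Standard1", standard => 1);
my $value  = $result->{some_key}{Standard1}{some_column};
print $value eq "\x{2603}" ? "round trip OK\n" : "round trip failed\n";
```

With the real module you would presumably create and connect the Net::Cassandra::Easy object as in the question and pass it to My::Utf8Cassandra->new in place of the fake client.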

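Finally, you don't need use utf8; unless your Perl source code itself (string literals, identifiers) contains literal UTF-8 characters. Your script writes the snowman as the escape "\x{2603}" and only uses the literal character inside a comment, so the pragma makes no difference there. Perl can handle UTF-8 strings whether you specify use utf8; or not.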
