
Converting UCS2 (Unknown LE or BE) In Numeric Hex format to UTF-8 Using Perl

Hoping someone can point me in the direction of where I'm going wrong with this:

I have a string that I believe is hex-encoded UCS2, but the provider cannot tell me whether it is UCS2-LE or UCS2-BE.

Like so: 0627062E062A062806270631

It translates to this: اختبار

In Arabic, apparently. But no matter whether I try converting it out of hex, using it as straight UCS2 (LE or BE), or practically anything else I can think of, I can't turn it into a native Perl string so that I can then re-encode it as standard UTF-8 (the native format of our system).

Code:

my $string = "0627062E062A062806270631";
my $decodedHex = hex($string);

#NEAREST
my $perlDecodedUTF8 = decode("UCS-2BE", $decodedHex);
my $utf8 = encode('UTF-8',$perlDecodedUTF8);

open(ARABICTEST,">ucs2test.txt");
print(ARABICTEST $perlDecodedUTF8);
print("Done!");
close(ARABICTEST);

It outputs gibberish characters at the moment.

Now one idea I did come up with was to split the string in question into 4-character sections (i.e. per hex code), but even trying this with an individual, known UCS2 hex value doesn't appear to work.

Also tried forcing the output encoding, no joy there either.

Thanks!


hex is not the way to decode a hex string to a byte sequence. pack is. (hex produces a single integer, not a string of bytes.) Other than that, you were close. Try this:

use strict;
use warnings;
use Encode;

my $string = "0627062E062A062806270631";
my $decodedHex = pack('H*', $string);

my $perlDecodedUTF8 = decode("UCS-2BE", $decodedHex);

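# Let the :encoding layer convert the decoded characters to UTF-8 on output.
open(my $ARABICTEST, '>:encoding(UTF-8)', 'ucs2test.txt')
    or die "Cannot open ucs2test.txt: $!";
print $ARABICTEST $perlDecodedUTF8;
close($ARABICTEST);
print("Done!\n");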
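
To see what pack does with the sample string, here is a small check. It is only a sketch, and the variable names are illustrative. Each pair of hex digits becomes one byte, so the 24-digit string becomes 12 bytes, which decode to 6 characters. It also shows the in-memory route to UTF-8 bytes using encode, in case you need the UTF-8 as a string rather than in a file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $string = "0627062E062A062806270631";

# pack 'H*' turns each pair of hex digits into one byte (high nybble first).
my $bytes      = pack('H*', $string);
my $byte_count = length($bytes);                 # 24 hex digits -> 12 bytes

# Decoding as big-endian UCS-2 yields one Perl character per two bytes.
my $chars      = decode('UCS-2BE', $bytes);
my $char_count = length($chars);                 # 6 characters
my $first_cp   = sprintf('U+%04X', ord($chars)); # code point of the first character

# Encoding to UTF-8 gives the byte string to hand to a UTF-8 system.
my $utf8     = encode('UTF-8', $chars);
my $utf8_len = length($utf8);                    # these Arabic letters take 2 bytes each

print "$byte_count bytes, $char_count characters, first is $first_cp, $utf8_len UTF-8 bytes\n";
```

The first code point comes out as U+0627 (ARABIC LETTER ALEF), which is a good sign that big-endian is the right byte order for this data.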

Note: You probably want to use UTF-16BE instead of UCS-2BE. They're basically the same thing, but UTF-16BE allows surrogate pairs (which are needed for characters above U+FFFF), and UCS-2BE doesn't. So all UCS-2BE text is also valid UTF-16BE, but not vice versa.
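
If the provider really can't tell you the byte order, one rough way to guess is to decode the bytes both ways and see which result lands in the script you expect. The sketch below assumes the text is mostly Arabic (U+0600 to U+06FF). guess_utf16_order is a hypothetical helper name, not something from Encode, and the heuristic would need adjusting for other languages.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode the bytes both ways and prefer the byte order
# that yields more characters in the expected block (Arabic, U+0600..U+06FF).
sub guess_utf16_order {
    my ($bytes) = @_;
    my ($best, $best_score) = ('UTF-16BE', -1);
    for my $enc ('UTF-16BE', 'UTF-16LE') {
        my $text  = decode($enc, $bytes);
        # Count the characters that fall inside the Arabic block.
        my $score = () = $text =~ /[\x{0600}-\x{06FF}]/g;
        ($best, $best_score) = ($enc, $score) if $score > $best_score;
    }
    return $best;
}

my $bytes = pack('H*', '0627062E062A062806270631');
my $order = guess_utf16_order($bytes);
print "Looks like $order\n";
```

For the sample string this picks UTF-16BE, because read as little-endian the same bytes give code points such as U+2706 and U+2E06, none of which are Arabic letters.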
