PHP utf encoding problem
How can I encode strings in UTF-16BE format in PHP? For "Demo Message!!!" the encoded string should be '00440065006D006F0020004D00650073007300610067006'. Also, I need to encode Arabic characters to this format.
First of all, this is absolutely not UTF-8, which is just a charset (i.e. a way to store strings in memory / display them).
What you have here looks like a dump of the bytes that are used to build each character.
If so, you could get those bytes this way:
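// utf8_encode() converts ISO-8859-1 to UTF-8 (deprecated as of PHP 8.2); for this pure-ASCII string it changes nothing
$str = utf8_encode("Demo Message!!!");
for ($i = 0; $i < strlen($str); $i++) {
    $byte = $str[$i];      // one byte of the string
    $char = ord($byte);    // its numeric value
    printf('%02x ', $char);
}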
And you'd get the following output:
44 65 6d 6f 20 4d 65 73 73 61 67 65 21 21 21
But, once again, this is not UTF-8: in UTF-8, as you can see in the example I've given, D is stored in only one byte: 0x44. In what you posted, it's stored using two bytes: 0x00 0x44.
Maybe you're using some kind of UTF-16?
EDIT after a bit more testing and @aSeptik's comment: this is indeed UTF-16.
To get the kind of dump you're getting, you'll have to make sure your string is encoded in UTF-16, which could be done, for example, with the mb_convert_encoding function:
$str = mb_convert_encoding("Demo Message!!!", 'UTF-16', 'UTF-8');
Then, it's just a matter of iterating over the bytes that make up this string and dumping their values, like I did before:
for ($i = 0; $i < strlen($str); $i++) {
    $byte = $str[$i];
    $char = ord($byte);
    printf('%02x ', $char);
}
And you'll get the following output:
00 44 00 65 00 6d 00 6f 00 20 00 4d 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21
Which kind of looks like what you posted :-)
(you just have to remove the space in the call to printf -- I left it there to get an easier-to-read output =)
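If you want the exact kind of string from the question (uppercase hex, no spaces), the loop can probably be replaced with bin2hex(). Here's a rough sketch; utf16beHex is just a name I made up for illustration, and it assumes the input string is valid UTF-8 and that the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not part of PHP or of the code above): returns the
// UTF-16BE bytes of a string as uppercase hex with no separators.
// Assumption: $text is valid UTF-8.
function utf16beHex(string $text): string
{
    // Name the target byte order explicitly so the result does not depend
    // on any default, and name the source encoding instead of relying on
    // mbstring's internal encoding setting.
    $utf16 = mb_convert_encoding($text, 'UTF-16BE', 'UTF-8');

    // bin2hex() dumps every byte as two lowercase hex digits.
    return strtoupper(bin2hex($utf16));
}

echo utf16beHex('Demo Message!!!'), "\n";
// prints 00440065006D006F0020004D006500730073006100670065002100210021
```

Passing the source encoding as the third argument means the result should not change if the script's internal encoding is configured differently.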
E.g. by using the mbstring extension and its mb_convert_encoding() function.
$in = 'Demo Message!!!';
$out = mb_convert_encoding($in, 'UTF-16BE');
for ($i = 0; $i < strlen($out); $i++) {
    printf("%02X ", ord($out[$i]));
}
prints
00 44 00 65 00 6D 00 6F 00 20 00 4D 00 65 00 73 00 73 00 61 00 67 00 65 00 21 00 21 00 21
Or by using iconv()
$in = 'Demo Message!!!';
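// The first argument must match how $in is actually encoded;
// use 'UTF-8' instead if the input is UTF-8 (e.g. Arabic text).
$out = iconv('iso-8859-1', 'UTF-16BE', $in);
for ($i = 0; $i < strlen($out); $i++) {
    printf("%02X ", ord($out[$i]));
}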
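For the Arabic part of the question, ISO-8859-1 won't do as the source charset, since it has no Arabic letters. A minimal sketch, assuming the input is UTF-8: the sample word and the variable names are only my illustration, and the "\u{...}" escapes (PHP 7+) are used so the example does not depend on how the file itself is saved.

```php
<?php
// "\u{...}" escapes produce UTF-8 bytes regardless of the file's own encoding.
// Sample input (an assumption for illustration): the Arabic word "marhaba" (hello).
$in = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}";

// Convert from UTF-8 (not ISO-8859-1) to big-endian UTF-16.
$out = iconv('UTF-8', 'UTF-16BE', $in);
if ($out === false) {
    // iconv() returns false when the input cannot be converted.
    throw new RuntimeException('iconv() could not convert the input');
}

// Uppercase hex without separators, as in the question.
$hex = strtoupper(bin2hex($out));
echo $hex, "\n";
// prints 06450631062D06280627
```

Each of the five letters takes two bytes in UTF-16BE, so $out is 10 bytes long.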