DomDocument and special characters written in two bytes

I have a web application, written in PHP, based on UTF-8 (both PHP and MySQL are on UTF-8). Everything is beautiful - no problem with special characters.

However, I had to build an export to XML with encoding ISO-8859-2 (Polish), so I picked DomDocument because it has built-in encoding conversion.

But when I sent the XML to my partner for validation, he said that one of the tags has too many characters. That was strange, because the field is limited to a specific maximum number of characters. Then I opened the file in a hex editor and saw that every special character takes two bytes.

I have tried to convert the result with iconv and mb_convert_encoding.

Iconv says:

iconv() [function.iconv]: Detected an illegal character in input string in file application/controllers/report/export.php at 169

mb_convert_encoding simply deletes all special characters, and the result is encoded in ASCII.

Is there a way to convert the output of DomDocument to one-byte characters?

Thanks in advance!


One problem when switching between encodings is that, even with transliteration, not all characters are representable in other encodings in a single byte.

For example, consider the EURO SIGN (€), a character that takes 3 bytes when encoded in UTF-8. If you look at the charset support page for that character, you can see that ISO-8859-2 is not listed.

Since ISO-8859-2 has no single character for the euro sign, transliteration does its best to still represent it in the output:

echo iconv( 'UTF-8', 'ISO-8859-2//TRANSLIT', '€' ); // EUR

In this example, we still end up with 3 bytes to represent the euro sign after transliterating.
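
To make the byte counts concrete, here is a minimal sketch that compares a Polish letter, which does exist in ISO-8859-2, with the euro sign, which does not. The variable names are only for illustration, and the transliterated result is printed rather than assumed, since what //TRANSLIT produces depends on the iconv implementation PHP was built against.

```php
<?php
// strlen() counts bytes, not characters, so it shows the encoded size.

// U+0105 (a with ogonek) exists in ISO-8859-2.
$polish = "\u{0105}";
$polishUtf8Bytes = strlen($polish);               // 2 bytes in UTF-8
$polishLatin2 = iconv('UTF-8', 'ISO-8859-2', $polish);
$polishLatin2Bytes = strlen($polishLatin2);       // 1 byte in ISO-8859-2

// U+20AC (EURO SIGN) does not exist in ISO-8859-2.
$euro = "\u{20AC}";
$euroUtf8Bytes = strlen($euro);                   // 3 bytes in UTF-8

// The transliterated form varies between iconv implementations,
// so inspect it instead of relying on a fixed result.
$euroLatin2 = @iconv('UTF-8', 'ISO-8859-2//TRANSLIT', $euro);

printf("Polish letter: %d -> %d byte(s)\n", $polishUtf8Bytes, $polishLatin2Bytes);
printf("Euro sign: %d byte(s) in UTF-8, transliterated to %s\n",
    $euroUtf8Bytes,
    $euroLatin2 === false ? '(conversion failed)' : bin2hex($euroLatin2) . ' (hex)');
```

Characters that ISO-8859-2 covers shrink to one byte each; only the ones it lacks end up as multi-byte replacements or failures.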

EDIT

P.S. The NOTICE-level error you're getting is because you executed iconv() without the transliteration flag. Like the EURO SIGN above, at least one character in your data clearly doesn't exist in ISO-8859-2, so you'll have to use transliteration. Just know that it doesn't guarantee that you'll get down to 1 byte/char.
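
If it helps to find out which characters are causing the notice, something along these lines could list them. This is only a sketch: unrepresentableChars() is a hypothetical helper, not part of iconv, and it assumes the input is valid UTF-8. It converts each character to the target encoding and back, and reports the ones that do not survive the round trip.

```php
<?php
/**
 * Hypothetical helper: returns the distinct characters of a UTF-8 string
 * that cannot be represented in the target encoding.
 */
function unrepresentableChars(string $utf8, string $target = 'ISO-8859-2'): array
{
    // Split into individual characters; the /u modifier requires valid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    $bad = [];
    foreach ($chars as $ch) {
        // Without //TRANSLIT, iconv() may fail or substitute a placeholder,
        // depending on the implementation, so compare after a round trip.
        $converted = @iconv('UTF-8', $target, $ch);
        $roundTrip = $converted === false
            ? false
            : @iconv($target, 'UTF-8', $converted);
        if ($roundTrip !== $ch) {
            $bad[] = $ch;
        }
    }

    return array_values(array_unique($bad));
}

// Polish letters pass through; only the euro sign is reported.
$problem = unrepresentableChars("Za\u{017C}\u{00F3}\u{0142}\u{0107} 5\u{20AC}");
print_r($problem);
```

Once you know which characters are affected, you can decide whether to transliterate them, replace them yourself, or drop them before the export.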
