
simplexml_load_file and encoding problem

SimpleXML converts all text to UTF-8 if the source XML declaration specifies another encoding. So all the text in the resulting SimpleXMLElement is automatically in UTF-8.

In my case the source has the following XML declaration:

<?xml version="1.0" encoding="windows-1251" ?>

What should I do to get normal output? As you can imagine, for now I get strange symbols.

Thanks.


Maybe a stupid answer, but just don't use SimpleXML. Just use DOM.
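
For comparison, here is a minimal DOM sketch. The sample XML string and variable names are made up for illustration. Note that PHP's DOM extension also hands text back as UTF-8, so switching to DOM does not by itself avoid the conversion. What it does offer is that saveXML() serializes the document in its declared encoding.

```php
<?php
// Hypothetical sample; "\xCF\xF0\xE8\xE2\xE5\xF2" is "Привет" as windows-1251 bytes.
$xml = "<?xml version=\"1.0\" encoding=\"windows-1251\" ?>\n"
     . "<root><item>\xCF\xF0\xE8\xE2\xE5\xF2</item></root>";

$dom = new DOMDocument();
if (!$dom->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

// Text read through the DOM API is UTF-8, just as with SimpleXML.
$text = $dom->getElementsByTagName('item')->item(0)->textContent;

// Serializing the whole document keeps the declared windows-1251 encoding.
$serialized = $dom->saveXML();

echo bin2hex($text), "\n";
echo $dom->encoding, "\n";
```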


Try using iconv to convert the encoding.


Using the iconv() function you can convert from one encoding to another; the //TRANSLIT option might help.

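<?php

// Placeholder: a string containing your XML file data.
$xml = 'STRING CONTAINING YOUR XML FILE DATA';

// Example of the opposite direction, UTF-8 to ISO-8859-1:
// $xml = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $xml);

// Convert from the source encoding (replace YOUR_ENCODING, e.g. "windows-1251") to UTF-8.
$xml = iconv("YOUR_ENCODING", "UTF-8//TRANSLIT", $xml);

?>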
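
To make the direction of the conversion concrete for the question's case, here is a minimal sketch. It assumes the page is served as windows-1251. The sample XML and the variable names are made up for illustration. Since SimpleXML returns UTF-8, the text is converted back to windows-1251 only at the point of output.

```php
<?php
// Hypothetical sample; "\xCF\xF0\xE8\xE2\xE5\xF2" is "Привет" as windows-1251 bytes.
$xml = "<?xml version=\"1.0\" encoding=\"windows-1251\" ?>\n"
     . "<root><item>\xCF\xF0\xE8\xE2\xE5\xF2</item></root>";

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('Could not parse the XML');
}

// SimpleXML returns UTF-8 regardless of the declared encoding.
$utf8 = (string) $doc->item;

// Convert to the page encoding just before output.
// "//TRANSLIT" could be appended to the target encoding to approximate
// characters that have no windows-1251 equivalent.
$cp1251 = iconv('UTF-8', 'windows-1251', $utf8);

echo bin2hex($utf8), "\n";
echo bin2hex($cp1251), "\n";
```

Printing the bytes as hex makes it easy to see which encoding each string is in without depending on the terminal's own encoding.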


My advice is to use UTF-8 as the encoding of your .php source files and, if possible, as the output encoding too. With gzip compression the size difference between windows-1251 and UTF-8 responses is minimal, even for mostly Cyrillic text, and UTF-8 is better in many ways. As you said, SimpleXML converts windows-1251 to UTF-8 on XML import, so after that you don't have to worry about encodings at all.

If you have to use windows-1251 for output, use something like:

iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "windows-1251");
ob_start("ob_iconv_handler");
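
On PHP 5.6 and later, iconv_set_encoding() raises a deprecation notice. One possible alternative is to do the same conversion in your own output-buffer callback. This is a sketch under that assumption, not a drop-in replacement; the function name utf8_to_cp1251 is made up for illustration.

```php
<?php
// Hypothetical helper: converts one buffered chunk of UTF-8 output to windows-1251.
function utf8_to_cp1251(string $buffer): string
{
    $converted = iconv('UTF-8', 'windows-1251', $buffer);
    // Fall back to the untouched buffer if the conversion fails.
    return $converted === false ? $buffer : $converted;
}

// Everything echoed from here on passes through the callback when the buffer is flushed.
ob_start('utf8_to_cp1251');

echo "Привет\n"; // this source file is assumed to be saved as UTF-8

ob_end_flush();
```

The callback receives the whole buffer at once here because no chunk size is passed to ob_start(), so a multi-byte character is never split between two calls.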

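One catch with UTF-8 in PHP source files is character classes in regular expressions: /[ю]/ won't work as you might expect, while /(ю)/ will. Without the /u modifier PCRE matches byte by byte, so [ю] becomes a class of the two bytes that encode ю; with the modifier, /[ю]/u works as expected.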
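
The character-class pitfall can be seen in a small sketch. It assumes the file is saved as UTF-8, where ю is the bytes D1 8E and э is D1 8D; the variable names are made up for illustration.

```php
<?php
// Without /u the class [ю] means "byte 0xD1 or byte 0x8E",
// so it wrongly matches the first byte of "э" (0xD1 0x8D).
$falsePositive = preg_match('/[ю]/', 'э');

// With /u the pattern and subject are treated as UTF-8 characters.
$unicodeClass = preg_match('/[ю]/u', 'э');

// A group is a literal byte sequence, so it behaves as expected even without /u.
$groupOther = preg_match('/(ю)/', 'э');
$groupSame  = preg_match('/(ю)/', 'ю');

var_dump($falsePositive, $unicodeClass, $groupOther, $groupSame);
```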
