Weird UTF-8 conversion problem in PHP

So I'm working on a project that takes data from a file. Some lines in the file require UTF-8 symbols but are encoded oddly: they contain \xC6, for example, rather than Æ.

If I do as follows:

$name = "\xC6ther";
$name = preg_replace('/x([a-fA-F0-9]{2})/', '&#$1;', $name);
echo utf8_encode($name);

It works fine. I get this:

Æther

But if I pull the same data from MySQL, and do as follows:

$name = $row['OracleName'];
$name = preg_replace('/x([a-fA-F0-9]{2})/', '\&#$1;', $name);
$name = utf8_encode($name);

Then I receive this as output:

\&#C6;ther

Anyone know why this is?

As requested, the var_dump of $row['OracleName']:

string(15) "xC6ther Barrier" 


In your second preg_replace, why is there a \ in the replacement string? The first one has none:

preg_replace('/x([a-fA-F0-9]{2})/', '&#$1;', $name);

OK, I think there is some confusion here. Your regular expression matches something like x66 and replaces it with '&#66;', which looks like an HTML entity to me, but you then call utf8_encode, which does this (from the manual):

utf8_encode — Encodes an ISO-8859-1 string to UTF-8

So nothing would ever get converted (or, to be more precise, '&#66;' would remain '&#66;', since those are all the same characters in ISO-8859-1 and UTF-8).

Also note that in your first snippet you use \xC6, but the preg_replace will never catch it because it is already an encoded character. In a double-quoted string, \x means that the byte given by the following hex number (0x00 to 0xFF) is dropped into the string as is. It does not produce the literal string xC6.
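To make the point about the \x escape concrete, here is a minimal sketch comparing the double-quoted literal from the first snippet with the literal text that the var_dump shows coming out of MySQL. The variable names $escaped and $literal are only illustrative.

```php
<?php
// Double-quoted: PHP turns \xC6 into the single byte 0xC6 when parsing the literal.
$escaped = "\xC6ther";
// The database value holds the three literal characters x, C, 6 instead.
$literal = 'xC6ther';

$pattern = '/x([a-fA-F0-9]{2})/';

var_dump(strlen($escaped));               // int(5): one byte plus "ther"
var_dump(bin2hex($escaped[0]));           // string(2) "c6"
var_dump(preg_match($pattern, $escaped)); // int(0): there is no literal "x" to match

var_dump(strlen($literal));               // int(7)
var_dump(preg_match($pattern, $literal)); // int(1): "xC6" matches
```

This is why the first snippet appeared to work: the preg_replace did nothing, and utf8_encode alone turned the ISO-8859-1 byte 0xC6 into UTF-8.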

So I'm a bit confused about what you really want to do. What is the preg_replace for?

If you want to convert HTML entities to UTF-8, look into mb_convert_encoding (manual); if you want to do the reverse and encode UTF-8 text as HTML entities, look into htmlentities (manual).

And if it has nothing to do with any of that and you simply want to change the encoding, mb_convert_encoding is still the function to use.
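The three directions mentioned above can be sketched roughly as follows. Note one substitution: for the entity-to-UTF-8 direction this sketch uses html_entity_decode rather than mb_convert_encoding, because, as far as I know, the 'HTML-ENTITIES' pseudo-encoding of mbstring is deprecated in recent PHP versions. The sample strings are made up for illustration, and the last call assumes the mbstring extension is loaded.

```php
<?php
// HTML entity -> UTF-8 (html_entity_decode swapped in for mb_convert_encoding).
$utf8 = html_entity_decode('&#xC6;ther', ENT_QUOTES | ENT_HTML401, 'UTF-8');

// UTF-8 -> HTML entity (the reverse direction).
$entity = htmlentities($utf8, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Plain re-encoding: the ISO-8859-1 byte 0xC6 becomes the UTF-8 bytes 0xC3 0x86.
$converted = mb_convert_encoding("\xC6ther", 'UTF-8', 'ISO-8859-1');

var_dump(bin2hex($utf8));       // string(12) "c38674686572"
var_dump($entity);              // string(11) "&AElig;ther"
var_dump($converted === $utf8); // bool(true)
```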


Figured out the problem: in the SQL version I was missing an 'x' in the preg_replace replacement string.

preg_replace('/x([a-fA-F0-9]{2})/', '&#x$1;', $name);

Once I added in the x, it worked like a charm.
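For anyone landing here later, a minimal sketch of the working version, using the value from the var_dump in the question. The final html_entity_decode step is my own addition, in case real UTF-8 bytes are wanted instead of an entity that only a browser will render.

```php
<?php
// Value as shown by the var_dump in the question.
$name = 'xC6ther Barrier';

// Turn each "x" followed by two hex digits into a hexadecimal numeric entity.
// The "x" after "&#" is what marks the number as hex; "&#C6;" is not a valid entity.
$entity = preg_replace('/x([a-fA-F0-9]{2})/', '&#x$1;', $name);
echo $entity, "\n"; // &#xC6;ther Barrier

// Optional: decode the entity into UTF-8 bytes.
$utf8 = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $utf8, "\n";   // Æther Barrier
```

One caveat with this pattern: it rewrites any "x" followed by two hex digits, so ordinary words such as "boxed" (which contains "xed") would be mangled too. It is only safe if the data never contains that sequence by accident.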
