Translate URLENCODED data into UTF-8 in PHP

I have a string in my database like 中华武魂. When I post a request to retrieve the data via my website, the data reaches the server in the format %E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82

What decoding steps do I have to take in order to get it back to a usable form, while also cleaning the user input to ensure they're not going to try an SQL injection attack? (Escape the string before or after encoding?)

EDIT:

 rawurldecode();  // returns "中åŽæ­¦é­‚"
 urldecode();     // returns "中åŽæ­¦é­‚"


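public function utf8_urldecode($str) {
    // Percent-decode first, then turn non-standard %uXXXX escapes into numeric HTML entities.
    $str = preg_replace("/%u([0-9a-f]{3,4})/i", "&#x\\1;", urldecode($str));
    // Pass an explicit flag: null for $flags is deprecated as of PHP 8.1 (null was treated as 0, i.e. ENT_NOQUOTES).
    return html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
}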
 // returns "中åŽæ­¦é­‚"

... which actually works when I try and use it in an SQL statement.

I think the problem was that I was doing an echo and die(); without sending a header specifying UTF-8, so I guess the output was being read as Latin-1.

Thanks for the help!


When your data is actually in that percent-encoded form, you just have to call rawurldecode:

$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);

This suffices because the data is already encoded in UTF-8: 中 (U+4E2D) is encoded as the byte sequence 0xE4 0xB8 0xAD in UTF-8, and those bytes become %E4%B8%AD under percent-encoding.
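A minimal sketch to check that claim, using the sample string from the question. The variable names are illustrative only; the point is that rawurldecode yields the raw UTF-8 bytes and performs no character set conversion.

```php
<?php
// Percent-encoded sample from the question.
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';

// rawurldecode turns each %XX into the byte 0xXX and does nothing else.
$str = rawurldecode($data);

// Inspect the raw bytes: four characters, three bytes each.
$hex = bin2hex($str);
$firstChar = substr($str, 0, 3); // the three bytes of 中 (U+4E2D)

// With the /u modifier, preg_match fails on invalid UTF-8,
// so an empty pattern doubles as a validity check.
$isUtf8 = preg_match('//u', $str) === 1;

echo $hex, "\n";   // e4b8ade58d8ee6ada6e9ad82
var_dump($isUtf8); // bool(true)
```

If the hex dump matches the percent-encoded input byte for byte, the decoding step is fine and any garbled display comes from a later stage.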

If your output does not look as expected, that is probably because it is being interpreted with the wrong character encoding, most likely Windows-1252 instead of UTF-8. In Windows-1252, 0xE4 represents ä, 0xB8 represents ¸, 0xAD is a soft hyphen (normally invisible), 0xE5 represents å, and so on. So make sure to specify the output character encoding properly.
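One common way to declare the output encoding is to send a Content-Type header with a charset before any output. The following is only a sketch, assuming the response is served as HTML and that nothing has been echoed yet; the helper name send_utf8_header is made up for illustration.

```php
<?php
// Hypothetical helper: declares UTF-8 output if headers can still be sent.
function send_utf8_header(): bool
{
    if (headers_sent()) {
        return false; // too late: output has already started
    }
    header('Content-Type: text/html; charset=utf-8');
    return true;
}

// Must run before the first echo.
send_utf8_header();

$str = rawurldecode('%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82');

// The bytes are the same as before; only the declared interpretation changes.
echo $str;
```

The decoded bytes are identical with or without the header. The header only tells the client to read them as UTF-8 rather than guessing a legacy encoding.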


Use PHP's urldecode: http://php.net/manual/en/function.urldecode.php

You have choices here: urldecode or rawurldecode.

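If the string was encoded with urlencode, decode it with urldecode, because the two function pairs handle spaces differently: urlencode converts spaces to +, whereas rawurlencode converts them to %20. Accordingly, urldecode turns + back into a space, while rawurldecode leaves + as it is.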
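The difference can be seen with a short sketch. The sample text 'a b+c' is an arbitrary choice, picked because it contains both a space and a literal plus sign.

```php
<?php
$text = 'a b+c';

$form = urlencode($text);    // space becomes "+", literal "+" becomes "%2B"
$raw  = rawurlencode($text); // space becomes "%20", literal "+" becomes "%2B"

// Each decoder inverts its own encoder.
$formBack = urldecode($form);
$rawBack  = rawurldecode($raw);

// Mixing the pairs goes wrong: rawurldecode leaves "+" untouched.
$mixed = rawurldecode($form);

var_dump($form, $raw, $formBack, $rawBack, $mixed);
```

For the string in the question, which contains neither spaces nor plus signs, both decoders give the same result.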
