PHP UTF-encoded URL-string
When I type a URL like http://www.example.com/?query=Траливали into the Firefox address bar, it is automatically encoded to http://www.example.com/?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8.
But a URL like http://www.example.com/#ajax_call?query=Траливали is not converted.
Other browsers, such as IE8, do not convert the query at all.
The question is: how can I detect (in PHP) whether the query is encoded, and how do I decode it?
I've tried:
$str = iconv('cp1251', 'utf-8', urldecode($str) );
$str = utf8_decode(urldecode($str));
$str = (urldecode($str));
and many functions from http://php.net/manual/en/function.urldecode.php. Nothing works.
Test:
$str = $_GET['str'];
d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8'));
d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == $str);
d('Траливали' == $str);
d(urldecode($str));
d(utf8_decode(urldecode($str)));
!!! d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urlencode($str)); !!!
Returns:
[false] [false] [false] ��������� ???? [true]
A workaround of sorts: http://www.example.com/Траливали/, i.e. send the query as part of the URL path and parse it with mod_rewrite.
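A minimal sketch of the PHP side of this workaround, assuming a mod_rewrite rule (not shown here) routes every request to one front-controller script and Apache passes the original, still percent-encoded path in $_SERVER['REQUEST_URI']. The helper name queryFromPath() is hypothetical.

```php
<?php
// Hypothetical helper: extract the search term from a path such as /Траливали/
// after mod_rewrite has routed the request to this script.
function queryFromPath(string $requestUri): string
{
    // Keep only the path component; any query string or fragment is dropped.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // Path segments are not decoded into $_GET, so decode them here.
    // rawurldecode() is used because '+' has no special meaning in a path.
    return rawurldecode(trim($path, '/'));
}

// Usage (assumes REQUEST_URI holds the original encoded path, e.g. /%D0%A2.../).
$term = queryFromPath($_SERVER['REQUEST_URI'] ?? '/');
```

Note that the decoded bytes are whatever encoding the browser used for the path, so the encoding questions discussed below still apply.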
It is not converted because putting the query part of the URL after the fragment is not valid.
RFC 3986 defines a URI as composed of the following parts:
   foo://example.com:8042/over/there?name=ferret#nose
   \_/   \______________/\_________/ \_________/ \__/
    |           |             |           |       |
  scheme    authority       path        query  fragment
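PHP's parse_url() can be used to see the same split on the RFC example. It is a lenient parser rather than a full RFC 3986 validator, and it reports the authority as separate host and port entries.

```php
<?php
// Split the RFC 3986 example URI into its components.
$parts = parse_url('foo://example.com:8042/over/there?name=ferret#nose');

// Print the components in their fixed order: scheme, authority (host + port), path, query, fragment.
foreach (['scheme', 'host', 'port', 'path', 'query', 'fragment'] as $key) {
    echo $key, ': ', $parts[$key], PHP_EOL;
}
```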
The order cannot be changed. Therefore,
URL1: http://www.example.com/?query=Траливали#ajax_call
will be handled properly, while
URL2: http://www.example.com/#ajax_call?query=Траливали
will not. Looking at URL2, IE actually handles it correctly: it treats the whole of #ajax_call?query=Траливали as the fragment, so the URL has no query at all. The fragment always comes last and is never sent to the server.
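The difference between the two URLs can be checked with parse_url() and parse_str(). This sketch assumes the file is saved as UTF-8, so the Cyrillic literals are UTF-8 bytes.

```php
<?php
$url1 = 'http://www.example.com/?query=Траливали#ajax_call';
$url2 = 'http://www.example.com/#ajax_call?query=Траливали';

$p1 = parse_url($url1);
$p2 = parse_url($url2);

// URL1: "query=..." is a real query component, so a browser would send it to the server.
parse_str($p1['query'] ?? '', $params1);
echo 'URL1 query term: ', $params1['query'] ?? '(none)', PHP_EOL;
echo 'URL1 fragment:   ', $p1['fragment'] ?? '(none)', PHP_EOL;

// URL2: everything after '#' is the fragment, so there is no query component,
// and the fragment itself is never sent to the server.
echo 'URL2 has query:  ', isset($p2['query']) ? 'yes' : 'no', PHP_EOL;
echo 'URL2 fragment:   ', $p2['fragment'] ?? '(none)', PHP_EOL;
```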
IE will properly encode the query component of URL1, since it detects it as a query.
As for decoding in PHP, %D2 and similar escape sequences are decoded automatically into the $_GET['query'] variable. $_GET was not populated as expected because, according to the standard, URL2 contains no query.
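To see this without a web server, parse_str() can be given the raw query string; it parses a string "as if it were the query string passed via a URL", applying the same decoding PHP uses for $_GET. The example assumes the browser sent the term as UTF-8 and that this file is saved as UTF-8.

```php
<?php
// Raw query string as a browser would send it for the UTF-8 term "Траливали".
$rawQuery = 'query=' . rawurlencode('Траливали');

// parse_str() decodes %hh sequences (and '+') itself, so no extra urldecode() call is needed.
parse_str($rawQuery, $params);
echo $rawQuery, PHP_EOL, $params['query'], PHP_EOL;
```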
One last thing: the comparison 'Траливали' == $_GET['query'] will only be true if your PHP script itself is saved in the same encoding as the bytes the browser sent, e.g. UTF-8 when the query arrives as UTF-8. Your text editor should be able to tell you the encoding of your file.
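A short illustration of why such a comparison can fail: the same word has different bytes in UTF-8 and in windows-1251, and comparing two non-numeric strings with == compares their bytes. This assumes the file is saved as UTF-8 and uses the windows-1251 bytes from the question.

```php
<?php
// Stored as UTF-8 because (by assumption) this file is saved as UTF-8.
$literal = 'Траливали';

// The bytes from the question: the windows-1251 encoding of the same word.
$received = urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8');

// Byte-wise comparison: the two strings differ even though they spell the same word.
var_dump($literal == $received);

// UTF-8 needs 2 bytes per Cyrillic letter here, windows-1251 needs 1.
echo strlen($literal), ' vs ', strlen($received), PHP_EOL;
```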
rawurldecode($_GET['query']);
but PHP should actually have done this already ;-)
Edit: you say "nothing works", but what exactly are you trying? If, for example, echo $_GET['query']; does not show the text as you expect, the problem might be the encoding you declare for the page sent back to the browser.
Include the line
header("Content-Type: text/html; charset=utf-8");
and see if it helps.
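A minimal sketch of sending the value back with a matching charset, assuming the value is meant to be UTF-8. The helper renderQuery() is hypothetical. With ENT_SUBSTITUTE, htmlspecialchars() replaces invalid UTF-8 (such as windows-1251 bytes) with U+FFFD instead of returning an empty string, which makes an encoding mismatch visible.

```php
<?php
// Hypothetical helper: escape the term for HTML output, treating it as UTF-8.
function renderQuery(array $get): string
{
    // Ignore non-string values such as query[]=... arrays.
    $query = is_string($get['query'] ?? null) ? $get['query'] : '';
    return htmlspecialchars($query, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The charset must be declared before any output is sent.
header('Content-Type: text/html; charset=utf-8');
echo renderQuery($_GET);
```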
How the fragment is encoded is, unfortunately, browser-dependent:
Is fragment ID (hash) encoded by applying RFC-mandated URL escaping rules?
MSIE: NO
Firefox: PARTLY
Safari: YES
Opera: NO
Chrome: NO
Android: YES
As to the question of what encoding the browser uses to encode international (read: non-ASCII) characters before converting them to %nn
escape sequences, "most browsers deal with this by sending UTF-8 data by default on any text entered in the URL bar by hand, and using page encoding on all followed links." (same source).
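To make the two cases concrete, here is the same word percent-encoded from UTF-8 bytes and from windows-1251 bytes. This assumes the file is saved as UTF-8 and that the iconv extension is available (it is enabled in most PHP builds); the second result is exactly the string from the question.

```php
<?php
$word = 'Траливали'; // UTF-8, assuming this file is saved as UTF-8

// UTF-8 bytes: the usual result for text typed into the address bar.
$asUtf8 = rawurlencode($word);

// windows-1251 bytes: the usual result for a link followed from a windows-1251 page.
$asCp1251 = rawurlencode(iconv('UTF-8', 'CP1251', $word));

echo $asUtf8, PHP_EOL, $asCp1251, PHP_EOL;
```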
You could use UTF8::autoconvert_request() for this.
Take a look at http://code.google.com/p/php5-utf8/ for more information.
URLs are limited to a subset of ASCII characters. Characters outside that set are supposed to be URL-encoded (the %hh encoding you see). Some browsers automatically encode URLs typed into the address bar.
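In PHP, rawurlencode() performs this %hh encoding following RFC 3986: letters, digits and -_.~ are left alone and everything else, including each byte of a multi-byte character, becomes %hh. urlencode() is the form-encoding variant, which differs notably in turning spaces into '+'.

```php
<?php
// Unreserved characters pass through unchanged.
echo rawurlencode('a-b_c.d~e'), PHP_EOL;

// Space, '/' and both UTF-8 bytes of "ä" (assuming a UTF-8 source file) are encoded.
echo rawurlencode('a b/ä'), PHP_EOL;

// Form encoding: a space becomes '+' instead of %20.
echo urlencode('a b'), PHP_EOL;
```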
The answer is easy: the string is always encoded, as the HTTP standard requires.
What Firefox displays doesn't matter.
And since PHP decodes the query string automatically, no decoding is required either.
Note that '%D2%F0%E0%EB%E8%E2%E0%EB%E8' is a single-byte encoding, so your page is probably in windows-1251, or at least the HTTP header tells the browser so.
AJAX, however, always uses UTF-8.
So you just have to either use a single encoding (UTF-8) for all your pages, or distinguish AJAX calls from regular ones.
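If switching every page to UTF-8 is not possible, one way to handle both kinds of request on the server is sketched below, under the assumption that incoming text is either UTF-8 or windows-1251 and nothing else. Valid UTF-8 is detected with preg_match() and the u modifier (which fails on malformed UTF-8); anything else is converted from windows-1251 with iconv(). The heuristic can misjudge, because some windows-1251 byte sequences happen to be valid UTF-8. toUtf8() is a hypothetical name.

```php
<?php
// Hypothetical helper: normalise a request value to UTF-8, assuming it arrived
// either as UTF-8 (AJAX) or as windows-1251 (regular requests from a 1251 page).
function toUtf8(string $value): string
{
    // With the /u modifier, preg_match() returns false for malformed UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    // Not valid UTF-8: assume windows-1251 (requires the iconv extension).
    $converted = iconv('CP1251', 'UTF-8', $value);
    return $converted === false ? '' : $converted;
}

$query = toUtf8(is_string($_GET['query'] ?? null) ? $_GET['query'] : '');
```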
As for the fragment, do not use it to send a value to the server. Keep the value in a JS variable and use it twice: to set the fragment, and to send it to the server as JSON.
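A sketch of the receiving end, assuming the page's JavaScript POSTs a body such as {"query":"Траливали"}. The helper queryFromJson() and the "query" key are hypothetical. json_decode() expects UTF-8 input and decodes \uXXXX escapes, so the result is UTF-8 regardless of the page's own encoding.

```php
<?php
// Hypothetical helper: read the search term from a JSON request body.
function queryFromJson(string $body): ?string
{
    // json_decode() returns null for invalid JSON.
    $data = json_decode($body, true);
    if (!is_array($data) || !isset($data['query']) || !is_string($data['query'])) {
        return null;
    }
    return $data['query'];
}

// In a real endpoint, the raw request body is read from php://input.
$term = queryFromJson((string) file_get_contents('php://input'));
```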
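RFC 1738 states that only alphanumerics, the special characters $-_.+!*'(), and the reserved characters ;/?:@=& (when used for their reserved purposes) may appear unencoded within a URL. Everything else is encoded by the HTTP client, i.e. the web browser. You can call rawurldecode() whether or not PHP has already decoded the query string, but double-decoding is not entirely harmless: a value that legitimately contains a sequence such as %41 will be turned into A by the second decode.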
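A small experiment showing what a second decode does to a value that legitimately contains a percent sign, e.g. a user searching for the literal text %41. Here rawurldecode() stands in for PHP's own decoding of $_GET; for this input both give the same result.

```php
<?php
// The browser encodes '%' as '%25', so the literal text "%41" travels as "%2541".
$onTheWire = rawurlencode('%41');

// First decode: what PHP already puts into $_GET.
$afterPhp = rawurldecode($onTheWire);

// Second decode: "%41" is now read as an escape sequence, changing the value.
$afterSecondDecode = rawurldecode($afterPhp);

echo $onTheWire, ' -> ', $afterPhp, ' -> ', $afterSecondDecode, PHP_EOL;
```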