PHP urldecode UTF-8 encoding question
When I request a URL with a URL-encoded value in the query string (a Cyrillic word):
http://example.com/?action=search&q=%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC
After decoding:
echo urldecode($_GET['q']); // it prints: ���������
So I need to convert it to UTF-8 (because my whole application works with UTF-8) via:
mb_convert_encoding($_GET['q'], "UTF-8", "windows-1251");
That helps, but my question is:
Who or what says it should be EXACTLY "windows-1251"? Where does it come from? If I use other languages, how can I determine the appropriate encoding? Where is the magic?
(update): The page encoding is UTF-8.
(update): Actually, urldecode($_GET['q']) is not even needed. It looks like the Apache + PHP module does the decoding already, but I still can't understand where this is configured.
The answer is that you can't know for sure, because it might change from request to request, especially if the value is not always submitted from a form but is sometimes sent with Ajax or typed directly into the address bar by the user.
I work with an application that is in Polish. The application uses the ISO-8859-2 code page, and all HTML output is served in this encoding.
The application receives requests in different encodings, depending on the context of the request:
- If the request is the result of a form submission, the encoding is the same as that of the HTML page containing the form. I think this could be changed with the accept-charset attribute of the form element, but I have not tried it.
- If the request is made with Ajax, it is always UTF-8 (at least in Chrome and Firefox, as our client uses only those browsers).
- If the request is entered manually into the address bar, it is usually UTF-8, but if it comes from a bookmark or something similar, it might be in another encoding (depending on how the bookmark was created).
So there is really no way to know for sure. If you can, always use UTF-8. Otherwise use charset detection: check whether the value is valid UTF-8 and, if not, fall back to the most probable encoding for the language your application uses.
I use the following code:
<?php
$t = 'zażółć gęślą jaźń';
echo mb_detect_encoding($t, 'UTF-8,ISO-8859-2');
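The "check for UTF-8, otherwise fall back" approach described above could be sketched roughly as follows. The helper name to_utf8 and its default fallback are assumptions made for illustration and are not part of the original code. Note that this is only a heuristic: some byte sequences in a legacy encoding also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): return $value as UTF-8.
// $fallback is the assumed legacy encoding of the application.
function to_utf8(string $value, string $fallback = 'ISO-8859-2'): string
{
    // Input that is already valid UTF-8 is passed through unchanged.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise assume the bytes are in the fallback encoding and convert them.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "zażółć" written as raw ISO-8859-2 bytes, which is not valid UTF-8.
$legacy = "za\xBF\xF3\xB3\xE6";
echo to_utf8($legacy), "\n";
```

In a request handler this would be called as to_utf8($_GET['q']) before the value is used anywhere else.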
Best regards, SWilk
This is not an Apache or mod_php issue. PHP decodes the URL encoding automatically, but it does not convert the character encoding, so there is nothing to worry about on that side.
Judging from this:
when typing example.com/?action=search&q=автомобиль in Firefox 3, it is automatically converted to: example.com/?action=search&q=%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC
it is more likely a browser or operating system issue.
It seems that your OS encoding is single-byte, and the browser URL-encodes that single-byte string.
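To see where those bytes come from, here is a small sketch that percent-encodes the same word in both encodings. It assumes the source file is saved as UTF-8 and that the mbstring extension is available. The variable names are illustrative only.

```php
<?php
// Assumes this source file itself is saved as UTF-8.
$word = 'автомобиль';

// Percent-encode the UTF-8 bytes (two bytes per Cyrillic letter).
$asUtf8 = urlencode($word);

// Convert to windows-1251 first (one byte per letter), then percent-encode.
$asCp1251 = urlencode(mb_convert_encoding($word, 'Windows-1251', 'UTF-8'));

echo $asUtf8, "\n";   // %D0%B0%D0%B2%D1%82%D0%BE%D0%BC%D0%BE%D0%B1%D0%B8%D0%BB%D1%8C
echo $asCp1251, "\n"; // %E0%E2%F2%EE%EC%EE%E1%E8%EB%FC
```

The second string matches the q value in the question, which suggests the browser encoded the word as windows-1251 before percent-encoding it.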
You should keep UTF-8 and set your page's charset to UTF-8 using the appropriate Content-Type header:
header('Content-type: text/html; charset=utf-8');
When you type non-ASCII characters directly into the address bar, the browser seems to convert them to UTF-8 automatically and then percent-encode them. I have no hard data on this, but the behaviour makes sense. Related question here: Unicode characters in URLs
Your page is using windows-1252 or some other single-byte character set as its output encoding, which is why you need to convert the character data first.
You could change your page's output encoding to UTF-8 to save yourself that step, but that may have other consequences (like the need to use multi-byte string functions and/or a different encoding for database output, etc.)
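As a brief illustration of why multi-byte string functions matter once the data is UTF-8, the sketch below compares byte-oriented and character-oriented functions. It assumes the source file is saved as UTF-8 and that mbstring is available.

```php
<?php
// Assumes this source file itself is saved as UTF-8.
$word = 'автомобиль';

$bytes = strlen($word);             // counts bytes: 20
$chars = mb_strlen($word, 'UTF-8'); // counts characters: 10

echo $bytes, "\n";
echo $chars, "\n";

// substr() works on bytes and can cut a character in half;
// mb_substr() works on whole characters.
$firstThree = mb_substr($word, 0, 3, 'UTF-8'); // "авт"
echo $firstThree, "\n";
```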
windows-1251 is an 8-bit character encoding designed to cover languages that use Cyrillic alphabets (Wikipedia).
You might have set the charset to windows-1251 in your web page.
I also ran into this problem. I use Adobe Dreamweaver CS4 (non-English version).
I solved it as follows:
Add
header('Content-type: text/html; charset=utf-8');
at the top of the PHP page file.
IMPORTANT: In Adobe Dreamweaver, you should also modify Page Properties from the top menu, Modify (M) -> Page Properties (P): choose Title/Encoding and manually change the encoding from Unicode to Unicode (UTF-8).
(Sorry, these menu names are translated into English, so they may not be the exact wording.)