
Problems with UTF-8 encoding in PHP

The characters I am getting from the URL, for example www.mydomain.com/?name=john, were fine as long as they were not in Russian.

If they were in Russian, I was getting '����'.

So I added $name = iconv("cp1251", "utf-8", $name); and now it works fine for Russian and English characters, but screws up other languages. :)))

For example, 'Jānis' (Latvian), which worked fine before iconv, now turns into 'JДЃnis'.

Any idea if there's some universal encoder that would work with the Cyrillic languages and not screw up other languages?


Why don't you just use UTF-8 with all files and processes?


Actually, this comes down to the problem of how the URL is encoded. If you click a link on a given page, the browser will use that page's encoding to send the request, but if you enter the URL directly into the address bar of your browser, the behavior is somewhat undefined, as there is no standardized encoding to use (Firefox provides an about:config switch to use UTF-8 encoded URLs).

Short of using some encoding detection, there is no way to know which encoding was used for the URL in a given request.
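As a rough illustration of what such detection could look like, here is a minimal sketch. The helper name to_utf8 and the Windows-1251 fallback are assumptions made for this example (the fallback is chosen only because the question deals with Russian input), and it relies on the mbstring extension. It is a heuristic, not a guarantee: a short legacy-encoded string can occasionally happen to be valid UTF-8 as well.

```php
<?php
// Hypothetical helper: normalize a request parameter to UTF-8.
// $fallback is an assumption; pick the legacy encoding your visitors most likely use.
function to_utf8(string $value, string $fallback = 'Windows-1251'): string
{
    // Well-formed UTF-8 (which includes plain ASCII) is returned untouched,
    // so names like 'Jānis' are no longer mangled.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Anything else is assumed to be in the fallback encoding and converted.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// Usage: guard against ?name[]=... which would arrive as an array.
$raw  = $_GET['name'] ?? '';
$name = is_string($raw) ? to_utf8($raw) : '';
```

Compared with calling iconv("cp1251", "utf-8", ...) unconditionally, this only converts input that is not already valid UTF-8.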

EDIT:

Just to back up what I said above, I wrote a small test script that shows the default behavior of the five major browsers (running on Mac OS X in my case, with IE on Windows Vista via Parallels):

$p = $_GET['p'];
for ($i = 0; $i < strlen($p); $i++) {
    // this displays the binary data received via the URL in hex format
    echo dechex(ord($p[$i])) . ' ';
}

Calling http://path/to/script.php?p=äöü leads to

  • Safari (4.0.5): c3 a4 c3 b6 c3 bc
  • Firefox (3.6.3): c3 a4 c3 b6 c3 bc
  • Google Chrome (5.0.375.38): c3 a4 c3 b6 c3 bc
  • Opera (10.10): e4 f6 fc
  • Internet Explorer (8.0.6001.18904): e4 f6 fc

So obviously the first three use UTF-8 encoded URLs, while Opera and IE use ISO-8859-1 or one of its variants. Conclusion: you cannot be sure which encoding was used for textual data sent via a URL.
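To double-check that reading of the hex dumps, the two byte sequences can be fed back into PHP. This is only a sketch using the bytes listed above; the variable names are made up for the example, and it assumes the mbstring extension is available.

```php
<?php
// Byte sequences reported above for ?p=äöü
$utf8Bytes   = hex2bin('c3a4c3b6c3bc'); // Safari, Firefox, Chrome
$latin1Bytes = hex2bin('e4f6fc');       // Opera, Internet Explorer

var_dump(mb_check_encoding($utf8Bytes, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1Bytes, 'UTF-8')); // bool(false)

// Reinterpreting the second sequence as ISO-8859-1 yields the same text.
$recovered = mb_convert_encoding($latin1Bytes, 'UTF-8', 'ISO-8859-1');
var_dump($recovered === $utf8Bytes);                // bool(true)
```

The same three characters arrive as six bytes from one group of browsers and three bytes from the other, which is why the script receiving them has to decide how to interpret them.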


Seems like the issue is the file encoding. You should always use UTF-8 without a BOM as the preferred encoding for your .php files; code editors such as Intype let you easily specify this (UTF-8 Plain).

Also, add the following code to your files before any output:

header('Content-Type: text/html; charset=utf-8');

You should also read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky.
