Problems with characters like ÖÄÅ

My form

<form action="saveProfile.php" method="post" name="ProfileUpdate" id="ProfileUpdate" >
<input name="Smeknamn" id="Smeknamn" type="text" value="<?php echo $v["user_name"]; ?>" maxlength="16" id="ctl00_ctl00_cphContent_cphContent_cphContentLeft_tbUsername" onkeydown="return ((event.keyCode != 16) || (event.keyCode == 16 &amp;&amp; this.value.length >= 1));" style="width: 130px;" />
</form>

When I echo $_POST["Smeknamn"]; in saveProfile.php, I get Ã�Ã�Ã� in place of the characters Ö Ä Å.

Why is this happening? Both saveProfile.php and editProfile.php are saved as UTF-8 without BOM, and the pages declare UTF-8 in a meta tag and so on.

UPDATE

$smeknamn = $data["Smeknamn"];

Sorry, I forgot to mention that I have this foreach. It is $smeknamn that I am echoing, and that is where I get Ã�Ã�Ã�. I just tried $_POST["Smeknamn"] directly and it echoes ÖÄÅ just fine. So the problem is in the foreach(), which turns the ÖÄÅ characters into Ã�Ã�Ã�. How can I fix this?

foreach($_POST as $key => $value) {
    $data[$key] = filter($value);
}
function filter($data) {
    $data = trim(htmlentities(strip_tags($data)));

    if (get_magic_quotes_gpc())
        $data = stripslashes($data);

    $data = mysql_real_escape_string($data);

    return $data;
}


Try encoding editProfile.php and saveProfile.php as UTF-8 with BOM.


This is a character encoding issue.

I guess your data is actually encoded in UTF-8, so the character Ö (U+00D6) is encoded as the two-byte sequence 0xC396. When htmlentities is called without the charset parameter, it implicitly uses ISO 8859-1:

[…] optional third argument charset which defines character set used in conversion. Presently, the ISO-8859-1 character set is used as the default.

And when interpreting the byte sequence 0xC396 with ISO 8859-1 it represents the two ISO 8859-1 characters 0xC3 and 0x96. Since there is the entity Atilde for the ISO 8859-1 character 0xC3, this character is replaced by htmlentities with the reference &Atilde;. But there isn’t any entity representing the second character 0x96, so it’s not being replaced. That means:

htmlentities("\xC3\x96") === "&Atilde;\x96"

Now when this is interpreted by the user agent, the character reference gets displayed correctly but the remaining byte 0x96 is not a valid byte sequence for a character in UTF-8. That’s why the replacement character is displayed instead.
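The misinterpretation described here can be reproduced with a short sketch. On PHP 5.4 and later the default charset of htmlentities is UTF-8 rather than ISO-8859-1, so this example passes "ISO-8859-1" explicitly to imitate the old default; the variable names are only illustrative.

```php
<?php
// "Ö" (U+00D6) encoded as UTF-8 is the two bytes 0xC3 0x96.
$utf8 = "\xC3\x96";

// Passing ISO-8859-1 explicitly imitates the old default charset.
// Each byte is then treated as a separate character: 0xC3 becomes
// &Atilde;, while 0x96 has no entity and is passed through unchanged.
$broken = htmlentities($utf8, ENT_COMPAT, "ISO-8859-1");

echo bin2hex($utf8), "\n"; // c396
echo $broken === "&Atilde;\x96" ? "mangled" : "intact", "\n"; // mangled
```

The stray 0x96 byte left in $broken is what the browser cannot decode as UTF-8.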

So the problem is that you didn’t specify the correct character encoding for htmlentities:

htmlentities("\xC3\x96", ENT_COMPAT, "UTF-8") === "&Ouml;"

But as you’re already using UTF-8 for your output, you don’t need to replace such characters at all. Using htmlspecialchars instead is enough, since it only replaces the HTML special characters.
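As a minimal sketch of that suggestion, assuming the page is served as UTF-8: htmlspecialchars leaves Ö untouched and escapes only the characters that are special in HTML. The sample string and variable names are made up for illustration.

```php
<?php
// "Ö<b>" as UTF-8 bytes; the <b> stands in for markup typed by a user.
$name = "\xC3\x96<b>";

// Only &, <, >, " and ' are replaced; the UTF-8 bytes of Ö are kept.
$safe = htmlspecialchars($name, ENT_QUOTES, "UTF-8");

echo $safe, "\n"; // Ö&lt;b&gt;
```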

Besides that, you shouldn’t use a one-size-fits-all filter function like this, because every language and context (HTML, SQL, and so on) has its own special characters that need to be taken care of.
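One way to read this advice is to keep the raw value and escape it separately for each context. The sketch below is an assumption about how that could look, not the asker's code: the helper names cleanInput, escapeHtml and saveUserName are hypothetical, as are the table users and the column id, and it swaps in PDO prepared statements in place of mysql_real_escape_string.

```php
<?php
// Hypothetical helper: tidy raw input without escaping anything.
function cleanInput(string $value): string
{
    return trim($value);
}

// Escape for HTML only at the point where the value is printed.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, "UTF-8");
}

// For SQL, bind the raw value through a prepared statement.
// The table `users` and column `id` are assumptions for illustration.
function saveUserName(PDO $pdo, int $userId, string $userName): void
{
    $stmt = $pdo->prepare("UPDATE users SET user_name = :name WHERE id = :id");
    $stmt->execute([":name" => $userName, ":id" => $userId]);
}

$raw  = "  \xC3\x96\xC3\x84\xC3\x85 <b>  "; // "  ÖÄÅ <b>  " as UTF-8
$name = cleanInput($raw);

echo escapeHtml($name), "\n"; // ÖÄÅ &lt;b&gt;
```

With this split, the value stored in the database stays as plain UTF-8 text, and entities appear only in the generated HTML.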
