
PHP htmlspecialchars() function error when trying to use UTF-8 string

I did the following things:

  1. I have a spreadsheet with data. One of the rows has a ü character in it.
  2. I save this as a CSV file in OpenOffice.org. When it asks me for a character encoding, I choose UTF-8.
  3. I use Navicat to create a MySQL database table, InnoDB with UTF-8 utf8_general encoding, and import the CSV.
  4. I try to use PHP function htmlspecialchars($string, ENT_COMPAT, 'UTF-8') where $string is the string containing the special ü character.

It gives me an error: Invalid multibyte sequence in argument. When I change 'UTF-8' to 'ISO8859-1', no error is thrown, but the wrong character is shown (the 'unknown character' glyph, which looks like <?>).
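One way to narrow this down is to look at the raw bytes PHP actually receives. The snippet below is only a sketch: describe_bytes() is a hypothetical helper name, and it assumes PHP 7 or later with the mbstring extension loaded. A ü stored as UTF-8 is the two bytes C3 BC, while a ü stored as ISO-8859-1 is the single byte FC, which is not a valid UTF-8 sequence.

```php
<?php
// Hypothetical helper: show a string's bytes as hex and whether they form valid UTF-8.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return bin2hex($s) . ' (' . $valid . ')';
}

echo describe_bytes("\xC3\xBC"), "\n"; // ü encoded as UTF-8      -> c3bc (valid UTF-8)
echo describe_bytes("\xFC"), "\n";     // ü encoded as ISO-8859-1 -> fc (NOT valid UTF-8)
```

Running the same helper on the value fetched from the database should show which of the two forms reaches htmlspecialchars().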

If I use an HTML form to update the string in the database, the error disappears and the character is displayed correctly. However, when I then look at the record in Navicat, it looks like two characters:

[1/4][A with something on top of it]

Some multibyte sequence that isn't seen as one character.

What is going on, where are things going wrong, and what can I do about it?


Although I don't understand where the "invalid multibyte" error comes from, I'm pretty sure htmlspecialchars() is not your culprit:

For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.
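To illustrate the quoted passage, here is a small sketch with made-up sample strings. Only the ASCII characters &, <, > and " are rewritten, and a valid multibyte ü passes through unchanged. The second call assumes a current PHP version, where input that is not valid in the given charset yields an empty string unless ENT_IGNORE or ENT_SUBSTITUTE is set; older versions may report this differently.

```php
<?php
// Sample string containing ü as the UTF-8 bytes C3 BC.
$valid = "<b>f\xC3\xBCr & co</b>";
$escaped = htmlspecialchars($valid, ENT_COMPAT, 'UTF-8');
echo $escaped, "\n"; // &lt;b&gt;für &amp; co&lt;/b&gt;

// Same word with ü as the single ISO-8859-1 byte FC: not valid UTF-8.
$invalid = "f\xFCr";
$rejected = htmlspecialchars($invalid, ENT_COMPAT, 'UTF-8');
var_dump($rejected); // string(0) ""
```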

In my understanding, htmlspecialchars() should work fine for a UTF-8 string without specifying a character set. My bet would be that either the HTML page containing the form, or the database connection you use is not UTF-8 encoded. For the latter, try sending a

SET NAMES utf8;

to MySQL before doing the insert.
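From PHP, the connection charset could be set along these lines. This is a sketch only: it assumes the mysqli extension, connect_utf8() is a hypothetical helper, and the host, user, password and database name are placeholders. mysqli::set_charset() has the same effect on the server side as SET NAMES and also tells the client library which charset is in use.

```php
<?php
// Hypothetical helper: open a mysqli connection that talks UTF-8 to the server.
// All four arguments are placeholders for your own connection settings.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_error) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    // Equivalent in effect to sending "SET NAMES utf8" right after connecting.
    if (!$conn->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $conn->error);
    }
    return $conn;
}
```

The page containing the form should declare the same encoding, for example by sending header('Content-Type: text/html; charset=utf-8') before any output.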
