
Accents in uploaded file being replaced with '?'

I am building a data import tool for the admin section of a website I am working on. The data is in both French and English, and contains many accented characters. Whenever I attempt to upload a file, parse the data, and store it in my MySQL database, the accents are replaced with '?'.

I have text files containing data (charset is iso-8859-1) which I upload to my server using CodeIgniter's file upload library. I then read the file in PHP.

My code is similar to this:

$this->upload->do_upload();
$data = array('upload_data' => $this->upload->data());

$fileHandle = fopen($data['upload_data']['full_path'], "r");

while (($line = fgets($fileHandle)) !== false) {
    echo $line;
}

This produces lines with accents replaced with '?'. Everything else is correct.

If I download my uploaded file from my server over FTP, the charset is still iso-8859-1, but a diff reveals that the file has changed. However, if I open the file in TextEdit, it displays properly.

I attempted to use PHP's stream_encoding function to explicitly set my file stream to iso-8859-1, but my build of PHP does not have that function.

After running out of ideas, I tried wrapping my strings in both utf8_encode and utf8_decode. Neither worked.

If anyone has any suggestions about things I could try, I would be extremely grateful.


It's important to determine whether the corruption happens before or after the query is issued to MySQL. Too many things could be going wrong here to pinpoint the cause otherwise. Are you able to print the query just before it is sent to MySQL to check this?

Assuming that your query IS properly formed (no corruption at the point where the query is printed), there are a couple of things that you should check.

  1. What are the character set and collation of the database itself (and of the table and column)?

  2. What is the charset of the connection? This may not be set up correctly in your MySQL config, and it can be set manually with the 'SET NAMES' command.

In my own application I issue a 'SET NAMES utf8' as my first query after establishing a connection as I am unable to change the MySQL config.

See the MySQL manual on connection character sets: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
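As a rough illustration of setting the connection charset from PHP, here is a minimal sketch that assumes the mysqli extension is used (CodeIgniter has its own database config, which may differ). The function name open_connection and all of its arguments are hypothetical placeholders. mysqli's set_charset() has the same effect as SET NAMES and also tells the client library which charset is in use.

```php
<?php
// Sketch only: open_connection() and its arguments are hypothetical placeholders.
function open_connection(string $host, string $user, string $pass, string $db, string $charset = 'utf8'): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_error) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    // Set the connection character set before running any other query.
    if (!$conn->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $conn->error);
    }
    return $conn;
}
```

On MySQL versions that support it, 'utf8mb4' is usually a better choice than 'utf8', since it covers the full Unicode range.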

Edit: If the issue is not related to MySQL, I'd check the following:

  1. You say the encoding of the file is iso-8859-1. Can I ask how you are sure of this?

  2. What happens if you save the file itself as utf8 (without BOM) and try to reprocess it?

  3. What is the encoding of the PHP file that is performing the conversion? (What are you using to write your PHP? Your editor may be 'managing' this for you in an undesired way.)

  4. (An aside) Are the files you are processing suitable for reading with fgetcsv instead? http://php.net/manual/en/function.fgetcsv.php
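One way to answer the first question in the list above is to look at the raw bytes rather than trusting an editor. The sketch below is only illustrative; the helper names is_valid_utf8 and non_ascii_bytes are made up for this example. It relies on the fact that PCRE's /u modifier rejects subjects that are not well-formed UTF-8.

```php
<?php
// Returns true when $bytes is well-formed UTF-8. With the /u modifier,
// preg_match() fails on a subject that is not valid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Returns the hex value of every byte outside the 7-bit ASCII range.
function non_ascii_bytes(string $bytes): array
{
    $found = [];
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        if (ord($bytes[$i]) > 0x7F) {
            $found[] = sprintf('%02X', ord($bytes[$i]));
        }
    }
    return $found;
}

$latin1 = "caf\xE9";      // "café" in ISO-8859-1: é is the single byte E9
$utf8   = "caf\xC3\xA9";  // "café" in UTF-8: é is the two bytes C3 A9

var_dump(is_valid_utf8($latin1)); // bool(false)
var_dump(is_valid_utf8($utf8));   // bool(true)
print_r(non_ascii_bytes($latin1));
```

If a line read from the uploaded file fails the UTF-8 check and its accented letters show up as single bytes such as E9, the file is most likely still iso-8859-1 (or a close relative like windows-1252) at that point.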


Files uploaded to your server should come back byte-for-byte identical on download. That means the upload must not change the encoding of the file (which is just a bunch of binary data). Instead, you should make sure that you can store the binary contents of that file unchanged.

To achieve that with your database, create a BLOB field. That's the right column type for it. It's just binary data.

Assuming you're using MySQL, the relevant reference is "The BLOB and TEXT Types" in the MySQL manual; look for BLOB.
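To check the claim that the bytes survive unchanged, one option is to compare a checksum of the original file with a checksum of what ends up on the server (or of what comes back out of the BLOB column). This is a small sketch; files_are_identical is a hypothetical helper name and the paths are whatever you pass in.

```php
<?php
// Compares two files by SHA-256 digest; true means the contents are byte-identical.
function files_are_identical(string $pathA, string $pathB): bool
{
    $hashA = hash_file('sha256', $pathA);
    $hashB = hash_file('sha256', $pathB);
    if ($hashA === false || $hashB === false) {
        throw new RuntimeException('Could not read one of the files');
    }
    return hash_equals($hashA, $hashB);
}
```

If the digests differ, something between the upload and the storage step is rewriting the file, for example a transfer done in text mode instead of binary mode.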


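The problem is that your file is in iso-8859-1 while the rest of your pipeline is most likely expecting utf-8. To convert the text to the expected charset, you can use the iconv function, like so:

$output_string = iconv('ISO-8859-1', 'UTF-8', $input_string);

iso-8859-1 does encode accented characters, but each one is a single byte (for example, é is 0xE9) that is not valid utf-8 on its own, so anything expecting utf-8 tends to show it as '?'.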

It would be so much better if everything were utf-8, as it handles virtually every character known to man.
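Applied to a read loop like the one in the question, the conversion could look roughly like this. It is a sketch under assumptions: read_lines_as_utf8 is a hypothetical helper, and the source charset is taken to be iso-8859-1 as stated in the question.

```php
<?php
// Reads a text file line by line and converts each line to UTF-8.
// The default source charset is an assumption based on the question.
function read_lines_as_utf8(string $path, string $sourceCharset = 'ISO-8859-1'): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        $converted = iconv($sourceCharset, 'UTF-8', $line);
        if ($converted === false) {
            fclose($handle);
            throw new RuntimeException('Conversion failed');
        }
        // Strip the trailing line ending so callers get clean values.
        $lines[] = rtrim($converted, "\r\n");
    }
    fclose($handle);
    return $lines;
}
```

When echoing the converted lines to a browser, the page also has to declare the same charset, for instance with header('Content-Type: text/html; charset=utf-8'), or the accents may still be displayed incorrectly.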
