开发者

Using LOAD DATA INFILE with arabic data

I am trying to import a .csv file into a table. I have figured out how to get the data inserted by using the following query:

LOAD DATA INFILE 'examplesofdata.csv' INTO TABLE coins FIELDS TERMINATED BY ',' 
ENCLOSED BY '' ESCAPED BY '\\'  IGNORE 1 LINES;

However for several of my fields I have Arabic content which gets entered as a series of ? I assume this is because I haven't collated the database correctly or I don't fully understand the LOAD DATA INFILE query. Any advice would be greatly appreciated.

The SHOW CREATE TABLE coins; output is:

CREATE TABLE `coins` (
  `cat_num` int(11) NOT NULL,
  `reg_num` int(11) NOT NULL,
  `period` varchar(255) NOT NULL,
  `arb_period` varchar(255) character set utf8 collate utf8_unicode_c开发者_Go百科i NOT NULL,
  `ruler` varchar(255) NOT NULL,
  `arb_ruler` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
  `mint` varchar(255) NOT NULL,
  `arb_mint` varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL,
  `date` varchar(255) NOT NULL,
  `weight` float NOT NULL,
  `diameter` float NOT NULL,
  `khedieval_num` varchar(255) NOT NULL,
  `ref` text NOT NULL,
 PRIMARY KEY  (`cat_num`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8


LOAD DATA LOCAL INFILE 'filename' INTO TABLE tablename CHARACTER SET utf8 COLUMNS TERMINATED BY '\t' LINES TERMINATED BY '\n';

the CHARACTER SET utf8 does the trick.


This is still a bug with MySQL. However, I found out that the database's default charset is the culprit. There are two possible workarounds:

  1. If you change your database's default charset to LATIN1 then it will work. You can keep your tables/columns UTF-8.
  2. Strangely, if you use the "CHARACTER SET latin1" it will work for both UTF-8 and Latin1 tables/columns. With this method, you can keep your db/table/column charset on UTF-8.


So I ended up getting an answer from an old instructor for my Databases class. He told me that this problem is actually a reported bug with the current version of MySQL and that the only known solution at the time is to manually import the data through PHP or another scripting language.

The bug for this issue is at: http://bugs.mysql.com/bug.php?id=10195

It didn't help me too much since I was only working on a prototype, and managed a workaround in the mean time, but hopefully it can be of more use to you.


How about setting CHARACTER SET utf8_unicode or to your locale?


I also found out that your character_set_client needs to be UTF-8 as well:

mysql> show session variables like 'char%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                        
...

Read mysql docs on how to go about changing that for the who server or just the session only.


I also had that issue, but instead of series of ?, I was getting truncated data.

Like "aeióu" was being truncated in "aei".

Check the solution I came up with here, you need to match the CSV charset with the LOAD DATA INFILE charset.

Cheers


Adding CHARACTER SET utf8 to the LOAD DATA statement is the proximate answer. However, two other issues have been brought up...

When trying to use utf8/utf8mb4, if you see Question Marks (regular ones, not black diamonds),

  • The bytes to be stored are not encoded as utf8. Fix this.
  • The column in the database is CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection during reading is utf8.

When trying to use utf8/utf8mb4, if you see Truncated text,

  • The bytes to be stored are not encoded as utf8. Fix this.
  • Also, check that the connection during reading is utf8.
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜