
Importing a large CSV into a MySQL database

I'm having a really hard time trying to import a large CSV file into MySQL on localhost.

The CSV is about 55 MB and has about 750,000 rows.

I've rewritten the script so that it parses the CSV and dumps the rows one by one.

Here's the code:

$row = 1;
if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) 
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
    {
        $num = count($data);
        $row++;
        for ($c=0; $c < $num; $c++) 
        {
            $arr = explode('|', $data[$c]);

            $postcode = mysql_real_escape_string($arr[1]);
            $city_name = mysql_real_escape_string($arr[2]);
            $city_slug = mysql_real_escape_string(toAscii($city_name));
            $prov_name = mysql_real_escape_string($arr[3]);
            $prov_slug = mysql_real_escape_string(toAscii($prov_name));
            $prov_abbr = mysql_real_escape_string($arr[4]);
            $lat = mysql_real_escape_string($arr[6]);
            $lng = mysql_real_escape_string($arr[7]);

            mysql_query("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) 
                         values ('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng')") or die(mysql_error());
        }
    }
    fclose($handle);
}

The problem is that it's taking forever to execute. Any suggested solutions would be appreciated.


You are reinventing the wheel. Check out the mysqlimport tool, which comes with MySQL. It is an efficient tool for importing CSV data files.

mysqlimport is a command-line interface for the LOAD DATA LOCAL INFILE SQL statement.

Either should run 10-20x faster than doing INSERT row by row.
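
For illustration, here is a minimal PHP/mysqli sketch of the LOAD DATA LOCAL INFILE route. It assumes the file is pipe-delimited (as the explode('|') in the question suggests), that the server allows local_infile, and that the two unused fields can be discarded; the slug columns would still have to be filled in afterwards, since toAscii() is PHP code.

$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);   // LOCAL must be enabled on the client before connecting
$mysqli->real_connect('localhost', 'db_user', 'my_password', 'mysql');

// Map the pipe-delimited fields onto the table columns; @skip1 and @skip2 discard the two unused fields.
$mysqli->query("
    LOAD DATA LOCAL INFILE '/path/to/postal_codes.csv'
    INTO TABLE cities
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\\n'
    (@skip1, postcode, city_name, prov_name, prov_abbr, @skip2, lat, lng)
") or die($mysqli->error);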


Your problem is likely that you have autocommit on (by default), so MySQL commits a new transaction for each insert. You should turn autocommit off with SET autocommit=0;. If you can switch to the mysqli library (and you should if possible), you can use mysqli::autocommit(false) to turn autocommit off.

$mysqli = new mysqli('localhost', 'db_user', 'my_password', 'mysql');
$mysqli->autocommit(false);                        // queue all inserts into one transaction
$stmt = $mysqli->prepare("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) 
                          values (?, ?, ?, ?, ?, ?, ?, ?)");

$row = 1;
if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) 
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
    {
        $num = count($data);
        $row++;
        for ($c = 0; $c < $num; $c++) 
        {
            $arr = explode('|', $data[$c]);
            // bind_param() takes its arguments by reference, so build the slugs in variables first
            $city_slug = toAscii($arr[2]);
            $prov_slug = toAscii($arr[3]);
            $stmt->bind_param('ssssssdd', $arr[1], $arr[2], $city_slug, $arr[3], $prov_slug, $arr[4], $arr[6], $arr[7]);
            $stmt->execute();
        }
    }
    fclose($handle);
}
$mysqli->commit();


It will be much faster to use LOAD DATA if you can.


Try to do it in one query.

It could be limited by your my.cnf (MySQL configuration), though; in particular, max_allowed_packet caps how large a single statement can be (see the batched variation after the code below).

<?php

$row = 1;
// Build one multi-row INSERT: the column list and VALUES keyword appear once, followed by one tuple per row.
$query = "insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) values ";
if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) 
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
    {
        $num = count($data);
        $row++;
        for ($c = 0; $c < $num; $c++) 
        {
            $arr = explode('|', $data[$c]);

            $postcode = mysql_real_escape_string($arr[1]);
            $city_name = mysql_real_escape_string($arr[2]);
            $city_slug = mysql_real_escape_string(toAscii($city_name));
            $prov_name = mysql_real_escape_string($arr[3]);
            $prov_slug = mysql_real_escape_string(toAscii($prov_name));
            $prov_abbr = mysql_real_escape_string($arr[4]);
            $lat = mysql_real_escape_string($arr[6]);
            $lng = mysql_real_escape_string($arr[7]);

            $query .= "('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng'),";
        }
    }
    fclose($handle);
}
mysql_query(rtrim($query, ",")) or die(mysql_error());
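
With roughly 750,000 rows a single statement will very likely exceed max_allowed_packet, so a hedged variation is to flush the accumulated VALUES every few thousand rows. This is just a sketch: the $batchSize of 5000 and the flush_batch() helper are names introduced here, not part of the original code.

// Send the multi-row INSERT in chunks so no single statement exceeds max_allowed_packet.
$columns = "insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) values ";
$values = array();
$batchSize = 5000;   // arbitrary chunk size

function flush_batch($columns, &$values)
{
    if (count($values) > 0)
    {
        mysql_query($columns . implode(",", $values)) or die(mysql_error());
        $values = array();
    }
}

if (($handle = fopen("postal_codes.csv", "r")) !== FALSE)
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
    {
        foreach ($data as $field)
        {
            $arr = explode('|', $field);

            $postcode = mysql_real_escape_string($arr[1]);
            $city_name = mysql_real_escape_string($arr[2]);
            $city_slug = mysql_real_escape_string(toAscii($city_name));
            $prov_name = mysql_real_escape_string($arr[3]);
            $prov_slug = mysql_real_escape_string(toAscii($prov_name));
            $prov_abbr = mysql_real_escape_string($arr[4]);
            $lat = mysql_real_escape_string($arr[6]);
            $lng = mysql_real_escape_string($arr[7]);

            $values[] = "('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng')";
            if (count($values) >= $batchSize)
            {
                flush_batch($columns, $values);   // send this chunk and start a new one
            }
        }
    }
    fclose($handle);
}
flush_batch($columns, $values);                   // send whatever is left over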

If that doesn't work, you can try disabling autocommit:

mysql_query("SET autocommit = 0");
$row = 1;
if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) 
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
    {
        $num = count($data);
        $row++;
        for ($c=0; $c < $num; $c++) 
        {
            $arr = explode('|', $data[$c]);

            $postcode = mysql_real_escape_string($arr[1]);
            $city_name = mysql_real_escape_string($arr[2]);
            $city_slug = mysql_real_escape_string(toAscii($city_name));
            $prov_name = mysql_real_escape_string($arr[3]);
            $prov_slug = mysql_real_escape_string(toAscii($prov_name));
            $prov_abbr = mysql_real_escape_string($arr[4]);
            $lat = mysql_real_escape_string($arr[6]);
            $lng = mysql_real_escape_string($arr[7]);

            mysql_query("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) 
                         values ('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng')") or die(mysql_error());
        }
    }
    fclose($handle);
}
mysql_query("COMMIT");   // autocommit is off, so the batched inserts must be committed explicitly


I did this with SQL Server:

  • I used the SQL bulk-insert command combined with data tables.
  • The data tables reside in memory and are built from rows read out of the file.
  • Each data table is built from a chunk of rows, not the entire file.
  • Keep track of the chunk being processed by keeping a pointer to the last row read and the maximum chunk size.
  • While reading the file, exit the loop when the row id > last row + chunk size.
  • Keep looping and keep inserting (a rough PHP/mysqli sketch of the same chunking idea follows below).
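
The same chunking idea translated to PHP/mysqli, as a sketch rather than the SQL Server code itself: commit a transaction every N rows instead of once at the very end, so each transaction stays bounded. The chunk size of 10,000 and the $inChunk counter are assumptions introduced here.

$mysqli = new mysqli('localhost', 'db_user', 'my_password', 'mysql');
$mysqli->autocommit(false);
$stmt = $mysqli->prepare("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`)
                          values (?, ?, ?, ?, ?, ?, ?, ?)");

$chunkSize = 10000;   // arbitrary: commit after this many inserts
$inChunk = 0;

if (($handle = fopen("postal_codes.csv", "r")) !== FALSE)
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
    {
        foreach ($data as $field)
        {
            $arr = explode('|', $field);
            // bind_param() takes arguments by reference, so compute the slugs into variables first
            $city_slug = toAscii($arr[2]);
            $prov_slug = toAscii($arr[3]);
            $stmt->bind_param('ssssssdd', $arr[1], $arr[2], $city_slug, $arr[3], $prov_slug, $arr[4], $arr[6], $arr[7]);
            $stmt->execute();

            if (++$inChunk >= $chunkSize)
            {
                $mysqli->commit();   // flush this chunk as one transaction
                $inChunk = 0;
            }
        }
    }
    fclose($handle);
}
$mysqli->commit();                   // commit the final partial chunk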


Also, sometimes when you are using LOAD DATA, the import will stop if there are warnings. You can use the IGNORE keyword:

LOAD DATA INFILE 'file Path' IGNORE INTO TABLE YOUR_Table


I had a similar situation where it was NOT feasible to use LOAD DATA. Transactions were at times unacceptable as well, as data needed to be checked for duplicates. Yet, the following drastically improved the processing time for some of my import data files.

Before your while loop (over the CSV lines), set autocommit to 0 and start a transaction (InnoDB only):

mysql_query('SET autocommit=0;');
mysql_query('START TRANSACTION;');

After your loop, commit and reset autocommit back to 1 (default):

mysql_query('COMMIT;');
mysql_query('SET autocommit=1;');

Replace mysql_query() with whatever database object or API your code is using. I hope this helps others.
