开发者

import/update large xml file into MySQL using PHP

I have about 30K records in an XML file and this file is updated all the time.

I'm trying to insert and if exists update a MySQL db.

This is the code I'd like to use, but it runs very slowly, does anyone have any ideas for improving its performance开发者_运维技巧?

// getting xml file
$dom = new DOMDocument();
$dom->load('products.xml');

// getting xml nodes using xpath
$xpath = new DOMXPath($dom);
$productid = $xpath->query('//NewDataSet/Product/ProductId');
$price = $xpath->query('//NewDataSet/Product/Price');

// Reading all nodes and if mach found in db update price, else insert as new record**
for($i=0;$i<$allNodes->length;$i++){
    $testproductid = $productid->item($i)->nodeValue;
    $testprice = $price->item($i)->nodeValue;
    if(mysql_num_rows(mysql_query("Select productid from test where productid ='$testproductid'"))){
        mysql_query("UPDATE test SET price = '$testprice' WHERE productid = '$testproductid'");
    }else{
        mysql_query("INSERT INTO test (price, productid) VALUES ('$testprice','$testproductid')");
    }
}


First, this line can result in bad behaviors:

if(mysql_num_rows(mysql_query("Select productid from test where productid ='$testproductid'")))

What happens if mysql_query() fails? Do something like that instead:

$res = mysql_query("Select productid from test where productid ='$testproductid'");
if ($res) {
... CODE HERE ...
}

Is productid an index? Also, you can formulate your query as:

Select productid from test where productid ='$testproductid' LIMIT 1

In this case, MySQL won't look for more records. Also, try to insert more than one record in on single INSERT statement. See this:

http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

Take a look at the REPLACE command. That would replace the SELECT/UPDATE/INSERT conditions, but it might not be a great improvement of performance though.

http://dev.mysql.com/doc/refman/5.0/en/replace.html


First off I recommend brushing up on some MySQL. Second off, by using primary key on your productid field, you can use a more advanced sql statement called: insert ... on duplicate key update ...

It's gonna halve your database lookups for the first part, since you're doing one extra test before inserting/updating.

Second off, XML might not be the best solution for your cross-platform file. Any particular reason you are using this?


30k update statements in one transaction should complete in reasonable time (for waiting user). Perhaps autocommit is on?

Also, if you don't mind being mysql-specific there is REPLACE which does INSERT/UPDATE in one statement. Or you can do INSERT ... ON DUPLICATE KEY UPDATE. In particular, this does away with "if(mysql_num_rows(mysql_query("Select productid from test where productid ='$testproductid'")))".


Why do two queries where one will suffice?

$sql = "INSERT INTO test (price, productid) " .
       "VALUES ('$testprice','$testproductid') " .
       "ON DUPLICATE KEY UPDATE";

if(!$query = mysql_query($sql))
   trigger_error(mysql_error());

You could also try SimpleXML in place of DOMDocument, but from what I can Google there doesn't seem to be any documented speed difference.


Also, if you don't mind being mysql-specific there is REPLACE which does INSERT/UPDATE in one statement. Or you can do INSERT ... ON DUPLICATE KEY UPDATE. In particular, this does away with if(mysql_num_rows(mysql_query("Select productid from test where productid ='$testproductid'"))).

30k update statements in one transaction should complete in reasonable time (for waiting user). Perhaps autocommit is on?


Script to load a large files by chunks It will load the xml file, read a given number of entries in one time and then load them to the database..

$lot =5000;
$tempFiledir = '.';
$tempFile = 'temp.xml';
$table = 'mytable';
$db_username= 'root';
$db_password = 'mysql';

// count element 
    print( "    Computing items...");
    $xml_reader = new XMLReader;
    $xml_reader->open($xml_file);
    while ($xml_reader->read() && $xml_reader->name != $node_name);
    $totalItems =0;
    while ($xml_reader->name == $node_name) {
        $xml_reader->next($node_name);
        $totalItems++;
    }
    $xml_reader->close();

    print( "\r    $totalItems items found.                     ");


//Truncat the table to load into 
$xmlload_cmd = sprintf ("$mysql_exe -u%s -p%s $database_temp -e \"TRUNCATE TABLE `%s`;\" ", $db_username, $db_password, $table);
system($xmlload_cmd);                           

// move the pointer to the first item
$xml_reader = new XMLReader;
$xml_reader->open($xml_file);
while ($xml_reader->read() && $xml_reader->name != $node_name);


// load by chunks
$index = 0;
while ($xml_reader->name == $node_name){

    $tempFileXMLOutput = fopen( "$tempFiledir\\$tempFile", "w") or die("Unable to open file!");
    fwrite($tempFileXMLOutput,'<?xml version="1.0"?>');

    $index0=$index;
    do {    
        // remove self closign tags from the rendred xml output and store it in the temp file
        $data = preg_replace('/\<(\w+)\s*\/\s*\>/i', '<$1></$1>', $xml_reader->readOuterXML());
        fwrite($tempFileXMLOutput, "\n\t$data");    

        // move the pointer to the next item
        $xml_reader->next($node_name);
        $index++;
    }
    while ($xml_reader->name == $node_name && ($index % $lot != 0) );

    // close the temp file
    fclose($tempFileXMLOutput);

    echo sprintf("\r    Processing items from %6s to %6s [%3.0f%%]", $index0, $index, $index/$totalItems*100);

    // run the LOAD XML comand on the temp xml file
    $load_cmd = sprintf("LOAD XML LOCAL INFILE '%s' INTO TABLE `%s` ROWS IDENTIFIED BY '<Data>'", addslashes("$tempFiledir\\$tempFile"), $table);               

    $xmlload_cmd = sprintf ("$mysql_exe -u%s -p%s $database_temp -e \"$load_cmd\" ", $db_username, $db_password);
    system($xmlload_cmd);   

    // remove the temp file
    @unlink ( "$tempFiledir\\$tempFile");
}

$xml_reader->close();
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜