Combine many MySQL queries with logic into data file
Background:
I am parsing a 330 MB XML file (the Netflix catalog) into a DB using a PHP script run from the console.
I can successfully add about 1,500 titles every 3 seconds until I add the logic for actors, genres, and formats. These live in separate tables linked by associative tables.
Right now I have to run many, many queries for each title, in this order (I truncate all tables first, to eliminate old titles, genres, etc.):
- add the new title to 'titles' and capture the insert id
- check the actor table for an existing actor
- if present, get the id; if not, insert the actor and get the insert id
- insert the title id and actor id into the associative table
(steps 2-4 are repeated for genres too)
This drops my speed down to about 10 titles per 3 seconds, which would take an eternity to add the ~250,000 titles.
So how would I combine the 4 queries into a single query, without adding duplicate actors or genres?
My goal is to just write all queries into a data file and do a bulk insert.
I started by writing all the associative queries into a data file, but it didn't do much for performance.
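(For illustration, the bulk form I'm aiming for would batch many rows into one statement instead of one INSERT per line. This is a sketch only; $fh as the open data file handle, $links as the pending id pairs, the flat title_persons table name, and the batch size of 1,000 are all placeholders:)
$values = array();
foreach ($links as $pair) { // $pair = array(title_id, person_id)
    $values[] = '('.(int)$pair[0].','.(int)$pair[1].')';
    if (count($values) >= 1000) { // flush one multi-row INSERT per 1,000 rows
        fwrite($fh, 'INSERT INTO title_persons (title_id,person_id) VALUES '.implode(',', $values).";\n");
        $values = array();
    }
}
if ($values) { // flush the final partial batch
    fwrite($fh, 'INSERT INTO title_persons (title_id,person_id) VALUES '.implode(',', $values).";\n");
}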
I start by inserting the title and saving its ID:
function insertTitle($nfid, $title, $year){
    $query = "INSERT INTO ".$this->titles_table." (nf_id, title, year) VALUES ('$nfid','$title','$year')";
    mysql_query($query);
    $this->updatedTitleCount++;
    return mysql_insert_id();
}
That is then used in conjunction with each actor's name to create the association:
function linkActor($value, $title_id){
    // check if we already know this value
    $query = "SELECT * FROM ".$this->persons_table." WHERE person = '$value' LIMIT 0,1";
    //echo "<br>".$query."<br>";
    $result = mysql_query($query);
    if($result && mysql_num_rows($result) != 0){
        while ($row = mysql_fetch_assoc($result)) {
            $value_id = $row['id'];
        }
    }else{
        // no value known, add to persons table
        $query = "INSERT INTO ".$this->persons_table." (person) VALUES ('$value')";
        mysql_query($query);
        $value_id = mysql_insert_id();
    }
    //echo "linking title:".$title_id." with rel:".$value_id;
    $query = "INSERT INTO ".$this->title_persons_table." (title_id,person_id) VALUES ('$title_id','$value_id');";
    //mysql_query($query);
    // write query to data file, one statement per line, to be read in bulk style
    fwrite($this->fh, $query."\n");
}
This is a perfect opportunity for using prepared statements.
Also take a look at the tips at http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html, e.g.:
"To speed up INSERT operations that are performed with multiple statements for nontransactional tables, lock your tables."
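In code, that tip looks roughly like this (a sketch; the table names are taken from the question, and nontransactional MyISAM tables are assumed):
// Lock the target tables for the duration of a large batch of inserts,
// then release them; this avoids per-statement index flushes on MyISAM.
mysql_query("LOCK TABLES titles WRITE, persons WRITE, title_persons WRITE");
// ... run the whole batch of INSERT statements here ...
mysql_query("UNLOCK TABLES");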
You can also decrease the number of queries. E.g., you can eliminate the SELECT ... FROM persons_table used to obtain the id by using INSERT ... ON DUPLICATE KEY UPDATE and LAST_INSERT_ID(expr).
(Sorry, I'm running out of time for a lengthy description, but I wrote an example before noticing the time ;-) If this answer isn't downvoted too much I can hand it in later.)
class Foo {
    protected $persons_table = 'personsTemp';
    protected $pdo;
    protected $stmts = array();

    public function __construct($pdo) {
        $this->pdo = $pdo;
        $this->stmts['InsertPersons'] = $pdo->prepare('
            INSERT INTO
                '.$this->persons_table.'
                (person)
            VALUES
                (:person)
            ON DUPLICATE KEY UPDATE
                id=LAST_INSERT_ID(id)
        ');
    }

    public function getActorId($name) {
        $this->stmts['InsertPersons']->execute(array(':person'=>$name));
        return $this->pdo->lastInsertId('id');
    }
}
$pdo = new PDO("mysql:host=localhost;dbname=test", 'localonly', 'localonly');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// create a temporary/test table
$pdo->exec('CREATE TEMPORARY TABLE personsTemp (id int auto_increment, person varchar(32), primary key(id), unique key idxPerson(person))');
// and fill in some data
foreach(range('A', 'D') as $p) {
    $pdo->exec("INSERT INTO personsTemp (person) VALUES ('Person $p')");
}

$foo = new Foo($pdo);
foreach( array('Person A', 'Person C', 'Person Z', 'Person B', 'Person Y', 'Person A', 'Person Z', 'Person A') as $name) {
    echo $name, ' -> ', $foo->getActorId($name), "\n";
}
prints
Person A -> 1
Person C -> 3
Person Z -> 5
Person B -> 2
Person Y -> 6
Person A -> 1
Person Z -> 5
Person A -> 1
(Someone might want to start a discussion about whether a getXYZ() function should perform an INSERT or not... but not me, not now.)
Your performance is glacially slow; something is very wrong. I assume the following:
- You run your dedicated, otherwise-idle database server on respectable hardware
- You have tuned it to some extent (i.e. at least configured it to use a few gigs of RAM properly); engine-specific optimisations will be required
You may be being stung by doing lots of tiny operations with autocommit on; this is a mistake, as it generates an unreasonable number of disc IO operations. You should do a large amount of work (100 or 1,000 records, etc.) in a single transaction and then commit it, as sketched below.
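A minimal sketch of that batching with PDO, assuming transactional (InnoDB) tables; $titles (the parsed XML rows), the column names, and the batch size of 1,000 are illustrative:
$stmt = $pdo->prepare('INSERT INTO titles (nf_id, title, year) VALUES (:nfid, :title, :year)');
$pdo->beginTransaction();
$i = 0;
foreach ($titles as $t) {
    $stmt->execute(array(':nfid' => $t['nfid'], ':title' => $t['title'], ':year' => $t['year']));
    if (++$i % 1000 === 0) { // commit in chunks to bound transaction size
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit(); // commit the final partial batch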
The lookups may be slowing things down because of the simple overhead of doing the queries (the queries themselves will be really easy, as you'll have an index on actor name). Caching the ids on the PHP side, as sketched below, removes most of that overhead.
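A sketch of such a cache: memoise person ids in a PHP array so each distinct actor costs at most one round trip; the persons table name is assumed, and the ON DUPLICATE KEY UPDATE statement follows the earlier PDO answer:
$stmt = $pdo->prepare('INSERT INTO persons (person) VALUES (:person)
                       ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id)');
$personIds = array();
function personId($pdo, $stmt, &$cache, $name) {
    if (!isset($cache[$name])) { // only hit the DB for unseen names
        $stmt->execute(array(':person' => $name));
        $cache[$name] = $pdo->lastInsertId();
    }
    return $cache[$name];
}
echo personId($pdo, $stmt, $personIds, 'Person A'), "\n"; // queries the DB
echo personId($pdo, $stmt, $personIds, 'Person A'), "\n"; // served from the cache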
I also question your method of assuming that no two actors have the same name - surely your original database contains a unique actor ID, so you don't get them mixed up?
Can you use a language other than PHP? If not, are you running this as a standalone PHP script or through a webserver? The webserver is probably adding a lot of overhead you don't need.
I do something very similar at work, using Python, and can insert a couple of thousand rows (with associative table lookups) per second on a standard 3.4 GHz, 3 GB RAM machine. The MySQL database isn't hosted locally but within the LAN.