What's the fastest way to poll a MySQL table for new rows?
My application needs to poll a MySQL database for new rows. Every time new rows are added, they should be retrieved. I was thinking of creating a trigger to place references to new rows on a separate table. The original table has over 300,000 开发者_StackOverflow社区rows.
The application is built in PHP.
Some good answers, i think the question deserves a bounty.
For external applications I find using a TimeStamp column is a more robust method that is independent of auto id and other primary key issues
Add columns to the tables such as:
insertedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP
or to track inserts and updates
updatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
In the external application all you need to do is track the last timestamp when you did a poll. Then select from that timestamp forward on all the relevant tables. In large tables you may need to index the timestamp column
You can use the following statement to find out if a new record was inserted in the table:
select max(id) from table_name
replacing the name of primary key and table name in the above statement. Keep the max(id) value in a temporary variable, and retrieve all new records between this and the last saved max(id) value. After fetching the new records, set max(id) value to the one you got from the query.
Create a PHP Daemon to monitor the MySQL Table File size, if size changes query for new records, if new records found run next process.
I think there is an active PEAR daemon you can easily configure to monitor the MySQL Table file size and kick off your script.
assuming you have an identify or some other data that always grow, you should keep track on your php application of the last id retrieved.
that'd work for most scenarios. Unless you are into the real time camp, I don't think you'd need any more than that.
I would do something like this. Of course, this is assuming that ID is an incrementing numerical ID. And how you store your "current location" in the database is upto you.
<?
$idFile = 'lastID.dat';
if(is_file($idFile)){
$lastSelectedId = (int)file_get_contents($idFile);
} else {
$lastSelectedId = 0;
}
$res = mysql_query("select * from table_name where id > {$lastSelectedId}");
while($row = mysql_fetch_assoc($res)){
// Do something with the new rows
if($row['id']>$lastSelectedId){
$lastSelectedId = $row['id'];
}
}
file_put_contents($idFile,$lastSelectedId);
?>
I would concurr with TFD's answer about keeping track of a timestamp in an separate file/table and then fetching all rows newer than that. That's how I do it for a similar application.
Your application querying a single row table (or file) to see if a timestamp has changed from the local storage should not be much of a performance hit. Then, fetching new rows from the 300k row table based on timestamp should again be fine, assuming timestamp is properly indexed.
However, reading your question I was curious if Mysql triggers can do system calls, say a php script that would do some heavy lifting. Turns out they can by using the sys_exec() User-Defined Function. You could use this to do all sorts of processing by passing into it the inserted row data, essentially having an instant notification of inserts.
Finally, a word of caution about using triggers to call external applications.
One option might be to use an INSERT INTO SELECT statement. Taking from the suggestions using timestamps to pull the latest rows, you could do something like...
INSERT INTO t2 (
SELECT *
FROM t1
WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
This would take all of the rows inserted in the previous hour and insert them in to table 2. You could have a script run this query and have it run every hour (or whatever interval you need).
This would drastically simplify your PHP script for pulling rows as you wouldn't need to iterate over any rows. It also gets rid of having to keep track of the last insert id.
The solution Fanis purposed also sounds like it could be interesting as well.
As a note, the select query in the above insert can but adjusted to only insert certain fields. If you only need certain fields, you would need to specify them in the insert like so...
INSERT INTO t2 (field1, field2) (
SELECT field1, field2
FROM t1
WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
精彩评论