开发者

MySQL - How do I efficiently get the row with the lowest ID?

Is there a faster way to update the oldest row of a MySQL table that matches a certain condition than using ORDER BY id LIMIT 1 as in the following query?

UPDATE mytable SET field1 = '1' WHERE field1 = 0 ORDER BY id LIMIT 1;

Note:

  • Assume the primary key is id and there is also a index on field1.
  • We are updating a single row.
  • We are not updating strictly the oldest row, we are updating the oldest row that matches a condition.
  • We want to update the oldest matching row, i.e the lowest id, i.e. the head of the FIFO queue.

Questions:

  • Is the ORDER BY id necessary? How does MySQL order by default?

Real world example

We have a DB table being used for a email queue. Rows are added when we want to queue emails to send to our users. Rows are removed by a cron job, run each minute, processing as many as possible in that minute and sending 1 email per row.

We plan to ditch this approach and use something like Gearman or Resque to process our email queue. But in the meantime I have a question on how we can efficiently mark the oldest item of the queue for processing, a.k.a. The row with the lowest ID. This query does the job:

mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY id LIMIT 1");

However, it is appearing in the mysql slow log a lot due to scaling issues. The query can take more than 10s when the table has 500,开发者_运维技巧000 rows. The problem is that this table has grown massively since it was first introduced and now sometimes has half a million rows and a overhead of 133.9 MiB. For example we INSERT 6000 new rows perhaps 180 times a day and DELETE roughly the same number.

To stop the query appearing in the slow log we removed the ORDER BY id to stop a massive sort of the whole table. i.e.

mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 LIMIT 1");

... but the new query no longer always gets the row with the lowest id (although it often does). Is there a more efficient way of getting the row with the lowest id other than using ORDER BY id ?

For reference, this is the structure of the email queue table:

CREATE TABLE IF NOT EXISTS `email_queue` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `time_queued` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time when item was queued',
  `mem_id` int(10) NOT NULL,
  `email` varchar(150) NOT NULL,
  `processingID` int(2) NOT NULL COMMENT 'Indicate if row is being processed',
  PRIMARY KEY (`id`),
  KEY `processingID` (`processingID`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;


Give this a read:

  • ORDER BY … LIMIT Performance Optimization


sounds like you have other processes locking the table preventing your update completing in a timely manner - have you considered using innodb ?


I think the 'slow part' comes from

WHERE processingID = 0 

It's slow because it's not indexed. But, indexing this column (IMHO) seems incorrect too. The idea is to change above query to something like :

WHERE id = 0 

Which theoretically will be faster since it uses index.

How about creating another table which contains ids of rows which hasn't been processed? Hence the insertion works twice. First to insert to the real table and the second is to insert id into 'table of hasn't processed'. The processing part too, needs to double its duty. First to retrieve an id from 'table of hasn't been processed' then delete it. The second job of processing part is to process of course.

Of course, the id column in 'table of hasn't been processed' needs to index its content. Just to ensure that selecting and deleting will be faster.


This question is old, but for reference for anyone ending up here:

You have a condition on processingID (WHERE processingID = 0), and within that constraint you want to order by ID.

What's happening with your current query is that it scans the table from the lowest ID to the greatest, stopping when it finds 1 record matching the condition. Presumably, it will first find a ton of old records, scanning almost the entire table until it finds an unprocessed one near the end.

How do we improve this?

Consider that you have an index on processingID. Technically, the primary key is always appended (which is how the index can "point" to anything in the first place). So you really have an index on processingID, id. That means ordering on that will be fast.

Change your ordering to: ORDER BY processingID, id

Since you have fixed processingID to a single value with you WHERE clause, this does not change the resulting order. However, it does make it easy for the database to apply both your condition and your ordering, without scanning any records that do not match.


One funny thing is that MySQL, by default, returns rows orderd by ID, instead in a casual way as stated in the relational theory (I am not sure if this behaviour is changed in the latest versions). So, the last row you get from a select should be the last inserted row. I would not use this way, of course.

As you said, the best solution is to use something like Resque, or RabbitMQ & co.

You could use an in-memory table, that is volatile, but much faster, than store, there the latest ID, or just use a my_isam table to add persistency. It is simple and fast in performance and it takes a little bit to implement.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜