开发者

Optimising sql query performing internal comparison

I have the following query which is a little expensive (currently 500ms):

SELECT * FROM events AS e, event_dates AS ed 
WHERE e.id=ed.event_id AND ed.start >= DATE(NOW()) 
GROUP BY e.modified_datetime, e.id 
ORDER BY e.modified_datetime DESC,e.created_datetime DESC 
LIMIT 0,4

I have been trying to figure our how to speed it up and noticed that changing ed.start &g开发者_开发百科t;= DATE(NOW()) to ed.start = DATE(NOW()) runs the query in 20ms. Can anyone help me with ways to speed up this date comparison? Would it help to calculate DATE(NOW()) before running the query??

EDIT: does this help, using EXPLAIN statement

BEFORE
table=event_dates
type=range
rows=25962
ref=null
extra=using where; Using temporary; Using filesort

AFTER
table=event_dates
type=ref
rows=211
ref=const
extra=Using temporary; Using filesort


SELECT * FROM events AS e
INNER JOIN event_dates AS ed ON (e.id=ed.event_id)
WHERE ed.start >= DATE(NOW()) 
GROUP BY e.modified_datetime, e.id 
ORDER BY e.modified_datetime DESC,e.created_datetime DESC 
LIMIT 0,4

Remarks

  1. Please don't using implicit SQL '89 syntax, it is an SQL anti-pattern.
  2. Make sure you have an index on all fields used in the join, in the where, in the group by and the order by clauses.
  3. Don't do select * (another anti-pattern), explicitly state the fields you need instead.
  4. Try using InnoDB instead of MyISAM, InnoDB has more optimization tricks for select statements, especially if you only select indexed fields.
  5. For MyISAM tables try using REPAIR TABLE tablename.
  6. For InnoDB that's not an option, but forcing the tabletype from MyISAM to InnoDB will obviously force a full rebuild of the table and all indexes.
  7. Group by implicitly sorts the rows in ASC order, try changing the group by to group by -e.modified_datetime, e.id to minimize the reordering needed by the order by clause. (not sure about this point, would like to know the result)


For reference, using , notation for joins is poor practice AND has been a cause for poor execution plans.

SELECT
  *
FROM
  events        AS e
INNER JOIN
  event_dates   AS ed
    ON e.id=ed.event_id
WHERE
  ed.start >= DATE(NOW())
GROUP BY
  e.modified_datetime,
  e.id 
ORDER BY
  e.modified_datetime DESC,
  e.created_datetime DESC 
LIMIT 0,4

Why = is faster than >= is simply because >= is a Range of values, not a very specific value. It's like saying "get me ever page in the book from page 101 onwards" instead of "get me page 101". It's more intensive by definition, especially as your query then involves aggregating and sorting many more records.

In terms of optimisation, your best option is to ensure relevant indexes...

event_dates:
- an index just on start should be sufficient

events:
- an index on id will dramatically improve the join performance
- adding modified_datetime and created_datetime to that index may help


Probably missing indexes on fields you are grouping and searching. Please provide us with: SHOW INDEXES FROM events and SHOW INDEXES FROM event_dates

If there are no indexes then you can add them:

ALTER TABLE events ADD INDEX(modified_datetime);
ALTER TABLE events ADD INDEX(created_datetime);
ALTER TABLE event_dates ADD INDEX(start);

Also be sure you have them on id fields. But here you would probably like to have them as primary keys.


Calculating DATE(NOW()) in advance will not have any impact on performance. It's computed only once (not for each row). But you have 2 different queries (one with >=, another with =). It seems natural that the first one (>=) takes longer time to execute since it returns many more rows. Also, it may decide to use different execution plan compared to query with = , for example, full table scan instead index seek/scan


You can do something like this

DECLARE @CURRENTDATE AS DATETIME 
SET @CURRENTDATE = GETDATE()

then change your code to use

@CURRENTDATE variable.... "e.start >= @CURRENTDATE

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜