
MySQL performance using IN predicate

If I run the following queries, each one returns quickly (0.01 sec) and gives me the result I want.

SELECT tagId FROM tag WHERE name='programming'

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId=123 OR tagId=124

(assume the two tagId numbers were the results from the first query)

I would like to combine these queries so I only have to run it once:

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (SELECT tagId FROM tag WHERE name='programming')

However, this query takes about 1 min 20 sec to complete. I have indexes on worktag.workcode, worktag.tagId, tag.tagId, and tag.name.

If I run DESCRIBE on the queries, the first two use the indexes. The combined query uses the index for the subquery (on the tag table) but doesn't use any index on the worktag table.

Does anyone know why this might be?

NOTE: the worktag table has over 18 million records in it.


Why don't you use a join instead of a subquery?

SELECT COUNT(DISTINCT workcode)
FROM worktag
LEFT JOIN tag
  ON worktag.tagId = tag.tagID
WHERE tag.name = 'programming'

P.S.: This seems to have been reported as a MySQL bug.
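A quick way to check that the LEFT JOIN version counts the same works as the IN subquery is to run both against a toy copy of the schema. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's planner is different, so this checks the results, not the timing. The table and column names come from the question; the sample rows are invented.

The WHERE clause filters on tag.name. Worktag rows with no matching tag therefore get a NULL name and are discarded, so this LEFT JOIN behaves like an inner join.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory copy of the question's schema (sample data is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag (tagId INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE worktag (workcode TEXT, tagId INTEGER);
        CREATE INDEX idx_tag_name ON tag(name);
        CREATE INDEX idx_worktag_tagId ON worktag(tagId);
    """)
    conn.executemany("INSERT INTO tag VALUES (?, ?)",
                     [(123, "programming"), (124, "programming"), (200, "cooking")])
    conn.executemany("INSERT INTO worktag VALUES (?, ?)",
                     [("w1", 123), ("w2", 124), ("w1", 124),  # w1 carries both tags
                      ("w3", 200), ("w4", 999)])             # tagId 999 has no tag row
    return conn

IN_SUBQUERY = """
    SELECT COUNT(DISTINCT workcode) FROM worktag
    WHERE tagId IN (SELECT tagId FROM tag WHERE name = 'programming')"""

LEFT_JOIN = """
    SELECT COUNT(DISTINCT workcode) FROM worktag
    LEFT JOIN tag ON worktag.tagId = tag.tagId
    WHERE tag.name = 'programming'"""

conn = make_db()
in_count = conn.execute(IN_SUBQUERY).fetchone()[0]
join_count = conn.execute(LEFT_JOIN).fetchone()[0]
print(in_count, join_count)  # both count w1 and w2 only
```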


A database admin told me recently that the syntax WHERE x IN ( ... ) is a pain for the database. A join is almost always better:

SELECT COUNT(DISTINCT wt.workcode) 
  FROM worktag wt, tag t 
 WHERE wt.tagId = t.tagId 
   AND t.name='programming'


SELECT COUNT(DISTINCT workcode) 
FROM worktag 
inner join tag on worktag.tagid = tag.tagid
WHERE tag.name='programming'
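The comma-style join and the explicit INNER JOIN are two spellings of the same inner join. The sketch below checks that both agree with the original IN subquery, again using sqlite3 as a stand-in for MySQL with invented sample rows. It also shows why COUNT(DISTINCT ...) matters here. A work that carries two matching tags joins to two rows, and only DISTINCT collapses them back to one.

```python
import sqlite3

# Toy stand-in for the question's MySQL tables (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tagId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worktag (workcode TEXT, tagId INTEGER);
    INSERT INTO tag VALUES (123, 'programming'), (124, 'programming'), (200, 'cooking');
    INSERT INTO worktag VALUES ('w1', 123), ('w1', 124), ('w2', 124), ('w3', 200);
""")

queries = {
    "in_subquery": """SELECT COUNT(DISTINCT workcode) FROM worktag
                      WHERE tagId IN (SELECT tagId FROM tag WHERE name = 'programming')""",
    "comma_join": """SELECT COUNT(DISTINCT wt.workcode) FROM worktag wt, tag t
                     WHERE wt.tagId = t.tagId AND t.name = 'programming'""",
    "inner_join": """SELECT COUNT(DISTINCT workcode) FROM worktag
                     INNER JOIN tag ON worktag.tagId = tag.tagId
                     WHERE tag.name = 'programming'""",
    # Without DISTINCT, w1 is counted once per matching tag (123 and 124).
    "inner_join_no_distinct": """SELECT COUNT(workcode) FROM worktag
                                 INNER JOIN tag ON worktag.tagId = tag.tagId
                                 WHERE tag.name = 'programming'""",
}
results = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(results)
```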


MySQL generally doesn't handle subqueries well, even independent (non-correlated) ones. The posters who suggested joins are right: if you have a choice, use a join. Sometimes a join isn't easy, e.g. foo.x IN (SELECT y FROM bar WHERE y = xxx LIMIT 10). In that case you're better off putting the limited result into a temporary in-memory (MEMORY engine) table and joining against it.
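The temporary-table approach can be sketched as below. In MySQL the table would be created with something like `CREATE TEMPORARY TABLE tmp_tags (tagId INT PRIMARY KEY) ENGINE=MEMORY`. MySQL also rejects LIMIT directly inside an IN subquery, which is why this detour is needed at all.

The sketch uses sqlite3, which has no storage engines, so a plain TEMP table stands in. The helper name `count_works_for_limited_tags` and the table name `tmp_tags` are hypothetical, and the sample rows are invented.

```python
import sqlite3

def count_works_for_limited_tags(conn: sqlite3.Connection, tag_name: str, limit: int) -> int:
    """Hypothetical helper: materialize a LIMITed tag lookup, then join on it.

    In MySQL the temp table could be declared with ENGINE=MEMORY; SQLite has
    no storage engines, so a plain TEMP table is used here instead.
    """
    conn.execute("DROP TABLE IF EXISTS tmp_tags")
    conn.execute("CREATE TEMP TABLE tmp_tags (tagId INTEGER PRIMARY KEY)")
    # ORDER BY makes the LIMIT deterministic.
    conn.execute(
        "INSERT INTO tmp_tags SELECT tagId FROM tag WHERE name = ? ORDER BY tagId LIMIT ?",
        (tag_name, limit),
    )
    row = conn.execute(
        """SELECT COUNT(DISTINCT wt.workcode)
           FROM worktag wt
           INNER JOIN tmp_tags t ON wt.tagId = t.tagId"""
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tagId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worktag (workcode TEXT, tagId INTEGER);
    INSERT INTO tag VALUES (123, 'programming'), (124, 'programming'), (200, 'cooking');
    INSERT INTO worktag VALUES ('w1', 123), ('w1', 124), ('w2', 124), ('w3', 200);
""")
print(count_works_for_limited_tags(conn, "programming", 1))   # 1: only tagId 123 kept
print(count_works_for_limited_tags(conn, "programming", 10))  # 2: w1 and w2
```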

If you use MySQL a lot, run EXPLAIN on your queries to see how it's using your indexes.
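In MySQL, EXPLAIN is run by prefixing the statement (`EXPLAIN SELECT ...`), and its `key` column shows which index, if any, each table uses. To keep the sketch runnable without a MySQL server, it uses SQLite's closest analogue, EXPLAIN QUERY PLAN. Its output format differs from MySQL's, so treat this only as an illustration of the habit, not of what MySQL will report for the question's tables. The helper name `plan` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tagId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worktag (workcode TEXT, tagId INTEGER);
    CREATE INDEX idx_tag_name ON tag(name);
    CREATE INDEX idx_worktag_tagId ON worktag(tagId);
""")

def plan(sql: str) -> list[str]:
    """Hypothetical helper: return the 'detail' text of SQLite's EXPLAIN QUERY PLAN."""
    # Each result row has four columns; the human-readable detail is the last one.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

for line in plan("""SELECT COUNT(DISTINCT workcode) FROM worktag
                    INNER JOIN tag ON worktag.tagId = tag.tagId
                    WHERE tag.name = 'programming'"""):
    print(line)  # look for SEARCH ... USING INDEX versus a full SCAN
```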


Have you tried:

SELECT COUNT(DISTINCT workcode) FROM worktag WHERE tagId IN (123, 124)

?

I'm not a MySQL expert, but it looks to me like you might be looking at a significant failure of the query optimizer.

On the other hand, good for MySQL that it optimizes the OR in the second statement. I know databases that will successfully optimize the IN (...) form but not the equivalent OR version of the same logical request.
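If the optimizer keeps mishandling the subquery, the literal-list version can be generated from application code. This does programmatically what the question does by hand: fetch the tag IDs first, then pass them as a literal IN list.

The sketch uses sqlite3 as a stand-in for a MySQL driver. Many Python MySQL drivers use `%s` placeholders rather than `?`. The helper name `count_works_for_tag` is hypothetical, and the sample rows are invented.

```python
import sqlite3

def count_works_for_tag(conn: sqlite3.Connection, tag_name: str) -> int:
    """Hypothetical helper: two round trips, tag lookup then a literal IN list."""
    tag_ids = [row[0] for row in
               conn.execute("SELECT tagId FROM tag WHERE name = ?", (tag_name,))]
    if not tag_ids:
        return 0  # MySQL rejects an empty "IN ()", so handle this case up front
    placeholders = ", ".join("?" for _ in tag_ids)  # "?, ?" for two IDs
    sql = ("SELECT COUNT(DISTINCT workcode) FROM worktag "
           f"WHERE tagId IN ({placeholders})")
    return conn.execute(sql, tag_ids).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tagId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worktag (workcode TEXT, tagId INTEGER);
    INSERT INTO tag VALUES (123, 'programming'), (124, 'programming'), (200, 'cooking');
    INSERT INTO worktag VALUES ('w1', 123), ('w1', 124), ('w2', 124), ('w3', 200);
""")
print(count_works_for_tag(conn, "programming"))  # 2: w1 and w2
```

Binding each ID as a parameter keeps the values out of the SQL text. The cost is an extra round trip to the server.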


I suspect the optimizer is making a bad guess. Replacing the subquery with an inner join might help.
