Profiling SQL query
I'm using sqlalchemy (expression language, not full ORM) with MySQL and experiencing some unexpected slowness. Particularly, the time spent performing a select query in by sqlalchemy is ten times greater than the time spent performing the same query from the mysql command line.
Output from cprofile:
ncalls tottime percall cumtime percall filename:lineno(function)
100 206.703 2.067 206.703 2.067 {method 'query' of '_mysql.connection' objects}
MySQL time: 0.26 seconds
The consensus seems to be that开发者_如何学运维 there is some overhead using sqlalchemy, but not nearly this much. Any suggestions as to what could cause behavior like this?
The queries are generally of the form:
SELECT fieldnames.minage, fieldnames.maxage, fieldnames.race,
fieldnames.sex, sum( pop.population ) AS pop, pop.zip5
FROM pop
INNER JOIN fieldnames ON fieldnames.fieldname = pop.fieldname_id
WHERE fieldnames.race IN ("White alone")
AND fieldnames.sex IN ("Female")
AND fieldnames.maxage >=101
AND fieldnames.minage <=107
GROUP BY fieldnames.minage, fieldnames.maxage
One possible reason for slowness - Does sql alchemy use prepared statements? If yes, then a reason why you may be experiencing a difference in performance is because the mysql optimizer has different information when creating the two query plans.
When you run the query from the command-line, the mysql optimizer has the complete query with all where clause values filled in (as you showed above3), thus can optimize explicitly for these values.
When you run from sql alchemy, the mysql optimizer may only see this (perhaps fieldnames.race and fieldnames.sex are parameterized as well):
SELECT fieldnames.minage, fieldnames.maxage, fieldnames.race,
fieldnames.sex, sum( pop.population ) AS pop, pop.zip5
FROM pop
INNER JOIN fieldnames ON fieldnames.fieldname = pop.fieldname_id
WHERE fieldnames.race IN ("White alone")
AND fieldnames.sex IN ("Female")
AND fieldnames.maxage >= ?
AND fieldnames.minage <= ?
GROUP BY fieldnames.minage, fieldnames.maxage
Thus the optimizer has to make a guess on what values you might use then optimize around that. Unfortunately, it may make a bad guess, and thus in a worst case create a query plan that makes the query run significantly slower than you expect.
精彩评论