开发者

Optimizing MySQL statement with lot of count(row) an sum(row+row2)

I need to use InnoDB storage engine on a table with abou开发者_如何学编程t 1mil or so records in it at any given time. It has records being inserted to it at a very fast rate, which are then dropped within a few days, maybe a week. The ping table has about a million rows, whereas the website table only about 10,000.

My statement is this:

select url
from website ws, ping pi 
where ws.idproxy = pi.idproxy and pi.entrytime > curdate() - 3 and contentping+tcpping is not null 
group by url 
having sum(contentping+tcpping)/(count(*)-count(errortype)) < 500 and count(*) > 3 and 
count(errortype)/count(*) < .15
order by sum(contentping+tcpping)/(count(*)-count(errortype)) asc;

I added an index on entrytime, yet no dice. Can anyone throw me a bone as to what I should consider to look into for basic optimization of this query. The result set is only like 200 rows, so I'm not getting killed there.


In the absence of the schemas of the relations, I'll have to make some guesses.

  • If you're making WHERE a.attrname = b.attrname clauses, that cries out for a JOIN instead.

  • Using COUNT(*) is both redundant and sometimes less efficient than COUNT(some_specific_attribute). The primary key is a good candidate.

  • Why would you test contentping+tcpping IS NOT NULL, asking for a calculation that appears unnecessary, instead of just testing whether the attributes individually are null?

Here's my attempt at an improvement:

SELECT url
FROM website AS ws
    JOIN ping AS pi
        ON ws.idproxy = pi.idproxy
WHERE
    pi.entrytime > CURDATE() - 3
    AND pi.contentping IS NOT NULL
    AND pi.tcpping IS NOT NULL
GROUP BY url
HAVING
    SUM(pi.contentping + pi.tcpping) / (COUNT(pi.idproxy) - COUNT(pi.errortype)) < 500
    AND COUNT(pi.idproxy) > 3
    AND COUNT(pi.errortype) / COUNT(pi.idproxy) < 0.15
ORDER BY
    SUM(pi.contentping + pi.tcpping) / (COUNT(pi.idproxy) - COUNT(pi.errortype)) ASC;

Performing lots of identical calculations in both the HAVING and ORDER BY clauses will likely be costing you performance. You could either put them in the SELECT clause, or create a view that has those calculations as attributes and use that view for accessing the values.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜