Counting distinct values in a large dataset (40M rows): SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;

CREATE TABLE `names` ( `name` varchar(20) );

Assume the names table contains all 40 million first names of everyone living in California (for example).

SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;

How can I optimize this query?

Expected Result:

count | name
 9999 | joe
 9995 | mike
 9990 | kate
 .... | ....
    2 | kal-el


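The query itself is as good as it can be. What it needs is an index on the name column of your table, so the engine can read the names in sorted order instead of sorting 40 million rows for the GROUP BY.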
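To see what such an index changes, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module and prints the query plan with and without an index. SQLite is only a stand-in here (the question's backtick syntax suggests MySQL, where you would use `EXPLAIN` instead of `EXPLAIN QUERY PLAN`), and the index name `idx_names_name` is made up for the example.

```python
import sqlite3
from typing import List

QUERY = "SELECT count(*) AS count, name FROM names GROUP BY name ORDER BY name"


def plan_for_grouped_count(with_index: bool) -> List[str]:
    """Return the plan steps SQLite reports for the grouped count query."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE names (name VARCHAR(20))")
    conn.executemany(
        "INSERT INTO names (name) VALUES (?)",
        [("joe",), ("mike",), ("kate",), ("joe",), ("kal-el",)],
    )
    if with_index:
        # idx_names_name is a hypothetical index name, not from the question.
        conn.execute("CREATE INDEX idx_names_name ON names (name)")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return [row[-1] for row in rows]


def print_plans() -> None:
    """Print both plans so they can be compared side by side."""
    for with_index in (False, True):
        label = "with index" if with_index else "without index"
        print(label + ":")
        for step in plan_for_grouped_count(with_index):
            print("   ", step)
```

Calling `print_plans()` should show a plain table scan plus a temporary sort structure without the index, and a scan of the index alone once it exists. The exact wording of the plan text differs between SQLite versions, and MySQL's `EXPLAIN` output looks different again, so treat this as an illustration of the idea rather than of your engine's behaviour.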


Well, what makes you think it's not already optimised? This looks like the sort of query a good database engine should be able to handle relatively easily - particularly if you've got an appropriate index on your table.

Do you actually have a bottleneck here, or are you worrying about something that might happen in the future? If it's the latter, I suggest you try it with your RDBMS (by generating dummy data), and see what happens.
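A rough way to try that out is sketched below: it fills a table with dummy names and times the query. It again uses Python's sqlite3 module as a stand-in for your real RDBMS, and the helper names, the generated `nameNNNNN` values, and the row counts are all assumptions for the example. Numbers measured this way only hint at the trend; the real test is the same experiment on your own engine and hardware.

```python
import random
import sqlite3
import time

QUERY = "SELECT count(*) AS count, name FROM names GROUP BY name ORDER BY name"


def build_dummy_table(row_count, distinct_names, seed=0):
    """Create an in-memory names table filled with random dummy names."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    pool = ["name%05d" % i for i in range(distinct_names)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name VARCHAR(20))")
    conn.executemany(
        "INSERT INTO names (name) VALUES (?)",
        ((rng.choice(pool),) for _ in range(row_count)),
    )
    conn.commit()
    return conn


def time_query(conn):
    """Run the grouped count once; return (elapsed seconds, result rows)."""
    start = time.perf_counter()
    rows = conn.execute(QUERY).fetchall()
    return time.perf_counter() - start, rows


def compare(row_count=200000, distinct_names=5000):
    """Time the query before and after adding an index on name."""
    conn = build_dummy_table(row_count, distinct_names)
    before, _ = time_query(conn)
    # Hypothetical index name, chosen for this example only.
    conn.execute("CREATE INDEX idx_names_name ON names (name)")
    after, _ = time_query(conn)
    conn.close()
    print("without index: %.3fs, with index: %.3fs" % (before, after))
```

Start with a small `row_count`, call `compare()`, and scale up towards the real 40 million rows only as far as memory allows, since this sketch keeps the whole table in RAM.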
