counting surnames in a range a-d, e-h, i-l.... with MySQL
I have a table of surnames and I want to count the number of surnames in each alphabetic range of A-D or E-H etc.
I came up with the following query, which works, and I woul开发者_JAVA技巧d like to hear people's opinions on it and perhaps ways to do it better.
select count(*) FROM people
group by surname REGEXP '^[a-d].*',
surname REGEXP '^[e-h].*',
surname REGEXP '^[i-l].*',
surname REGEXP '^[m-p].*',
surname REGEXP '^[q-t].*',
surname REGEXP '^[u-z].*';
Here's the best way to achieve this (using regex anyway):
select
sum(surname REGEXP '^[a-dA-D].*') as ad_count,
sum(surname REGEXP '^[e-hE-H].*') as eh_count,
sum(surname REGEXP '^[i-lI-L].*') as il_count,
sum(surname REGEXP '^[m-pM-P].*') as mp_count,
sum(surname REGEXP '^[q-tQ-T].*') as qd_count,
sum(surname REGEXP '^[u-zU-Z].*') as uz_count
from people
This elegant brevity works due to the fact that in mysql, true
is 1
and false
is 0
, thus sum(some condition)
is the count of how many time it's true.
btw, I added uppercase to your regex.
You would get better performance by selecting from an inner select that does the work of calculating the group more efficiently (eg by using a case on substr(surname, 1, 1)), then summing on tests against that calculated vale value.
Regex is overkill and completely not needed here.
Perhaps something like this, using basic string algebra:
SELECT
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'a' AND 'd' THEN 1 ELSE 0 END) AS `SUM_a-d`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'e' AND 'h' THEN 1 ELSE 0 END) AS `SUM_e-h`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'i' AND 'l' THEN 1 ELSE 0 END) AS `SUM_i-l`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'm' AND 'p' THEN 1 ELSE 0 END) AS `SUM_m-p`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'q' AND 't' THEN 1 ELSE 0 END) AS `SUM_q-t`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'u' AND 'z' THEN 1 ELSE 0 END) AS `SUM_u-z`
FROM `people`
You can make the query a bit more explicit like so:
SELECT
SUM(CASE WHEN surname REGEXP '^[a-d].*' THEN 1 ELSE 0 END) AS a_d_count
,SUM(CASE WHEN surname REGEXP '^[e-h].*' THEN 1 ELSE 0 END) AS e_h_count
,SUM(CASE WHEN surname REGEXP '^[i-l].*' THEN 1 ELSE 0 END) AS i_l_count
,SUM(CASE WHEN surname REGEXP '^[m-p].*' THEN 1 ELSE 0 END) AS m_p_count
,SUM(CASE WHEN surname REGEXP '^[q-t].*' THEN 1 ELSE 0 END) AS q_t_count
,SUM(CASE WHEN surname REGEXP '^[u-z].*' THEN 1 ELSE 0 END) AS u_z_count
FROM (SELECT surname FROM people ORDER BY surname ASC) p
Avoiding regexes and conditionals, you might do this:
SELECT CONCAT(LEFT(UPPER(surname),1), '-', CHAR(ASCII(UPPER(surname))+3)) AS r,
count(id)
FROM people
GROUP BY ROUND((ASCII(UPPER(surname)-65)/4),0);
This sets your ranges to 4 letters long, which does mean that the last range is'yz', but you can fiddle that around with a bit more math.
精彩评论