Can I do a max(count(*)) in SQL?
Here's my code:
select yr,count(*)
from movie
join casting on casting.movieid=movie.id
join actor on casting.actorid = ac开发者_JAVA技巧tor.id
where actor.name = 'John Travolta'
group by yr;
Here's the question:
Which were the busiest years for 'John Travolta'. Show the number of movies he made for each year.
Here's the table structure:
movie(id, title, yr, score, votes, director)
actor(id, name)
casting(movieid, actorid, ord)
This is the output I am getting:
yr count(*)
1976 1
1977 1
1978 1
1981 1
1994 1
-- etc.
I need to get the rows for which count(*)
is max.
How do I do this?
Use:
SELECT m.yr,
COUNT(*) AS num_movies
FROM MOVIE m
JOIN CASTING c ON c.movieid = m.id
JOIN ACTOR a ON a.id = c.actorid
AND a.name = 'John Travolta'
GROUP BY m.yr
ORDER BY num_movies DESC, m.yr DESC
Ordering by num_movies DESC
will put the highest values at the top of the resultset. If numerous years have the same count, the m.yr
will place the most recent year at the top... until the next num_movies
value changes.
Can I use a MAX(COUNT(*)) ?
No, you can not layer aggregate functions on top of one another in the same SELECT clause. The inner aggregate would have to be performed in a subquery. IE:
SELECT MAX(y.num)
FROM (SELECT COUNT(*) AS num
FROM TABLE x) y
Just order by count(*) desc
and you'll get the highest (if you combine it with limit 1
)
SELECT * from
(
SELECT yr as YEAR, COUNT(title) as TCOUNT
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr
order by TCOUNT desc
) res
where rownum < 2
This question is old, but was referenced in a new question on dba.SE. I feel the best solutions haven't been provided. Plus, there are new, faster options.
Question in the title
Can I do a
max(count(*))
in SQL?
Yes, you can achieve that by nesting an aggregate function in a window function:
SELECT m.yr, count(*) AS movie_count
, max(count(*)) OVER () AS max_ct
FROM casting c
JOIN movie m ON c.movieid = m.id
WHERE c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP BY m.yr
ORDER BY count(*) DESC;
db<>fiddle here
That's standard SQL. Postgres introduced it with version 8.4 (released 2009-07-01, before this question was asked. Other RDBMS should be capable of the same.
Consider the sequence of events in a SELECT
query:
- Best way to get result count before LIMIT was applied
Possible downside: window functions do not aggregate rows. You get all rows left after the aggregate step. Useful in some queries, but not ideal for this one.
To get one row with the highest count, you can use ORDER BY ct LIMIT 1
:
SELECT c.yr, count(*) AS ct
FROM actor a
JOIN casting c ON c.actorid = a.id
WHERE a.name = 'John Travolta'
GROUP BY c.yr
ORDER BY ct DESC
LIMIT 1;
Using only basic SQL features, available in any halfway decent RDBMS - the LIMIT
implementation varies:
- SQL select elements where sum of field is less than N
Or you can get one row per group with the highest count with DISTINCT ON
(only Postgres):
- Select first row in each GROUP BY group?
Actual Question
I need to get the rows for which
count(*)
is max.
There may be more than one row with the highest count.
SQL Server has had the feature WITH TIES
for some time - with non-standard syntax:
SELECT TOP 1 WITH TIES
m.yr, count(*) AS movie_count
FROM casting c
JOIN movie m ON c.movieid = m.id
WHERE c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP BY m.yr
ORDER BY count(*) DESC; -- can't sort by year for this
db<>fiddle here
PostgreSQL 13 added WITH TIES
with standard SQL syntax:
SELECT m.yr, count(*) AS movie_count
FROM casting c
JOIN movie m ON c.movieid = m.id
WHERE c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP BY m.yr
ORDER BY count(*) DESC -- can't sort by year for this
FETCH FIRST 1 ROWS WITH TIES;
db<>fiddle here
This should be the fastest possible query. Further reading:
Get top row(s) with highest value, with ties
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
To sort results by additional criteria (or for older versions of Postgres or other RDBMS without WITH TIES
), use the window function rank()
in a subquery:
SELECT yr, movie_count
FROM (
SELECT m.yr, count(*) AS movie_count
, rank() OVER (ORDER BY count(*) DESC) AS rnk
FROM casting c
JOIN movie m ON c.movieid = m.id
WHERE c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP BY m.yr
) sub
WHERE rnk = 1
ORDER BY yr; -- optionally sort by year
All major RDBMS support window functions nowadays.
it's from this site - http://sqlzoo.net/3.htm 2 possible solutions:
with TOP 1 a ORDER BY ... DESC:
SELECT yr, COUNT(title)
FROM actor
JOIN casting ON actor.id=actorid
JOIN movie ON movie.id=movieid
WHERE name = 'John Travolta'
GROUP BY yr
HAVING count(title)=(SELECT TOP 1 COUNT(title)
FROM casting
JOIN movie ON movieid=movie.id
JOIN actor ON actor.id=actorid
WHERE name='John Travolta'
GROUP BY yr
ORDER BY count(title) desc)
with MAX:
SELECT yr, COUNT(title)
FROM actor
JOIN casting ON actor.id=actorid
JOIN movie ON movie.id=movieid
WHERE name = 'John Travolta'
GROUP BY yr
HAVING
count(title)=
(SELECT MAX(A.CNT)
FROM (SELECT COUNT(title) AS CNT FROM actor
JOIN casting ON actor.id=actorid
JOIN movie ON movie.id=movieid
WHERE name = 'John Travolta'
GROUP BY (yr)) AS A)
Using max with a limit will only give you the first row, but if there are two or more rows with the same number of maximum movies, then you are going to miss some data. Below is a way to do it if you have the rank() function available.
SELECT
total_final.yr,
total_final.num_movies
FROM
( SELECT
total.yr,
total.num_movies,
RANK() OVER (ORDER BY num_movies desc) rnk
FROM (
SELECT
m.yr,
COUNT(*) AS num_movies
FROM MOVIE m
JOIN CASTING c ON c.movieid = m.id
JOIN ACTOR a ON a.id = c.actorid
WHERE a.name = 'John Travolta'
GROUP BY m.yr
) AS total
) AS total_final
WHERE rnk = 1
The following code gives you the answer. It essentially implements MAX(COUNT(*)) by using ALL. It has the advantage that it uses very basic commands and operations.
SELECT yr, COUNT(title)
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr HAVING COUNT(title) >= ALL
(SELECT COUNT(title)
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr)
Depending on which database you're using...
select yr, count(*) num from ...
order by num desc
Most of my experience is in Sybase, which uses some different syntax than other DBs. But in this case, you're naming your count column, so you can sort it, descending order. You can go a step further, and restrict your results to the first 10 rows (to find his 10 busiest years).
create view sal as
select yr,count(*) as ct from
(select title,yr from movie m, actor a, casting c
where a.name='JOHN'
and a.id=c.actorid
and c.movieid=m.id)group by yr
-----VIEW CREATED-----
select yr from sal
where ct =(select max(ct) from sal)
YR 2013
Thanks to the last answer
SELECT yr, COUNT(title)
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr HAVING COUNT(title) >= ALL
(SELECT COUNT(title)
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr)
I had the same problem: I needed to know just the records which their count match the maximus count (it could be one or several records).
I have to learn more about "ALL clause", and this is exactly the kind of simple solution that I was looking for.
select top 1 yr,count(*) from movie
join casting on casting.movieid=movie.id
join actor on casting.actorid = actor.id
where actor.name = 'John Travolta'
group by yr order by 2 desc
you can use the top
along with with ties
, which will include all of the years having the maximum count(*)
value, something like this:
select top (1) with ties yr, count(*)
from movie
join casting
on casting.movieid=movie.id
join actor
on casting.actorid = actor.id
where actor.name = 'John Travolta'
group by yr;
order by count(*) desc
If the maximum is say 6, you'll get all of the years for which the count value is 6.
精彩评论