SELECTing top N rows without ROWNUM?
I hope you can help me with my homework :)
We need to build a query that outputs the top N best paid employees.
My version works perfectly fine.
For example the top 3:SELECT name, salary
FROM staff
WHERE salary IN ( SELECT *
FROM ( SELECT salary
FROM staff
ORDER BY salary DESC )
WHERE ROWNUM <= 3 )
ORDER BY salary DESC
;
Note that this will output employees that are in the top 3 and have the same salary, too.
1: Mike, 4080
2: Steve, 2800 2: Susan, 2800 2: Jack, 2800 3: Chloe, 1400
But now our teacher does not allow us to use ROWNUM
.
My second solution thanks to Justin Caves' hint.
First i tried this:
SELECT name, salary, ( rank() OVER ( ORDER BY salary DESC ) ) as myorder
FROM staff
WHERE myorder <= 3
;
The errormessage is: "myorder: invalid identifier"
Thanks to DCookie its now clear:
开发者_运维技巧"[...] Analytics are applied AFTER the where clause is evaluated, which is why you get the error that myorder is an invalid identifier."
Wrapping a SELECT around solves this:
SELECT *
FROM ( SELECT name, salary, rank() OVER ( ORDER BY salary DESC ) as myorder FROM staff )
WHERE myorder <= 3
;
My teacher strikes again and don't allow such exotic analytic functions.
3rd solution from @Justin Caves.
"If analytic functions are also disallowed, the other option I could imagine-- one that you would never, ever, ever actually write in practice, would be something like"
SELECT name, salary
FROM staff s1
WHERE (SELECT COUNT(*)
FROM staff s2
WHERE s1.salary < s2.salary) <= 3
Since this is homework, a hint rather than an answer. You'll want to use analytic functions. ROW_NUMBER, RANK, or DENSE_RANK can work depending on how you want to handle ties.
If analytic functions are also disallowed, the other option I could imagine-- one that you would never, ever, ever actually write in practice, would be something like
SELECT name, salary
FROM staff s1
WHERE (SELECT COUNT(*)
FROM staff s2
WHERE s1.salary < s2.salary) <= 3
With regard to performance, I wouldn't rely on the COST number from the query plan-- that's only an estimate and it is not generally possible to compare the cost between plans for different SQL statements. You're much better off looking at something like the number of consistent gets the query actually does and considering how the query performance will scale as the number of rows in the table increases. The third option is going to be radically less efficient than the other two simply because it needs to scan the STAFF table twice.
I don't have your STAFF table, so I'll use the EMP table from the SCOTT schema
The analytic function solution actually does 7 consistent gets as does the ROWNUM solution
Wrote file afiedt.buf
1 select ename, sal
2 from( select ename,
3 sal,
4 rank() over (order by sal) rnk
5 from emp )
6* where rnk <= 3
SQL> /
ENAME SAL
---------- ----------
smith 800
SM0 950
ADAMS 1110
Execution Plan
----------------------------------------------------------
Plan hash value: 3291446077
--------------------------------------------------------------------------------
-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
--------------------------------------------------------------------------------
-
| 0 | SELECT STATEMENT | | 14 | 672 | 4 (25)| 00:00:01
|* 1 | VIEW | | 14 | 672 | 4 (25)| 00:00:01
|* 2 | WINDOW SORT PUSHED RANK| | 14 | 140 | 4 (25)| 00:00:01
| 3 | TABLE ACCESS FULL | EMP | 14 | 140 | 3 (0)| 00:00:01
--------------------------------------------------------------------------------
-
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNK"<=3)
2 - filter(RANK() OVER ( ORDER BY "SAL")<=3)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
668 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
SQL> select ename, sal
2 from( select ename, sal
3 from emp
4 order by sal )
5 where rownum <= 3;
ENAME SAL
---------- ----------
smith 800
SM0 950
ADAMS 1110
Execution Plan
----------------------------------------------------------
Plan hash value: 1744961472
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 105 | 4 (25)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 14 | 490 | 4 (25)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 14 | 140 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL | EMP | 14 | 140 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=3)
3 - filter(ROWNUM<=3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
668 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
The COUNT(*) solution, however, actually does 99 consistent gets and has to do a full scan of the table twice so it is more than 10 times less efficient. And it will scale much worse as the number of rows in the table increases
SQL> select ename, sal
2 from emp e1
3 where (select count(*) from emp e2 where e1.sal < e2.sal) <= 3;
ENAME SAL
---------- ----------
JONES 2975
SCOTT 3000
KING 5000
FORD 3000
FOO
Execution Plan
----------------------------------------------------------
Plan hash value: 2649664444
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 140 | 24 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMP | 14 | 140 | 3 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 4 | | |
|* 4 | TABLE ACCESS FULL| EMP | 1 | 4 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( (SELECT COUNT(*) FROM "EMP" "E2" WHERE
"E2"."SAL">:B1)<=3)
4 - filter("E2"."SAL">:B1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
99 consistent gets
0 physical reads
0 redo size
691 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed
The reason you must wrap the statement with another select is because the outer select statement is the one that limits your result set to the row numbers desired. Here's a helpful link on analytics. If you run the inner select by itself you'll see why you have to do this. Analytics are applied AFTER the where clause is evaluated, which is why you get the error that myorder is an invalid identifier.
Oracle? What about window functions?
select * from
(SELECT s.*, row_number over (order by salary desc ) as rn FROM staff s )
where rn <=3
When you use count(distinct <exp>)
, equal ranking top salaries will be treated as tie ranks.
select NAME, SALARY
from STAFF STAFF1
where 3 >= ( select count(distinct STAFF2.SALARY) RANK
from STAFF STAFF2
where STAFF2.SALARY >= STAFF1.SALARY)
You could solve this in Oracle 12c
select NAME, SALARY
from STAFF
order by SALARY DESC
FETCH FIRST 3 ROWS ONLY
(FETCH FIRST syntax is new with Oracle 12c)
精彩评论