Select rows where ColA=ColB and also add a new column that tells whether the original data in ColA is repeated
I am trying to figure out how to query a table (the table is actually a result set, so it will be a subquery), group it by ColA=ColB
(see below), and create a calculated field all in one step.
So, if my test data looks like
ColA ColB ColC
1 1 aaa 1 2 bbbb 1 3 cccc 2 2 dddd 3 3 eeee 3 4 ffff 3 5 gggg 3 6 hhhh 4 4 iiii 5 5 jjjj 6 6 kkkk 6 7 llll 6 8 mmmm 开发者_JAVA技巧
I would like to retrieve only rows where ColA=ColB
and also add a new column that tells me whether the original data in ColA
was repeated. See below.
ColA ColB ColC multiples
1 1 aaaa yes 2 2 dddd no 3 3 eeee yes 4 4 iiii no 5 5 jjjj no 6 6 kkkk yes
Can someone help me out with the syntax? I have been playing with Group By's and SubSelects to no avail. Do I need to use a case statement to for the multiples field?
It's more helpful to post create table and insert statements instead of Desc table and select * from table_name; http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html
create table test_repeat(
cola number,
colb number,
colc varchar2(20)
);
insert into test_repeat values (1,1,'aaa');
insert into test_repeat values (1,2,'bbbb');
insert into test_repeat values (1,3,'cccc');
insert into test_repeat values (2,2,'dddd');
insert into test_repeat values (3,3,'eeee');
insert into test_repeat values (3,4,'ffff');
insert into test_repeat values (3,5,'gggg');
insert into test_repeat values (3,6,'hhhh');
insert into test_repeat values (4,4,'iiii');
insert into test_repeat values (5,5,'jjjj');
insert into test_repeat values (6,6,'kkkk');
insert into test_repeat values (6,7,'llll');
insert into test_repeat values (6,8,'mmmm');
commit;
1. You can use the Oracle analytic function Lead to look through your result set to see if colA is the same for the next row (after ordering it..) like..
select * from
(select colA, colb,
(case when colA = (lead(cola) over
(partition by colA order by cola, colb))
then 'Yes'
else 'No'
end) multiples,
colc
from test_repeat)
where colA = colb
/
COLA COLB MUL COLC
---------- ---------- --- --------------------
1 1 Yes aaa
2 2 No dddd
3 3 Yes eeee
4 4 No iiii
5 5 No jjjj
6 6 Yes kkkk
2. Or you can get the count for each value of COLA and compare it to see if there are duplicates...
select a.colA, a.colb, a.colc,
(case when (select count(*) from test_repeat t where t.cola = a.colA) > 1
then 'Yes'
else 'No'
end) Repeat
from test_repeat a
where colA = colB
/
COLA COLB COLC REP
---------- ---------- -------------------- ---
1 1 aaa Yes
2 2 dddd No
3 3 eeee Yes
4 4 iiii No
5 5 jjjj No
6 6 kkkk Yes
They are both equally simple ,but I would suggest the analytic function approach as I have found it to be generally faster for all Queries I have worked with in the past.
SQL> select *
2 from test_repeat
3 order by cola
4 /
COLA COLB COLC
---------- ---------- --------------------
1 2 bbbb
1 1 aaa
1 3 cccc
2 2 dddd
3 4 ffff
3 3 eeee
3 5 gggg
3 6 hhhh
4 4 iiii
5 5 jjjj
6 6 kkkk
6 7 llll
6 8 mmmm
7 9 nnnn
14 rows selected.
SQL> select cola "ColA"
2 , max(decode(colb,cola,colb)) "ColB"
3 , max(decode(colb,cola,colc)) "ColC"
4 , case count(*) when 1 then 'no' else 'yes' end "multiples"
5 from test_repeat
6 group by cola
7 having cola = max(decode(colb,cola,colb))
8 order by cola
9 /
ColA ColB ColC mul
---------- ---------- -------------------- ---
1 1 aaa yes
2 2 dddd no
3 3 eeee yes
4 4 iiii no
5 5 jjjj no
6 6 kkkk yes
6 rows selected.
Performancewise, this SQL is approximately equal to Rajesh' first query. So you can choose the one you are more comfortable with.
SQL> set autotrace on
SQL> select * from
2 (select colA, colb,
3 (case when colA = (lead(cola) over
4 (partition by colA order by cola, colb))
5 then 'Yes'
6 else 'No'
7 end) multiples,
8 colc
9 from test_repeat)
10 where colA = colb
11 /
COLA COLB MUL COLC
---------- ---------- --- --------------------
1 1 Yes aaa
2 2 No dddd
3 3 Yes eeee
4 4 No iiii
5 5 No jjjj
6 6 Yes kkkk
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1491815685
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 574 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 14 | 574 | 4 (25)| 00:00:01 |
| 2 | WINDOW SORT | | 14 | 532 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_REPEAT | 14 | 532 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLA"="COLB")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
15 consistent gets
0 physical reads
0 redo size
421 bytes sent via SQL*Net to client
238 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
SQL> /
COLA COLB MUL COLC
---------- ---------- --- --------------------
1 1 Yes aaa
2 2 No dddd
3 3 Yes eeee
4 4 No iiii
5 5 No jjjj
6 6 Yes kkkk
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1491815685
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 574 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 14 | 574 | 4 (25)| 00:00:01 |
| 2 | WINDOW SORT | | 14 | 532 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_REPEAT | 14 | 532 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLA"="COLB")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
421 bytes sent via SQL*Net to client
238 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
SQL> select cola "ColA"
2 , max(decode(colb,cola,colb)) "ColB"
3 , max(decode(colb,cola,colc)) "ColC"
4 , case count(*) when 1 then 'no' else 'yes' end "multiples"
5 from test_repeat
6 group by cola
7 having cola = max(decode(colb,cola,colb))
8 order by cola
9 /
ColA ColB ColC mul
---------- ---------- -------------------- ---
1 1 aaa yes
2 2 dddd no
3 3 eeee yes
4 4 iiii no
5 5 jjjj no
6 6 kkkk yes
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3021378319
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 532 | 4 (25)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | SORT GROUP BY | | 14 | 532 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_REPEAT | 14 | 532 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLA"=MAX(DECODE("COLB","COLA","COLB")))
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
421 bytes sent via SQL*Net to client
238 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
SQL> /
ColA ColB ColC mul
---------- ---------- -------------------- ---
1 1 aaa yes
2 2 dddd no
3 3 eeee yes
4 4 iiii no
5 5 jjjj no
6 6 kkkk yes
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3021378319
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 532 | 4 (25)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | SORT GROUP BY | | 14 | 532 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_REPEAT | 14 | 532 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLA"=MAX(DECODE("COLB","COLA","COLB")))
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
420 bytes sent via SQL*Net to client
238 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
SQL> set autotrace off
Regards,
Rob.
I think you need a subselect for the multiples
column. No grouping is needed. Write something like:
select ColA, ColB, ColC,
(CASE
WHEN (select b.ColA from thistable b where b.ColA = a.ColA) > 1 THEN 'yes'
ELSE 'no'
END)
from thistable a
where ColA = ColB
I was not aware of the lead function, very nice. Rob and Rajesh, thanks for the answers so far, I think I was able to partly answer my own question using two subqueries and an outer join.
I am guessing my query is not as efficient as the others posted, but I wonder if folks could comment on this query. Will it always work? Can it be tweaked to make it more efficient? It still needs to put in 'yes' or 'no' for multiples, right now it puts the multiple count or a null for 0.
SELECT BB.ColA, ColB, ColC, AA.CNT FROM (SELECT ColA, count(*) as CNT FROM TMPY where ColA<>ColB Group by ColA) AA, (SELECT ColA, ColB, ColC FROM TMPY WHERE COLA=COLB ) BB WHERE AA.COLa(+)=BB.COLB order by bb.ColA;
Thank you again!
精彩评论