开发者

How to compare groups of tuples in sql

How to compare groups of tuples in sql: consider the following example:

TABLE T1
--------
GROUP     VALUE
-----     -----
A         FOO
A         BAR
X         HHH
X         ZOO

TABLE T2
--------
GROUP     VALUE
-----     -----
B         ZOO
C         FOO
C         BAR

I want to write an sql query which compares the groups of values in both tables and reports the differences. In the illustrated example, the group in table a: ((A,FOO),(A,BAR)) is the same as the group ((C,FOO),(C,BAR)) even though the group name is different. What counts is that the contents of the group are the same. Finally the query would report that there is a difference: it is the (B,ZOO) tuple.

RESULT
------
GROUP     VALUE
-----     -----
B         ZOO
X         HHH
X 开发者_如何学JAVA        ZOO

Although the group X containing ZOO in T1 has a matching value in T2: (B,ZOO) it is still not a match because the group also has the (X, HHH) value which is not part of the (B, ZOO) group in T2


Something like this

create table t1 (group_id varchar2(20), value varchar2(20));
create table t2 (group_id varchar2(20), value varchar2(20));

insert into t1 values ('A','FOO');
insert into t1 values ('A','BAR');
insert into t1 values ('X','HHH');
insert into t1 values ('X','ZOO');
insert into t2 values ('C','FOO');
insert into t2 values ('C','BAR');
insert into t2 values ('B','ZOO');


select t1.group_id t1_group,t2.group_id t2_group, 
      --t1.all_val, t2.all_val, 
       case when t1.all_val = t2.all_val then 'match' else 'no match' end coll_match
from 
  (select 'T1' tab_id, group_id, collect(value) all_val, 
          min(value) min_val, max(value) max_val, count(distinct value) cnt_val 
  from t1 group by group_id) t1
full outer join
  (select 'T2' tab_id, group_id, collect(value) all_val, 
          min(value) min_val, max(value) max_val, count(distinct value) cnt_val 
  from t2 group by group_id) t2
on t1.min_val = t2.min_val and t1.max_val = t2.max_val and t1.cnt_val = t2.cnt_val
/

I've done a preliminary elimination based on the minmium, maximum and number of distinct values in each group, which would help with large datasets. If the datasets are small enough, you may not need them.

That tells you the matches. You just need to push it out an extra step to find the groups that don't have any matches

select t1_group
from
(
  select t1.group_id t1_group,t2.group_id t2_group, 
        --t1.all_val, t2.all_val, 
         case when t1.all_val = t2.all_val then 'match' end coll_match
  from 
    (select 'T1' tab_id, group_id, collect(value) all_val
    from t1 group by group_id) t1
  cross join
    (select 'T2' tab_id, group_id, collect(value) all_val
    from t2 group by group_id) t2
)
group by t1_group
having min(coll_match) is null
/

select t2_group
from
(
  select t1.group_id t1_group,t2.group_id t2_group, 
        --t1.all_val, t2.all_val, 
         case when t1.all_val = t2.all_val then 'match' end coll_match
  from 
    (select 'T1' tab_id, group_id, collect(value) all_val
    from t1 group by group_id) t1
  cross join
    (select 'T2' tab_id, group_id, collect(value) all_val
    from t2 group by group_id) t2
)
group by t2_group
having min(coll_match) is null
/


Difference between T1 and T2 (two tables) could be this:

SELECT
   T1.GROUPNAME,
   T1.VALUE
FROM 
   T1
LEFT JOIN T2
ON T2.Value = T1.Value
WHERE T2.GROUPNAME IS NULL

For instance T1 has:

Foo 100 Bar 200 ZZZ 333

And T2 includes: Foo 100 Bar 200

The result of that query is ZZZ 333 it is the only record that does not match in both tables. You could even change T2's group name to say:

XYZ 100 ZXZ 200

And the result is still ZZZ 333. This is per what you are asking, if you want the opposite you can either UNION to it, or use RIGHT join.

Jon

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜