SQL Query for all pairs of elements that are only in different groups
I have a table called Survey with a Group Column and a Subject Column
CREATE TABLE survey (
`group` INT NOT NULL,
`subject` VARCHAR(16) NOT NULL,
UNIQUE INDEX (`group`, `subject`)
);
INSERT INTO survey
VALUES
(1, 'sports'),
(1, 'history'),
(2, 'art'),
(2, 'music'),
(3, 'math'),
(3, 'sports'),
(3, 'science')
;
I am trying to figure out a query that wi开发者_如何学编程ll return all pairs of subjects that are not part of the same group. So from my above example, I would like to see these pairs returned in a table:
science - history
science - art
science - music
history - math
sports - art
sports - music
history - art
history - music
Thus, the query shouldn't return:
sports - history
as an example since they are both in Group 1.
Thanks so much.
SELECT s1.subject,
s2.subject
FROM survey s1
JOIN survey s2
ON s1.subject < s2.subject
GROUP BY s1.subject,
s2.subject
HAVING COUNT(CASE
WHEN s1.groupid = s2.groupid THEN 1
END) = 0
Sample table
create table Survey(groupid int, subject varchar(100))
insert into Survey select
1, 'sports' union all select
1, 'history' union all select
2, 'art' union all select
2, 'music' union all select
3, 'math' union all select
3, 'sports' union all select
3, 'science'
The ANSI-compliant query, which works for all mainstream RDBMS
select a.subject, b.subject
from (select distinct subject from Survey) A
inner join (select distinct subject from Survey) B on B.subject > A.subject
left join Survey C on C.subject = A.subject
left join Survey D on D.subject = B.subject and D.groupid = C.groupid
where D.groupid is null
order by a.subject, b.subject
Here's a slightly different approach:
SELECT *
FROM (SELECT DISTINCT subject FROM yourtable) AS T1
JOIN (SELECT DISTINCT subject FROM yourtable) AS T2
ON T1.subject < T2.subject
WHERE NOT EXISTS
(
SELECT *
FROM yourtable T3
JOIN yourtable T4
ON T3.id = T4.id
WHERE T1.subject = T3.subject
AND T2.subject = T4.subject
)
ORDER BY t1.subject, t2.subject;
The standard way would be to use MINUS
to get the complement of all pairs that are in the same group, but MySQL doesn't support MINUS
. For MySQL, you can transform a MINUS
into a statement based on the NOT IN
operator and a sub-query:
SELECT s1.subject, s2.subject
FROM survey AS s1
JOIN survey AS s2
WHERE (s1.subject, s2.subject) NOT IN
(
SELECT s1.subject, s2.subject
FROM survey AS s1
JOIN survey AS s2
ON s1.group = s2.group
)
;
Note that this can produce duplicates. If you don't want them, use SELECT DISTINCT
.
With indices and the sample data, the extended query plan is:
+----+--------------------+-------+--------+---------------+-------+---------+--------------------+------+----------+---------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+--------+---------------+-------+---------+--------------------+------+----------+---------------------------------------------+ | 1 | PRIMARY | s1 | index | NULL | group | 54 | NULL | 7 | 100.00 | Using index; Using temporary | | 1 | PRIMARY | s2 | index | NULL | group | 54 | NULL | 7 | 100.00 | Using where; Using index; Using join buffer | | 2 | DEPENDENT SUBQUERY | s1 | index | group | group | 54 | NULL | 7 | 85.71 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | s2 | eq_ref | group | group | 54 | test.s1.group,func | 1 | 100.00 | Using where; Using index | +----+--------------------+-------+--------+---------------+-------+---------+--------------------+------+----------+---------------------------------------------+
Select S1.Subject As LeftSubject
, S2.Subject As RightSubject
From SourceData As S1
Join SourceData As S2
On S2.subject > S1.subject
Left Join (
Select S1.groupid
, S1.subject As LeftSubject
, S2.subject As RightSubject
From SourceData As S1
Join SourceData As S2
On S2.groupid = S1.groupid
And S2.subject > S1.subject
) As Z
On Z.groupid = S1.groupid
And Z.LeftSubject = S1.subject
And Z.RightSubject = S2.subject
Where Z.groupid is null
Another variant using outis' tuple format:
Select S1.Subject As LeftSubject
, S2.Subject As RightSubject
From SourceData As S1
Join SourceData As S2
On S2.subject > S1.subject
Where (S1.groupid, S1.subject, S2.subject) Not In (
Select S1.groupid
, S1.subject
, S2.subject
From SourceData As S1
Join SourceData As S2
On S2.groupid = S1.groupid
Where S2.subject > S1.subject
)
精彩评论