SQL Order by Number of Common values
I have a table setup as follows:
+---------------+
| resources |
+---------------+
| id |
| name |
+---------------+
+-------------------+
| resource_subjects |
+-------------------+
| resource_id |
| subject_id |
+-------------------+
What I need to do is construct a query which finds the number of shared subjects between two resources.
So with a resource_subjects table like this:
+---------------------------+
| resource_id | subject_id |
+---------------------------+
| resource1 | 1 |
| resource1 | 2 |
| resource1 | 3 |
| resource1 | 4 |
| resource1 | 5 |
| resource2 | 1 |
| resource2 | 2 |
| resource2 | 3 |
| resource3 | 1 |
| resource3 | 4 |
+---------------------------+
I'd like this query to give me 开发者_C百科something like:
+----------------------------------------------------------+
| first_resource | second_resource | shared_subjects_count |
+----------------------------------------------------------+
| resource1 | resource2 | 3 |
| resource1 | resource3 | 2 |
| resource2 | resource3 | 1 |
+----------------------------------------------------------+
To get the idea across, the pseudo-code in my head would read something like:
SELECT
id
ASfirst_resource
,id
ASsecond_resource
, COUNT(number of subjects shared between first_resource and second_resource in resource_subjects table) ASshared_subjects_count
ORDER BY shared_subjects_count DESC
If anyone can provide a sample query, or even point me in the right direction that'd be amazing.
Thanks
To create a table similar to yours I used:
CREATE TABLE resource_subjects (
res_id int(11),
sub_id int(11)
);
INSERT INTO resource_subjects VALUES
(1,1),
(1,2),
(1,3),
(1,4),
(1,5),
(2,1),
(2,2),
(2,3),
(3,1),
(3,4);
And then the query you can use is:
SELECT t2.res_id 'first', t1.res_id 'second', COUNT(t1.sub_id)
FROM resource_subjects t1
JOIN resource_subjects t2 ON t1.res_id > t2.res_id AND t1.sub_id = t2.sub_id
GROUP BY 1,2
Note that my resource_id is a real id (integer) and not a string, which allows the bigger than condition in the ON clause.
This seems to work in MySql:
select
a.resource_id as first_resource,
b.resource_id as second_resource,
count(*) as shared_subjects_count
from
resource_subjects as a,
resource_subjects as b
where
a.resource_id < b.resource_id
and a.subject_id = b.subject_id
group by a.resource_id,b.resource_id;
精彩评论