Find and group duplicates
Hopefully I'm able to explain what I'm trying to achieve, it's a bit complicated I think.
I have two tables like this:
ID | Names
--------------
A | Name1
B | Name2
C | Name3
ID | md5s
--------------
A | a
A | b
B | c
C | a
C | c
I'm trying to achieve this: In the end I want to have a list of all "Names" that have duplicate MD5 values and in which other "Names" these MD5 values were found.
So I want to get something like this:
Name1 has 5 duplicate entries in "md5s" with Name8, 4 with Name10 ...
I need a list for all "Names" like described above.
Hopefully that makes sense to someone. :)
I already tried it with this SQL stat开发者_如何学JAVAement:
SELECT names,COUNT(names) AS Num FROM tablename GROUP BY names HAVING(Num > 1);
But that gives me only the md5s that are duplicates. The relation to the rest is totally missing.
*edit:fixed typo
I feel like there must be a better solution than this, but here's what I've thrown together for you:
SELECT a.names NAME,
b.names DUPE_NAME,
COUNT(*) NUM_DUPES
FROM names_tbl a, names_tbl b, md5_tbl md5a, md5_tbl md5b
WHERE a.id < b.id
AND a.id = md5a.id
AND b.id = md5b.id
AND md5a.md5 = md5b.md5
GROUP BY a.names, b.names
ORDER BY a.names
The rule of thumb with finding duplicates is that you probably need to do a self join. This would be simpler if the names and their associated md5's were in the same record, but because they're in separate tables I think you need two versions of each table.
精彩评论