开发者

Find and group duplicates

Hopefully I'm able to explain what I'm trying to achieve, it's a bit complicated I think.

I have two tables like this:

ID   | Names
--------------
A    | Name1
B    | Name2
C    | Name3

ID   | md5s
--------------
A    | a
A    | b
B    | c
C    | a
C    | c

I'm trying to achieve this: In the end I want to have a list of all "Names" that have duplicate MD5 values and in which other "Names" these MD5 values were found.

So I want to get something like this:

Name1 has 5 duplicate entries in "md5s" with Name8, 4 with Name10 ...

I need a list for all "Names" like described above.

Hopefully that makes sense to someone. :)

I already tried it with this SQL stat开发者_如何学JAVAement:

SELECT names,COUNT(names) AS Num FROM tablename GROUP BY names HAVING(Num > 1);

But that gives me only the md5s that are duplicates. The relation to the rest is totally missing.

*edit:fixed typo


I feel like there must be a better solution than this, but here's what I've thrown together for you:

SELECT  a.names NAME,
        b.names DUPE_NAME,
        COUNT(*) NUM_DUPES
FROM    names_tbl a, names_tbl b, md5_tbl md5a, md5_tbl md5b
WHERE   a.id < b.id
AND     a.id = md5a.id
AND     b.id = md5b.id
AND     md5a.md5 = md5b.md5
GROUP BY a.names, b.names
ORDER BY a.names

The rule of thumb with finding duplicates is that you probably need to do a self join. This would be simpler if the names and their associated md5's were in the same record, but because they're in separate tables I think you need two versions of each table.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜