Find and group duplicates

2023-03-18 18:50 问答作者：

Hopefully I'm able to explain what I'm trying to achieve, it's a bit complicated I think.

I have two tables like this:

ID   | Names
--------------
A    | Name1
B    | Name2
C    | Name3

ID   | md5s
--------------
A    | a
A    | b
B    | c
C    | a
C    | c

I'm trying to achieve this: In the end I want to have a list of all "Names" that have duplicate MD5 values and in which other "Names" these MD5 values were found.

So I want to get something like this:

Name1 has 5 duplicate entries in "md5s" with Name8, 4 with Name10 ...

I need a list for all "Names" like described above.

Hopefully that makes sense to someone. :)

I already tried it with this SQL stat开发者_如何学JAVAement:

SELECT names,COUNT(names) AS Num FROM tablename GROUP BY names HAVING(Num > 1);

But that gives me only the md5s that are duplicates. The relation to the rest is totally missing.

*edit:fixed typo

I feel like there must be a better solution than this, but here's what I've thrown together for you:

SELECT  a.names NAME,
        b.names DUPE_NAME,
        COUNT(*) NUM_DUPES
FROM    names_tbl a, names_tbl b, md5_tbl md5a, md5_tbl md5b
WHERE   a.id < b.id
AND     a.id = md5a.id
AND     b.id = md5b.id
AND     md5a.md5 = md5b.md5
GROUP BY a.names, b.names
ORDER BY a.names

The rule of thumb with finding duplicates is that you probably need to do a self join. This would be simpler if the names and their associated md5's were in the same record, but because they're in separate tables I think you need two versions of each table.

继续阅读：sql

Find and group duplicates

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？