Removing duplicates from SQL Join

2023-03-28 18:26 问答作者：

The following is a hypothetical situation this which is close to my real problem. Table1

recid   firstname    lastname   company
1       A             B          AAA
2       D             E          DEF
3       G             H          IJK
4       A        开发者_如何学C     B          ABC

I have a table2 that looks like this

recid   firstname    lastname   company
10      A             B          ABC
20      D             E          DEF
30      M             D          DIM
40      A             B          CCC

Now if I join the table on recid, it will give 0 results, there will be no duplicates because recid is unique. But if I join on firstname and lastname columns, which are not unique and there are duplicates, I get duplicates on inner join. The more columns I add on join, the worse it becomes (more duplicates are created).

In the above simple situation, how can I remove duplicates in the following query? I want to compare firstname and lastname, if they match, I return firstname, lastname and recid from table2

SELECT DISTINCT * FROM
(SELECT recid, first, last FROM table1) a
INNER JOIN
(SELECT recid, first,last FROM table2) b
ON a.first = b.first

The script is here if anyone wants to play with it in future

CREATE TABLE table1 (recid INT NOT NULL PRIMARY KEY, first varchar(20), last varchar(20), company varchar(20))
CREATE TABLE table2 (recid INT NOT NULL PRIMARY KEY, first varchar(20), last varchar(20), company varchar(20))

INSERT INTO TABLE1 VALUES(1,'A','B','ABC')
INSERT INTO TABLE1 VALUES(2,'D','E','DEF')
INSERT INTO TABLE1 VALUES(3,'M','N','MNO')
INSERT INTO TABLE1 VALUES(4,'A','B','ABC')

INSERT INTO TABLE2 VALUES(10,'A','B','ABC')
INSERT INTO TABLE2 VALUES(20,'D','E','DEF')
INSERT INTO TABLE2 VALUES(30,'Q','R','QRS')
INSERT INTO TABLE2 VALUES(40,'A','B','ABC')

You don't want to do a join per se, you're merely testing for existence/set inclusion.

I don't know what current flavor of SQL you're coding in, but this should work.

SELECT MAX(recid), firstname, lastname 
FROM table2 T2
WHERE EXISTS (SELECT * FROM table1 WHERE firstname = T2.firstame AND lastname = T2.lastname)
GROUP BY lastname, firstname

If you want to implement as a join, leaving the code largely the same:

i.e.

SELECT max(t2.recid), t2.firstame, t2.lastname 
FROM Table2 T2 
INNER JOIN Table1 T1 
    ON T2.firstname = t1.firstname and t2.lastname = t1.lastname
GROUP BY t2.firstname, t2.lastname

Depending on the DBMS, an inner join may be implemented differently to an Exists (semi-join vs join) but the optimizer can sometimes figure it out anyway and chose the correct operator regardless of which way you write it.

SELECT t2.recid, t2.first, t2.last 
FROM  table1 t1
INNER JOIN table2 t2 ON t1.first = t2.first AND t1.last = t2.last
GROUP BY t2.recid, t2.first, t2.last

EDIT: Added picture

Removing duplicates from SQL Join

There are some situations when you have to do grouping in a subquery

SELECT distinct b.recid, b.first, b.last 
FROM table1 a
INNER JOIN (
  SELECT MAX(recid) as recid, first, last 
  FROM table2
  GROUP BY first, last
) b
ON a.first = b.first AND a.last = b.last

继续阅读：join sql

Removing duplicates from SQL Join

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？