Why is SQL Query Returning Duplicates?
I have the following query. Strangely, it returns multiple records for the same individual, when it should return just one row per individual. Every join is a LEFT JOIN from CONTACT1 C, which has only one row for each individual, unlike the other tables, which sometimes have multiple rows for the same individual.
select
C.ACCOUNTNO as 'AdmitGold Account',
C2.UNAMEFIRST as 'First Name',
C2.UNAMELAST as 'Last Name',
C.KEY1 as 'Status',
C.KEY4 as 'People ID',
C.KEY3 as 'Type',
C.KEY5 as 'Counselor',
C.CITY as 'City',
C.STATE as 'State',
C.SOURCE as 'Source',
C.DEPARTMENT as 'Major',
C2.UGENDER as 'Gender',
C2.UETHNICBG as 'Ethnicity',
C2.UFULLPART as 'Full/Part',
SLF_CLG_CS.EXT as 'College - GPA',
OFF_CLG_CS.EXT as 'College - GPA Official',
HS_OFF_CS.LINKACCT as 'HS GPA - Official',
OFF_SAT_COMP.LINKACCT as 'SAT - Verbal',
OFF_SAT_COMP.COUNTRY as 'SAT - Math',
(Cast(OFF_SAT_COMP.LINKACCT as float) + Cast(OFF_SAT_COMP.COUNTRY as float)) as 'SAT - Composite',
OFF_SAT_COMP.EXT as 'SAT - Essay',
OFF_ACT_COMP.LINKACCT as 'ACT - English',
OFF_ACT_COMP.COUNTRY as 'ACT - Math',
OFF_ACT_COMP.ZIP as 'ACT - Reading',
OFF_ACT_COMP.EXT as 'ACT - ScRe',
(Cast(OFF_ACT_COMP.LINKACCT as float) + Cast(OFF_ACT_COMP.COUNTRY as float)+ Cast(OFF_ACT_COMP.ZIP as float) + Cast(OFF_ACT_COMP.EXT as float)) as 'ACT - Official'
from contact1 C
left join CONTACT2 C2 on C.ACCOUNTNO=C2.ACCOUNTNO
left join CONTSUPP HS_OFF_CS on C.ACCOUNTNO=HS_OFF_CS.ACCOUNTNO
AND HS_OFF_CS.STATE='O' AND HS_OFF_CS.CONTACT='High School'
left join CONTSUPP SLF_CLG_CS on C.ACCOUNTNO=SLF_CLG_CS.ACCOUNTNO
AND SLF_CLG_CS.CONTACT = 'Transfer College' AND SLF_CLG_CS.STATE='S'
left join CONTSUPP OFF_CLG_CS on C.ACCOUNTNO=OFF_CLG_CS.ACCOUNTNO
AND OFF_CLG_CS.CONTACT = 'Transfer College' AND OFF_CLG_CS.STATE='O'
left join CONTSUPP OFF_SAT_COMP on C.ACCOUNTNO=OFF_SAT_COMP.ACCOUNTNO
AND OFF_SAT_COMP.CONTACT='Test/SAT' AND OFF_SAT_COMP.ZIP='O'
left join CONTSUPP OFF_ACT_COMP on C.ACCOUNTNO=OFF_ACT_COMP.ACCOUNTNO
AND OFF_ACT_COMP.CONTACT='Test/ACT' AND OFF_ACT_COMP.STATE='O'
where
C.KEY1!='00PRSP'
AND C.U_KEY2='2010 FALL'
A left join produces extra rows whenever the relationship is one-to-many. It doesn't matter that CONTACT1 has only one row per individual: if you left join to a table that has several matching rows for a record in the first table, you get one output row per match. SELECT DISTINCT will remove rows that are identical in every column, but it will not eliminate 'duplicates' that have a different value in any column.
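To see which of the joins is actually one-to-many, a rough check along these lines may help. It is only a sketch for SQL Server: it reuses the CONTSUPP filters from the query in the question, and the `*_rows` column aliases are made up for illustration.

```sql
-- Count the CONTSUPP rows that satisfy each join condition from the
-- question's query, per account. The *_rows aliases are illustrative only.
SELECT
    CS.ACCOUNTNO,
    SUM(CASE WHEN CS.CONTACT = 'High School'      AND CS.STATE = 'O' THEN 1 ELSE 0 END) AS hs_official_rows,
    SUM(CASE WHEN CS.CONTACT = 'Transfer College' AND CS.STATE = 'S' THEN 1 ELSE 0 END) AS clg_self_rows,
    SUM(CASE WHEN CS.CONTACT = 'Transfer College' AND CS.STATE = 'O' THEN 1 ELSE 0 END) AS clg_official_rows,
    SUM(CASE WHEN CS.CONTACT = 'Test/SAT'         AND CS.ZIP   = 'O' THEN 1 ELSE 0 END) AS sat_rows,
    SUM(CASE WHEN CS.CONTACT = 'Test/ACT'         AND CS.STATE = 'O' THEN 1 ELSE 0 END) AS act_rows
FROM CONTSUPP CS
GROUP BY CS.ACCOUNTNO
-- Keep only accounts where at least one join condition matches more than one row.
HAVING
       SUM(CASE WHEN CS.CONTACT = 'High School'      AND CS.STATE = 'O' THEN 1 ELSE 0 END) > 1
    OR SUM(CASE WHEN CS.CONTACT = 'Transfer College' AND CS.STATE = 'S' THEN 1 ELSE 0 END) > 1
    OR SUM(CASE WHEN CS.CONTACT = 'Transfer College' AND CS.STATE = 'O' THEN 1 ELSE 0 END) > 1
    OR SUM(CASE WHEN CS.CONTACT = 'Test/SAT'         AND CS.ZIP   = 'O' THEN 1 ELSE 0 END) > 1
    OR SUM(CASE WHEN CS.CONTACT = 'Test/ACT'         AND CS.STATE = 'O' THEN 1 ELSE 0 END) > 1;
```

Any account this returns will come back more than once from the original query, and the counts multiply: two matching SAT rows and two matching ACT rows for the same account give four output rows. CONTACT2 can be checked the same way with a plain GROUP BY ACCOUNTNO HAVING COUNT(*) > 1.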
A quick way to find where the duplicates come from, if you have SHOWPLAN permission on the server: add a condition to the WHERE clause (e.g. C.ACCOUNTNO='some value') for a value that you would expect to return a single row, but that you have found actually returns more than one. Enable "Include Actual Execution Plan", run the query, and hover over the arrows between the operators in the plan. At some point you'll find more than one row coming out of a particular operator, and that operator's details can shed light on the cause of the duplication.
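If you'd rather do this from a script than toggle the option in Management Studio, something like the following should be roughly equivalent. It is a sketch, assuming SQL Server: SET STATISTICS XML ON returns the actual plan as XML next to the results, the select list is trimmed to one column per join to keep it short, and 'some value' is a placeholder for an account you know is duplicated.

```sql
-- Return the actual execution plan (as XML) together with the query results.
SET STATISTICS XML ON;

-- Same joins as the original query, narrowed to one account.
SELECT
    C.ACCOUNTNO,
    HS_OFF_CS.LINKACCT    AS hs_gpa,
    SLF_CLG_CS.EXT        AS clg_gpa_self,
    OFF_CLG_CS.EXT        AS clg_gpa_official,
    OFF_SAT_COMP.LINKACCT AS sat_verbal,
    OFF_ACT_COMP.LINKACCT AS act_english
FROM CONTACT1 C
LEFT JOIN CONTACT2 C2 ON C.ACCOUNTNO = C2.ACCOUNTNO
LEFT JOIN CONTSUPP HS_OFF_CS ON C.ACCOUNTNO = HS_OFF_CS.ACCOUNTNO
    AND HS_OFF_CS.STATE = 'O' AND HS_OFF_CS.CONTACT = 'High School'
LEFT JOIN CONTSUPP SLF_CLG_CS ON C.ACCOUNTNO = SLF_CLG_CS.ACCOUNTNO
    AND SLF_CLG_CS.CONTACT = 'Transfer College' AND SLF_CLG_CS.STATE = 'S'
LEFT JOIN CONTSUPP OFF_CLG_CS ON C.ACCOUNTNO = OFF_CLG_CS.ACCOUNTNO
    AND OFF_CLG_CS.CONTACT = 'Transfer College' AND OFF_CLG_CS.STATE = 'O'
LEFT JOIN CONTSUPP OFF_SAT_COMP ON C.ACCOUNTNO = OFF_SAT_COMP.ACCOUNTNO
    AND OFF_SAT_COMP.CONTACT = 'Test/SAT' AND OFF_SAT_COMP.ZIP = 'O'
LEFT JOIN CONTSUPP OFF_ACT_COMP ON C.ACCOUNTNO = OFF_ACT_COMP.ACCOUNTNO
    AND OFF_ACT_COMP.CONTACT = 'Test/ACT' AND OFF_ACT_COMP.STATE = 'O'
WHERE C.ACCOUNTNO = 'some value';  -- placeholder: an account known to be duplicated

SET STATISTICS XML OFF;
```

Even without reading the plan, the result rows themselves often give it away: the columns that change from one 'duplicate' row to the next belong to the join that matches more than one record.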