开发者

SQL: Need to remove duplicate rows in query containing multiple joins

Note that I'm a complete SQL noob and in the process of learning. Based on Google searches (including searching here) I've tried using SELECT DISTINCT and GROUP BY but neither works, likely due to all of my joins (if anyone knows why they won't work exactly, that would be helpful to learn).

I need data from a variety of tables and below is the only way I know to do it (I just know the basics). The query below works fine but shows duplicates. I need to know how to remove those. The only hint I have right now is perhaps a nested SELECT query but based on research I'm not sure how to implement them. Any help at all would be great, thanks!

USE SQL_Contest
go
SELECT
    CLT.Description AS ClockType,
    CLK.SerialNumber AS JobClockSerial,
    SIT.SiteNumber AS JobID,开发者_开发知识库
    SIT.[Name] AS JobsiteName,
    SIT.Status AS SiteActivityStatus,
    DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite
    CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite
    SES.ClockVoltage
FROM tb_Clock CLK
INNER JOIN tb_ClockType CLT
ON CLK.TypeID = CLT.ClockTypeID
INNER JOIN tb_DeviceHolderActivity DHA
ON CLK.ClockGUID = DHA.DeviceGUID
INNER JOIN tb_Site SIT
ON SIT.SiteGUID = DHA.HolderGUID
LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber
WHERE DHA.ReturnedDate IS NULL
ORDER BY SIT.[Name] ASC

EDIT: I will be reviewing these answers shortly, thank you very much. I'm posting the additional duplicate info per Rob's request:

Everything displays fine until I add:

LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber

Which I need. That's when a duplicate appears:

JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049    
JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049

I want that info to only display once. Essentially this query is to determine all active jobsites that have a clock assigned to them, and that job only has one clock assigned to it, and it's only one jobsite, but it's appearing twice.

EDIT 2: Based on the help you guys provided I was able to determine they actually are NOT duplicates, and each session is independent, that is the only one that happened to have two sessions. So now I'm going to try to figure out how to only pull in information from the latest session.


If everything "works fine" until you add:

LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber

Then there must be more than one record in tb_Session for each CLK.SerialNumber.

Run the following query:

SELECT  *
FROM    tb_Session SES
WHERE   ClockSerialNumber = '08-107'

There should be two records returned. You need to decide how to handle this (i.e. Which record do you want to use?), unless both rows from tb_Session contain identical data, in which case, should they?

You could always change your query to:

SELECT
    CLT.Description AS ClockType,
    CLK.SerialNumber AS JobClockSerial,
    SIT.SiteNumber AS JobID,
    SIT.[Name] AS JobsiteName,
    SIT.Status AS SiteActivityStatus,
    DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite
    CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite
    SES.ClockVoltage
FROM tb_Clock CLK
INNER JOIN tb_ClockType CLT
ON CLK.TypeID = CLT.ClockTypeID
INNER JOIN tb_DeviceHolderActivity DHA
ON CLK.ClockGUID = DHA.DeviceGUID
INNER JOIN tb_Site SIT
ON SIT.SiteGUID = DHA.HolderGUID
LEFT JOIN 
(
    SELECT DISTINCT ClockSerialNumber, ClockVoltage
    FROM tb_Session 
) SES
ON SES.ClockSerialNumber = CLK.SerialNumber
WHERE DHA.ReturnedDate IS NULL
ORDER BY SIT.[Name] ASC

As that should ensure that SES only contains one record for each unique combination of ClockSerialNumber and ClockVoltage


Take this example dataset:

Ingredient

IngredientId IngredientName
============ =========
1            Apple
2            Orange
3            Pear
4            Tomato

Recipe

RecipeId RecipeName
======== ==========
1        Apple Turnover
2        Apple Pie
3        Poached Pears

Recipe_Ingredient

RecipeId IngredientId Quantity
======== ============ ========
1        1            0.25
1        1            1.00
2        1            2.00
3        3            1.00

Note: Why the Apple Turnover has two lots of apple as ingredients, is neither here nor there, it just does.

The following query will return two rows for the "Apple Turnover" recipe, one row for the "Apple Pie" recipe and one row for the "Poached Pears" recipe, because there are two entries in the Recipe_Ingredient table for IngredientId 1. That's just what happens with a join..

SELECT  I.IngredientName,
        R.RecipeName
FROM    Ingredient I
JOIN    Recipe_Ingredient RI
        ON I.IngredientId = RI.IngredientId
JOIN    Recipe R
        ON RI.recipeId = R.RecipeId

You could get this to return only one row by changing it to:

SELECT  I.IngredientName,
        R.RecipeName
FROM    Ingredient I
JOIN    Recipe_Ingredient RI
        ON I.IngredientId = RI.IngredientId
JOIN    Recipe R
        ON RI.recipeId = R.RecipeId
GROUP BY I.IngredientName, R.RecipeName

Without more specifics regarding your data, it's hard to apply this to your specific scenario, but the walkthrough may help you understand where the "duplicates" are coming from as someone unfamiliar with SQL


The joins are not your problem. From your comments I will infer that what you are calling "duplicates" are not actual duplicates. If all columns values for 2 "duplicates" returned from the query matched, then either SELECT DISTINCT or GROUP BY would definitely eliminate them. So you should be able to find a solution by looking at your column definitions.

My best guess is that you're getting duplicates of for the same date which aren't really duplicates because the time component of the date doesn't match. To eliminate this problem, you can truncate the date fields to the date only using this technique:

    DATEADD(DAY, DATEDIFF(DAY, 0, DHA.IssuedDate), 0) AS DHAIssuedDate,
    DATEADD(DAY, DATEDIFF(DAY, 0, CLK.CreatedDate), 0) AS CLKCreatedDate,   

If that doesn't work you might want to take a look at JobClockSerial: does this column belong in the query results?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜