How to find duplicate values in SQL Server

2022-12-30 19:17

I'm using SQL Server 2008. I have a table

Customers

customer_number int

field1 varchar

field2 varchar

field3 varchar

field4 varchar

... and a lot more columns, that don't matter for my queries.

Column customer_number is the primary key. I'm trying to find duplicate values and the differences between them.

Please help me find all rows that have the same

1) field1, field2, field3, field4

2) exactly 3 of those columns equal and one different (excluding rows from list 1)

3) exactly 2 of those columns equal and two different (excluding rows from list 1 and list 2)

In the end, I'll have 3 tables with these results plus an additional groupId, which will be the same for every row in a group of similar rows. (For example, in the three-columns-equal table, rows that share the same three column values form a separate group.)

Thank you.

Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:

SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

You could also use this technique to find rows that occur exactly once:

SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
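
Applied to the question's table, the same pattern extends to several columns by grouping on all of them. This is only a sketch, assuming the Customers table and field names from the question:

```sql
-- List every combination of the four fields that occurs more than once.
SELECT field1, field2, field3, field4, COUNT(*) AS NumOccurrences
FROM Customers
GROUP BY field1, field2, field3, field4
HAVING COUNT(*) > 1;
```

This returns one row per duplicated combination, not the individual customers; to get those, the result still has to be matched back to Customers.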

The easiest approach would probably be a stored procedure that iterates over each group of customers with duplicates and inserts the matching rows under that group's number.

However, I've thought about it and you can probably do this with a subquery. Hopefully I haven't made it more complicated than it ought to be, but this should get you what you're looking for in the first table of duplicates (all four fields). Note that this is untested, so it might need a little tweaking.

Basically, it finds each combination of field values that occurs more than once, assigns each combination a group number, then fetches all customers with those values and gives them that group number.

INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number  -- the key column in Customers is customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1) AS group_no,
             c.field1, c.field2, c.field3, c.field4
      FROM Customers c
      GROUP BY c.field1, c.field2, c.field3, c.field4
      HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
                           AND custs.field2 = Groups.field2
                           AND custs.field3 = Groups.field3
                           AND custs.field4 = Groups.field4
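
As an alternative sketch (a different technique from the query above, and just as untested against real data), window functions can number the groups without joining back to Customers. It assumes the column names from the question and the FourFieldsDuplicates table used above; cnt and the alias x are illustrative names only. Unlike an equality join, PARTITION BY puts NULLs in the same partition, so rows with NULL in the same positions count as duplicates here.

```sql
-- Sketch: number each set of fully identical rows without a self-join.
INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT DENSE_RANK() OVER (ORDER BY x.field1, x.field2, x.field3, x.field4) AS group_no,
       x.customer_number
FROM (SELECT c.customer_number, c.field1, c.field2, c.field3, c.field4,
             -- how many rows share this exact combination of the four fields
             COUNT(*) OVER (PARTITION BY c.field1, c.field2,
                                         c.field3, c.field4) AS cnt
      FROM Customers c) x
WHERE x.cnt > 1;  -- keep only rows that have at least one twin
```

Because the filter on cnt is applied before DENSE_RANK is evaluated, the group numbers come out consecutive, starting at 1.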

The other two tables are a bit more complicated, however, as you'll need to expand out the possibilities. The three-field groups would then be:

INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1) AS group_no,
             GroupsInner.field1, GroupsInner.field2, 
             GroupsInner.field3, GroupsInner.field4
      -- Each branch ignores one field (NULL marks the ignored field) and keeps
      -- only combinations shared by more than one not-yet-reported customer.
      -- The HAVING has to sit inside each branch: after the inner GROUP BY,
      -- every combination is a single row, so an outer COUNT(*) > 1 never matches.
      FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field3
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, c.field2, NULL AS field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, NULL AS field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field3, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT NULL AS field1, c.field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field2, c.field3, c.field4
            HAVING COUNT(*) > 1) GroupsInner) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
                           AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
                           AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
                           AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
-- Keep customers already reported as four-field duplicates out of this table.
WHERE NOT EXISTS(SELECT d.customer_no
                 FROM FourFieldsDuplicates d
                 WHERE d.customer_no = custs.customer_number)

Hopefully this produces the right results and I'll leave the last one as an exercise. :-D

I'm not sure whether you also need equality checks across different fields (like field1 = field2).
If not, this might be enough.

Edit

Feel free to adjust the test data to provide inputs that give the wrong output according to your specification.

Test data

DECLARE @Customers TABLE (
  customer_number INTEGER IDENTITY(1, 1)
  , field1 INTEGER
  , field2 INTEGER
  , field3 INTEGER
  , field4 INTEGER)

INSERT INTO @Customers
          SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, NULL
UNION ALL SELECT 1, 1, 1, 2
UNION ALL SELECT 1, 1, 1, 3
UNION ALL SELECT 2, 1, 1, 1

All Equal

SELECT  ROW_NUMBER() OVER (ORDER BY c1.customer_number)
        , c1.field1
        , c1.field2
        , c1.field3
        , c1.field4
FROM    @Customers c1 
        INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                    AND ISNULL(c2.field1, 0) = ISNULL(c1.field1, 0) 
                                    AND ISNULL(c2.field2, 0) = ISNULL(c1.field2, 0)
                                    AND ISNULL(c2.field3, 0) = ISNULL(c1.field3, 0)
                                    AND ISNULL(c2.field4, 0) = ISNULL(c1.field4, 0)

One field different

SELECT  ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4)
        , field1
        , field2
        , field3
        , field4
FROM    (
          SELECT  DISTINCT c1.field1
                  , c1.field2
                  , c1.field3
                  , field4 = NULL
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND c2.field1 = c1.field1 
                                             AND c2.field2 = c1.field2 
                                             AND c2.field3 = c1.field3 
                                             AND ISNULL(c2.field4, 0) <> ISNULL(c1.field4, 0) 
          UNION ALL
          SELECT  DISTINCT c1.field1
                  , c1.field2
                  , NULL
                  , c1.field4
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND c2.field1 = c1.field1 
                                             AND c2.field2 = c1.field2 
                                             AND ISNULL(c2.field3, 0) <> ISNULL(c1.field3, 0) 
                                             AND c2.field4 = c1.field4 
          UNION ALL
          SELECT  DISTINCT c1.field1
                  , NULL
                  , c1.field3
                  , c1.field4
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND c2.field1 = c1.field1 
                                             AND ISNULL(c2.field2, 0) <> ISNULL(c1.field2, 0) 
                                             AND c2.field3 = c1.field3 
                                             AND c2.field4 = c1.field4 
          UNION ALL
          SELECT  DISTINCT NULL
                  , c1.field2
                  , c1.field3
                  , c1.field4
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND ISNULL(c2.field1, 0) <> ISNULL(c1.field1, 0)
                                             AND c2.field2 = c1.field2 
                                             AND c2.field3 = c1.field3 
                                             AND c2.field4 = c1.field4 
      ) c

You can simply write something like this to count duplicate entries; I think it works:

use *DATABASE_NAME*
go
SELECT     *YOUR_FIELD*, COUNT(*) AS dupes  
FROM         *YOUR_TABLE_NAME*
GROUP BY *YOUR_FIELD* 
HAVING      (COUNT(*) > 1)

Enjoy

There is a clean way of doing this with CUBE(), which aggregates by every possible combination of the columns:

SELECT
  field1,field2,field3,field4
 ,duplicate_row_count = COUNT(*)
 ,grp_id = GROUPING_ID(field1,field2,field3,field4)
INTO #duplicate_rows
FROM table_name
GROUP BY CUBE(field1,field2,field3,field4)
HAVING COUNT(*) > 1
  AND GROUPING_ID(field1,field2,field3,field4) IN (0,1,2,4,8,3,5,6,9,10,12)

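The numbers (0,1,2,4,8,3,5,6,9,10,12) are just the bitmasks (0000,0001,0010,0100,...,1010,1100) of the grouping sets that we care about: those with 4, 3, or 2 matches. A bit set to 1 means that column was aggregated away (not part of the match), and field4 is the lowest bit.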
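
To make the bitmask concrete, here is a small sketch on two hypothetical rows that differ only in field4. The table variable @t and its data are made up for illustration; the rest mirrors the query above, restricted to the "4 or 3 matches" masks.

```sql
-- Hypothetical sample: two rows that agree on field1..field3 only.
DECLARE @t TABLE (field1 INT, field2 INT, field3 INT, field4 INT);
INSERT INTO @t VALUES (1, 1, 1, 1), (1, 1, 1, 2);

SELECT field1, field2, field3, field4,
       COUNT(*) AS duplicate_row_count,
       GROUPING_ID(field1, field2, field3, field4) AS grp_id
FROM @t
GROUP BY CUBE(field1, field2, field3, field4)
HAVING COUNT(*) > 1
   AND GROUPING_ID(field1, field2, field3, field4) IN (0, 1, 2, 4, 8);
-- Only the grouping set that ignores field4 has a count above 1, so this
-- should return the single row (1, 1, 1, NULL) with count 2 and grp_id = 1.
```

The NULL in field4 of that row is the "aggregated away" marker, which is what the join below treats as a wildcard.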

Then join #duplicate_rows back to the original table, using a technique that treats NULLs in #duplicate_rows as wildcards:

SELECT a.*
FROM table_name a
INNER JOIN #duplicate_rows b
  ON  NULLIF(b.field1,a.field1) IS NULL
  AND NULLIF(b.field2,a.field2) IS NULL
  AND NULLIF(b.field3,a.field3) IS NULL
  AND NULLIF(b.field4,a.field4) IS NULL
--WHERE grp_id IN (0)             --Use this for 4 matches
--WHERE grp_id IN (1,2,4,8)       --Use this for 3 matches
--WHERE grp_id IN (3,5,6,9,10,12) --Use this for 2 matches
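
To get the per-group id the question asks for, one option is to rank the matched rows of #duplicate_rows. This is a sketch and not part of the original answer: group_id is an illustrative name, table_name is the placeholder used above, and nothing here removes rows that were already reported at a higher match level.

```sql
-- Sketch: 3-match groups, each with its own group_id.
SELECT DENSE_RANK() OVER (ORDER BY b.grp_id, b.field1, b.field2,
                                   b.field3, b.field4) AS group_id,
       a.*
FROM table_name a
INNER JOIN #duplicate_rows b
  ON  NULLIF(b.field1,a.field1) IS NULL
  AND NULLIF(b.field2,a.field2) IS NULL
  AND NULLIF(b.field3,a.field3) IS NULL
  AND NULLIF(b.field4,a.field4) IS NULL
WHERE b.grp_id IN (1,2,4,8);  -- swap in (0) or (3,5,6,9,10,12) for the other tables
```

All rows joined to the same #duplicate_rows row share one group_id. A row can appear in more than one group if it matches different neighbours on different column subsets.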
