Get top first record from duplicate records having no unique identity

I need to fetch the first row from each set of duplicate records in the table given below. I need to use this query in a view.

Please, no temp tables: I have already solved it that way by adding an identity column and using MIN with GROUP BY. I need a solution without a temp table or table variable.

This is just sample data. The original table has thousands of records and I need results from only the top 1000, so I can't use DISTINCT.

I am using SQL Server 2005


Find all products that have been ordered more than once (a kind of duplicate record):

SELECT DISTINCT *
FROM [order_items]
WHERE productid IN
(SELECT productid
  FROM [order_items]
  GROUP BY productid
  HAVING COUNT(*) > 1)
ORDER BY productid

To select the last inserted row of each of those:

SELECT DISTINCT productid, MAX(id) OVER (PARTITION BY productid) AS LastRowId
FROM [order_items]
WHERE productid IN
(SELECT productid
  FROM [order_items]
  GROUP BY productid
  HAVING COUNT(*) > 1)
ORDER BY productid


The answer depends on specifically what you mean by the "top 1000 distinct" records.

If you mean that you want to return at most 1000 distinct records, regardless of how many duplicates are in the table, then write this:

SELECT DISTINCT TOP 1000 id, uname, tel
FROM Users
ORDER BY <sort_columns>

If you only want to search the first 1000 rows in the table, and potentially return far fewer than 1000 distinct rows, then you would write it with a subquery or CTE, like this:

SELECT DISTINCT *
FROM
(
    SELECT TOP 1000 id, uname, tel
    FROM Users
    ORDER BY <sort_columns>
) u

The ORDER BY is of course optional if you don't care about which records you return.


Sometimes you can use the CROSS APPLY operator like this:

select distinct result.* from data d
cross apply (select top 1 * from data where data.Id = d.Id) result

In this query I need to pick only the first of many duplicates that naturally happen to occur in my data. It works on SQL Server 2005+ databases.
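
Because TOP 1 without an ORDER BY lets the server pick any matching row, the query above may return more than one row per Id when the duplicates differ in other columns. A minimal sketch of a variant that asks for exactly one row per Id, assuming the same data table and a hypothetical column named Name to order by:

```sql
-- Name is a hypothetical column, used only to make "first" well defined.
SELECT result.*
FROM (SELECT DISTINCT Id FROM data) AS d   -- one outer row per Id
CROSS APPLY (
    SELECT TOP 1 *
    FROM data
    WHERE data.Id = d.Id
    ORDER BY data.Name                     -- decides which duplicate is kept
) AS result;
```

If two duplicates also tie on the ordering column, the row kept for that Id is still arbitrary, but only one row comes back per Id.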


Using DISTINCT should do it:

SELECT DISTINCT id, uname, tel
FROM YourTable

Though you could really do with having a primary key on that table, a way to uniquely identify each record. I'd consider adding an IDENTITY column to the table.
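
As a rough sketch of that suggestion, assuming the placeholder table YourTable from the query above, a hypothetical new column named RowId, and that duplicates are rows sharing the same id:

```sql
-- RowId is a hypothetical column name; existing rows are numbered automatically.
ALTER TABLE YourTable
ADD RowId INT IDENTITY(1, 1) NOT NULL;

-- Run this as a separate batch, after the ALTER TABLE has completed.
-- With a unique RowId, the first row of each duplicate group is the one with MIN(RowId).
SELECT t.id, t.uname, t.tel
FROM YourTable AS t
WHERE t.RowId IN (SELECT MIN(RowId) FROM YourTable GROUP BY id);
```

The second statement needs no temp table, so it could also serve as the body of a view once the column exists.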


You can try the following:

  1. Create a view that simply selects all the columns from the original table but adds an extra numeric column whose value increases with each record/row. You may need to make this a non-integer column (e.g. a decimal, incremented by 1.00 for each record) to use it in the RANK() SQL statement.

  2. Also add another column (e.g. 'RecordRank') holding a rank calculated with the RANK() OVER clause; see the reference below. RANK lets you partition the records and then order the records within each partition by the ORDER BY column (use the increasing column from step 1). Put the columns with identical data in the PARTITION BY clause so that the duplicates are grouped together, then ordered by the extra column from step 1.

    http://msdn.microsoft.com/en-us/library/ms189461.aspx

  3. After successfully creating the above view, write another view that selects only the records with 'RecordRank' = 1.

This should select only one of each record from the duplicates or partitions.
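
The steps above can be sketched as a single view. SQL Server 2005 supports ROW_NUMBER() directly, so the extra increasing column from step 1 may not be needed. This is a minimal sketch, not a tested solution: it assumes a table named Users with columns id, uname and tel (names borrowed from other answers here), assumes duplicates are rows sharing the same id, and uses a hypothetical view name.

```sql
-- Hypothetical view name; Users(id, uname, tel) is assumed from the other answers.
CREATE VIEW dbo.FirstOfEachDuplicate
AS
SELECT id, uname, tel
FROM (
    SELECT id, uname, tel,
           -- Number the rows within each group of duplicates.
           -- The ORDER BY decides which row counts as "first".
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY uname, tel) AS RecordRank
    FROM Users
) AS ranked
WHERE RecordRank = 1;
```

The view could then be queried with something like SELECT TOP 1000 * FROM dbo.FirstOfEachDuplicate ORDER BY id. ROW_NUMBER() is used here instead of RANK() because RANK() gives tied rows the same value, which could let more than one row per group through the RecordRank = 1 filter.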

Hope this helps - malcom sankoh


Doesn't SELECT DISTINCT help? I suppose it would return the result you want.


Your best bet is to fix the database design and add the identity column to the table. Why do you have a table without one in the first place? Especially one with duplicate records! Clearly the database itself needs redesigning.

And why do you have to have this in a view, why isn't your solution with the temp table a valid solution? Views are not usually a really good thing to do to a perfectly nice database.


Here are two solutions, I am using Oracle:

  1. Using the OVER clause:

    with org_table as
     (select 1 id, 'Ali' uname from dual
      union
      select 1, 'June' from dual
      union
      select 2, 'Jame' from dual
      union
      select 2, 'July' from dual)
    select id, uname
      from (select a.id,
                   a.uname,
                   ROW_NUMBER() OVER(PARTITION BY a.id ORDER BY a.id) AS freq
              from org_table a)
     where freq = 1

  2. Using a sub-query:

    with org_table as
     (select 1 id, 'Ali' uname from dual
      union
      select 1, 'June' from dual
      union
      select 2, 'Jame' from dual
      union
      select 2, 'July' from dual)
    select a.id,
           (select b.uname
              from org_table b
             where b.id = a.id
               and rownum = 1)
      from (select distinct id from org_table) a


SELECT TOP 1000 MAX(tel) FROM TableName WHERE Id IN 
(
SELECT Id FROM TableName
GROUP BY Id
HAVING COUNT(*) > 1
) 
GROUP BY Id