开发者

Group by and non distinct columns and data normalization

I have a large table(60 columns, 1.5 million records) of denormalized data in MS SQL 2005 that was imported from an Access database. I've been tasked with normalizing and inserting this data into our data model.

I would like to create a query that used a grouping of, for example "customer_number", and returned a result set that only contains the columns that are non distinct for each customer_number. I don't know if it's even possible, but it would be of gre开发者_如何学Goat help if it was.

Edit:if my table has 3 columns(cust_num,cust_name_cust_address) and 5 records

|cust_num|cust_name|cust_address
|01      |abc      |12 1st street
|02      |cbs      |1 Aroundthe Way
|01      |abc      |MLK BLVD
|03      |DMC      |Hollis Queens
|02      |cbs      |1 Aroundthe Way

the results from my desired query should just be the data from cust_num and cust_name because cust_address has different values for that grouping of cust_num. A cust_num has many addresses but only one cust_name.

Can someone point me in the right direction?

Jim


WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, … ORDER BY id) AS rn
        FROM    mytable
        )
SELECT  *
FROM    q
WHERE   rn = 2

This will select only the rows that have duplicates on all columns listed in the PARTITION BY clause (in fact, it will select the first duplicate if any).


No, this can't be done

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜