Group by and non distinct columns and data normalization
I have a large table(60 columns, 1.5 million records) of denormalized data in MS SQL 2005 that was imported from an Access database. I've been tasked with normalizing and inserting this data into our data model.
I would like to create a query that used a grouping of, for example "customer_number", and returned a result set that only contains the columns that are non distinct for each customer_number. I don't know if it's even possible, but it would be of gre开发者_如何学Goat help if it was.
Edit:if my table has 3 columns(cust_num,cust_name_cust_address) and 5 records
|cust_num|cust_name|cust_address
|01 |abc |12 1st street
|02 |cbs |1 Aroundthe Way
|01 |abc |MLK BLVD
|03 |DMC |Hollis Queens
|02 |cbs |1 Aroundthe Way
the results from my desired query should just be the data from cust_num and cust_name because cust_address has different values for that grouping of cust_num. A cust_num has many addresses but only one cust_name.
Can someone point me in the right direction?
Jim
WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, … ORDER BY id) AS rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 2
This will select only the rows that have duplicates on all columns listed in the PARTITION BY
clause (in fact, it will select the first duplicate if any).
No, this can't be done
精彩评论