What is the T-SQL to normalize an existing table?
I'd like to convert an existing table into the 1st normal form (simplest normalization possible; see the example).
Do you happen to know what is the T-SQL is for this sort of problem? Many thanks!
Update
Tried the answer below, it worked perfect开发者_StackOverflow中文版ly. Here is the steps I used to test the answer:
- Start up Microsoft SQL Management Studio.
- Create the tables, with the data below.
- Make sure that the ID in "Customer" is set to "Primary Key", and "identity".
- Make sure that the ID in "Order" has no special settings (its a foreign key).
- Open up a database diagram, then create a 1:* relationship between the "Customer" and "Order" tables.
- Execute the script on the "Customer" table and the "Order" table, it will automatically normalize the data properly for you.
- This is very useful if you are starting from a flat .csv file that has just been imported, and you want to copy the information into a normalized form in the database.
In the case above @Thomas has a perfectly workable solution. However, sometimes people simplify for the sake of asking the question, so I'll address what you might want to do if you need to to go to many tables (or the first table has no unique constraint on name for instance), not just two.
First, I will insert the data to a staging table and add a column for the id which will be null. Then I would write an insert to the parent table using the OUTPUT clause to output the ids and the natural key to a table variable. Then I would use the table variable to update the id field in the staging table. Then I would insert records from the staging table to the other tables. Since I now have the id there, it is no longer necessary to access the orginal parent table. (If the number of records are large, I might also index the staging table).
Now if you have no natural key, the process becomes harder because you have no way to identify which record goes to who. Then I usually add an identity to the staging table and then do the intial import to the parent table one record at a time (including the stagingtableid as a variable in the cursor) and then update the staging table with each Parent table id as it is created. Once all the intial records are updated, I use set-based processes to insert or update to the other tables.
The staging table also gives you the chance to fix any bad data locally before trying to put it into your production tables.
Other syntax you may need to know if things are complex or if this is a repeated process is the MERGE statment. This will insert if it is a new record and update if it is an existing record.
If this is a very complex transformation, you may consider using SSIS.
Starting with the Customer table
INSERT INTO Customer (Name)
SELECT DISTINCT Name
FROM Flat_CSV_File
If you have repeated imports
INSERT INTO Customer (Name)
SELECT DISTINCT f.Name
FROM Flat_CSV_File f
LEFT OUTER JOIN Customer c ON f.Name = c.Name
WHERE c.Id IS NULL
Orders (your table name Order is a reserved word in TSQL, so you need to quote it with square brackets)
INSERT INTO [Order] (CustomerId, Description, Cost)
SELECT c.Id, f.Description, f.Cost
FROM Flat_CSV_File f
INNER JOIN Customer c ON f.Name = c.Name
The simplest solution is to just write a query to do the import:
-- assuming that Id is an Identity column or has some default to generate keys.
Insert Customer( [Name] )
Select Name
From Flat_csv_file
Group By Name
Insert Order( [Customer], [Description], Cost )
Select C.Id, F.Description F.Cost
From Customer As C
Join Flat_csv_file As F
On F.Name = C.Name
If this is a one-off process, I would manipulate the .csv first, and form your tables there including the primary keys. When you populate the SQL database, use
SET IDENTITY_INSERT Customers ON
INSERT Customers
(
...
)
SELECT
...
FROM
openrowset(...)
SET IDENTITY_INSERT Customers ON
SET IDENTITY_INSERT Orders ON
INSERT Orders
(
...
)
SELECT
...
FROM
openrowset(...)
SET IDENTITY_INSERT Orders ON
If you need a purely TSQL solution, I would create temp tables to do the same basic thing, and insert from the temp tables instead of the .csv.
精彩评论