开发者

What is the T-SQL to normalize an existing table?

I'd like to convert an existing table into the 1st normal form (simplest normalization possible; see the example).

Do you happen to know what is the T-SQL is for this sort of problem? Many thanks!

What is the T-SQL to normalize an existing table?

Update

Tried the answer below, it worked perfect开发者_StackOverflow中文版ly. Here is the steps I used to test the answer:

  1. Start up Microsoft SQL Management Studio.
  2. Create the tables, with the data below.
  3. Make sure that the ID in "Customer" is set to "Primary Key", and "identity".
  4. Make sure that the ID in "Order" has no special settings (its a foreign key).
  5. Open up a database diagram, then create a 1:* relationship between the "Customer" and "Order" tables.
  6. Execute the script on the "Customer" table and the "Order" table, it will automatically normalize the data properly for you.
  7. This is very useful if you are starting from a flat .csv file that has just been imported, and you want to copy the information into a normalized form in the database.


In the case above @Thomas has a perfectly workable solution. However, sometimes people simplify for the sake of asking the question, so I'll address what you might want to do if you need to to go to many tables (or the first table has no unique constraint on name for instance), not just two.

First, I will insert the data to a staging table and add a column for the id which will be null. Then I would write an insert to the parent table using the OUTPUT clause to output the ids and the natural key to a table variable. Then I would use the table variable to update the id field in the staging table. Then I would insert records from the staging table to the other tables. Since I now have the id there, it is no longer necessary to access the orginal parent table. (If the number of records are large, I might also index the staging table).

Now if you have no natural key, the process becomes harder because you have no way to identify which record goes to who. Then I usually add an identity to the staging table and then do the intial import to the parent table one record at a time (including the stagingtableid as a variable in the cursor) and then update the staging table with each Parent table id as it is created. Once all the intial records are updated, I use set-based processes to insert or update to the other tables.

The staging table also gives you the chance to fix any bad data locally before trying to put it into your production tables.

Other syntax you may need to know if things are complex or if this is a repeated process is the MERGE statment. This will insert if it is a new record and update if it is an existing record.

If this is a very complex transformation, you may consider using SSIS.


Starting with the Customer table

INSERT INTO Customer (Name)
SELECT DISTINCT Name
FROM Flat_CSV_File

If you have repeated imports

INSERT INTO Customer (Name)
SELECT DISTINCT f.Name
FROM Flat_CSV_File f
LEFT OUTER JOIN Customer c ON f.Name = c.Name
WHERE c.Id IS NULL

Orders (your table name Order is a reserved word in TSQL, so you need to quote it with square brackets)

INSERT INTO [Order] (CustomerId, Description, Cost)
SELECT c.Id, f.Description, f.Cost
FROM Flat_CSV_File f
INNER JOIN Customer c ON f.Name = c.Name


The simplest solution is to just write a query to do the import:

-- assuming that Id is an Identity column or has some default to generate keys.
Insert Customer( [Name] )
Select Name
From Flat_csv_file
Group By Name

Insert Order( [Customer], [Description], Cost )
Select C.Id, F.Description F.Cost
From Customer As C
    Join Flat_csv_file As F
        On F.Name = C.Name


If this is a one-off process, I would manipulate the .csv first, and form your tables there including the primary keys. When you populate the SQL database, use

SET IDENTITY_INSERT Customers ON

INSERT Customers
(
   ...
)
SELECT
   ...
FROM
   openrowset(...)

SET IDENTITY_INSERT Customers ON



SET IDENTITY_INSERT Orders ON

INSERT Orders
(
   ...
)
SELECT
   ...
FROM
   openrowset(...)

SET IDENTITY_INSERT Orders ON

If you need a purely TSQL solution, I would create temp tables to do the same basic thing, and insert from the temp tables instead of the .csv.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜