
Data Import Question: Should I use a cursor?


I'm currently working on a SQL import routine to move data from a legacy application into a more modern, robust system. The routine imports data from a flat-file legacy table (stored as a .csv file) into SQL Server tables that follow the classic order/order-detail pattern. Here's what both sides look like:

**LEGACY_TABLE**  
Cust_No  
Item_1_No  
Item_1_Qty  
Item_1_Prc  
Item_2_No  
Item_2_Qty  
Item_2_Prc  
...  
Item_7_No    
Item_7_Qty  
Item_7_Prc  

As you can see, the legacy table is basically a 22-column spreadsheet that represents a customer along with up to 7 items, each with its quantity and purchase price.

The new table(s) look like this:

**INVOICE**  
Invoice_No  
Cust_No

**INVOICE_LINE_ITEM**  
Invoice_No  
Item_No  
Item_Qty  
Item_Prc  

My quick-and-dirty approach has been to create a replica of the LEGACY_TABLE (let's call it LEGACY_TABLE_SQL) in SQL Server. This table will be populated from the .csv file using a database import that is already built into the application.

From there, I created a stored procedure to actually copy each of the values in the LEGACY_TABLE_SQL table to the INVOICE/INVOICE_LINE_ITEM tables as well as handle the underlying logical constraints (i.e. performing existence tests, checking for already open invoices, etc.). Finally, I've created a database trigger that calls the stored procedure when new data is inserted into the LEGACY_TABLE_SQL table.

The stored procedure looks something like this:

CREATE PROC IMPORT_PROCEDURE  
    @CUST_NO INT,  
    @ITEM_NO INT,  
    @ITEM_QTY INT,  
    @ITEM_PRC MONEY  
AS  
    -- existence tests, open-invoice checks, then the actual INSERTs  

However, instead of calling the procedure once, I actually call the stored procedure seven times (once for each item) using a database trigger. I only execute the stored procedure when the ITEM_NO is NOT NULL, to account for blank items in the .csv file. Therefore, my trigger looks like this:

CREATE TRIGGER IMPORT_TRIGGER ON LEGACY_TABLE_SQL  
AFTER INSERT  
AS  
    -- column values are read from the "inserted" pseudo-table  
    IF (SELECT ITEM_NO_1 FROM inserted) IS NOT NULL  
    BEGIN  
        EXEC IMPORT_PROCEDURE CUST_NO, ITEM_NO_1, ITEM_QTY_1, ITEM_PRC_1  
    END  

...so on and so forth.

I'm not sure that this is the most efficient way to accomplish this task. Does anyone have any tips or insight that they wouldn't mind sharing?


I would separate the import process from any triggers. A trigger is useful if you're going to have rows being constantly added to the import table from a constantly running, outside source. It doesn't sound like this is your situation though, since you'll be importing an entire file at once. Triggers tend to hide code and can be difficult to work with in some situations.

How often are you importing these files?

I would have an import process that is mostly stand-alone. It might use stored procedures or tables in the database, but I wouldn't use triggers. A simple approach would be something like the script below. I've renamed your staging table to the more descriptive Legacy_Invoices and added a column so that you can track when items have been imported and from where. You can expand this to track more information if necessary.

Also, I don't see how you're tracking invoice numbers in your code. I've assumed an IDENTITY column in the Legacy_Invoices. This is almost certainly insufficient since I assume that you're creating invoices in your own system as well (outside of the legacy system). Without knowing your invoice numbering scheme though, it's impossible to give a solution there.
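Since the DDL for that renamed staging table never appears below, here is a rough sketch of what I have in mind (the column types, and the IDENTITY standing in for a real invoice-numbering scheme, are assumptions):

```sql
-- Hypothetical staging table: your LEGACY_TABLE_SQL renamed, plus
-- the tracking columns used by the import script below.
CREATE TABLE Legacy_Invoices (
    invoice_no      INT IDENTITY(1,1) PRIMARY KEY,  -- placeholder numbering
    cust_no         INT NOT NULL,
    item_no_1       INT NULL,
    item_qty_1      INT NULL,
    item_prc_1      MONEY NULL,
    -- item_no_2 / item_qty_2 / item_prc_2 ... through item 7 ...
    import_status   VARCHAR(30) NOT NULL DEFAULT 'Awaiting Import',
    import_datetime DATETIME NULL
);
```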

BEGIN TRAN

DECLARE
    @now DATETIME = GETDATE()

UPDATE Legacy_Invoices
SET
    import_datetime = @now
WHERE
    import_status = 'Awaiting Import'

INSERT INTO dbo.Invoices (invoice_no, cust_no)
SELECT DISTINCT invoice_no, cust_no
FROM
    Legacy_Invoices
WHERE
    import_datetime = @now

UPDATE Legacy_Invoices
SET
    import_status = 'Invoice Imported'
WHERE
    import_datetime = @now

INSERT INTO dbo.Invoice_Lines (invoice_no, item_no, item_qty, item_prc)
SELECT
    invoice_no,
    item_no_1,
    item_qty_1,
    item_prc_1
FROM
    Legacy_Invoices LI
WHERE
    import_datetime = @now AND
    import_status = 'Invoice Imported' AND
    item_no_1 IS NOT NULL

UPDATE Legacy_Invoices
SET
    import_status = 'Item 1 Imported'
WHERE
    import_datetime = @now AND
    import_status = 'Invoice Imported'

<Repeat for item_no_2 through 7>

COMMIT TRAN

Here's a big caveat though. While cursors are normally not desirable in SQL and you want to use set-based processing versus RBAR (row by agonizing row) processing, data imports are often an exception.

The problem with the above is that if one row fails, the whole import step fails. It's also very difficult to run a single entity (an invoice plus its line items) through business logic when you're importing in bulk. This is one place where SSIS really shines. It's extremely fast (assuming you set it up properly), even when importing one entity at a time, and you can build in all sorts of error handling to make sure the import runs smoothly. One row has an erroneous invoice number? No problem: mark it as an error and move on. A row has item #2 filled in but no item #1, or a price without a quantity? No problem: mark the error and move on.
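If you do stay in T-SQL but want per-entity error handling, a cursor-based sketch might look like the following (the status values follow the script above; the 'Error' status and the decision not to log the message text are assumptions):

```sql
-- Sketch: import one invoice at a time so a bad row is marked
-- and skipped instead of failing the whole batch.
DECLARE @invoice_no INT, @cust_no INT;

DECLARE invoice_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT invoice_no, cust_no
    FROM Legacy_Invoices
    WHERE import_status = 'Awaiting Import';

OPEN invoice_cur;
FETCH NEXT FROM invoice_cur INTO @invoice_no, @cust_no;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;

        -- run per-entity business checks here, then insert the
        -- invoice and its line items
        INSERT INTO dbo.Invoices (invoice_no, cust_no)
        VALUES (@invoice_no, @cust_no);

        UPDATE Legacy_Invoices
        SET import_status = 'Invoice Imported'
        WHERE invoice_no = @invoice_no;

        COMMIT TRAN;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;

        -- mark the failed row and keep going; log ERROR_MESSAGE()
        -- somewhere if you need the details
        UPDATE Legacy_Invoices
        SET import_status = 'Error'
        WHERE invoice_no = @invoice_no;
    END CATCH;

    FETCH NEXT FROM invoice_cur INTO @invoice_no, @cust_no;
END;

CLOSE invoice_cur;
DEALLOCATE invoice_cur;
```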

For a single import I might stick with the code above (adding in appropriate error handling of course), but for a repeating process I would almost certainly use SSIS. You can import millions of rows in seconds or minutes even with individual error handling on each business entity.

If you have any problems with getting SSIS running (there are tutorials all over the web and on MSDN at Microsoft) then post any problems here and you should get quick answers.


I'm not sure why you would add a trigger. Will you be continuing to use LEGACY_TABLE_SQL? If not, how about this one-time procedure? It uses Oracle syntax but can be adapted to most databases:

PROCEDURE MIGRATE IS

    CURSOR all_data IS
        SELECT invoice_no, cust_no, Item_1_no, Item_1_qty........
        FROM LEGACY_TABLE_SQL;

BEGIN
    FOR data IN all_data LOOP
        INSERT INTO INVOICE (invoice_no, cust_no)
        VALUES (data.invoice_no, data.cust_no);

        IF data.Item_1_no IS NOT NULL THEN
            INSERT INTO INVOICE_LINE_ITEM (invoice_no, Item_1_no, Item_1_qty....)
            VALUES (data.invoice_no, data.Item_1_no, data.Item_1_qty....);
        END IF;
        -- further inserts for each item

    END LOOP;

    COMMIT;
END;

This can be further optimized in Oracle with BULK COLLECT. I would create the INVOICE_LINE_ITEM table with default values of 0 for all items.

I would also consider these possibilities:

- Is the invoice number really unique, now and in the future? It may be a good idea to add a pseudo-key based off a sequence.
- Is there any importance to NULL item_no entries? Could this indicate a back order, a short shipment, or just bad data entry?

EDIT: Since you advise that you will continue to use the legacy table, you need to decide what to prioritize: efficiency and performance, maintainability, or a synchronous transaction. For example:

- If performance is not really critical, then implement this as you outlined.
- If this will have to be maintained, then you might want to invest more in the coding.
- If you do not require a synchronous transaction, then you could add a column to LEGACY_TABLE_SQL called processed, with a default value of 0. Then, once a day or hour, schedule a job to pick up all the orders that have not been processed.
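A minimal sketch of that last, asynchronous option (this assumes an Invoice_No column on the staging table, as the cursor above does; the column and job names are placeholders):

```sql
-- One-time schema change: flag rows that still need processing.
ALTER TABLE LEGACY_TABLE_SQL
    ADD processed BIT NOT NULL DEFAULT 0;

-- Body of the scheduled job (e.g. a SQL Server Agent job run hourly).
BEGIN TRAN;

INSERT INTO INVOICE (Invoice_No, Cust_No)
SELECT Invoice_No, Cust_No
FROM LEGACY_TABLE_SQL
WHERE processed = 0;

-- line items: repeat this block for Item_1 through Item_7
INSERT INTO INVOICE_LINE_ITEM (Invoice_No, Item_No, Item_Qty, Item_Prc)
SELECT Invoice_No, Item_1_No, Item_1_Qty, Item_1_Prc
FROM LEGACY_TABLE_SQL
WHERE processed = 0
  AND Item_1_No IS NOT NULL;

-- mark everything picked up in this run as done
UPDATE LEGACY_TABLE_SQL
SET processed = 1
WHERE processed = 0;

COMMIT TRAN;
```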

