Efficiently bulk import data in SSIS when the data occasionally contains PK duplicates?
I regularly load a flat file with 100k records into a table after some transformations. The table has a PK on two columns. On the whole the data does not contain duplicate PK values, but occasionally there are duplicates.
At first I didn't understand why SSIS was rejecting all my records when only some of them violated the PK constraint. I believe the problem is that during a bulk load, if even one row in a batch violates the PK constraint, the entire batch is rejected.
If I set the FastLoadMaxInsertCommitSize property of the OLE DB Destination to 1, it fixes the problem, but it then runs like a dog because it commits after every single row.
In MySQL, the bulk load facility allows you to ignore PK errors and skip those rows without sacrificing performance. Does anyone know of a way to achieve this in SQL Server?
Any help much appreciated.
It sounds like you may be looking for IGNORE_DUP_KEY?
Using the IGNORE_DUP_KEY Option to Handle Duplicate Values
When you create or modify a unique index or constraint, you can set the IGNORE_DUP_KEY option ON or OFF. This option specifies the error response to duplicate key values in a multiple-row INSERT statement after the index has been created. When IGNORE_DUP_KEY is set to OFF (the default), the SQL Server Database Engine rejects all rows in the statement when one or more rows contain duplicate key values. When set to ON, only the rows that contain duplicate key values are rejected; the nonduplicate key values are added.
For example, if a single statement inserts 20 rows into a table with a unique index, and 10 of those rows contain duplicate key values, by default all 20 rows are rejected. However, if the index option IGNORE_DUP_KEY is set to ON, only the 10 duplicate key values will be rejected; the other 10 nonduplicate key values will be inserted into the table.
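As a minimal sketch of how this is set up (the table and column names here are made up for illustration), you can declare the option directly on the two-column primary key, or rebuild an existing index to change it:

```sql
-- Hypothetical target table with a two-column PK.
-- IGNORE_DUP_KEY = ON makes SQL Server discard only the duplicate rows
-- (raising a warning) instead of failing the whole multi-row INSERT.
CREATE TABLE dbo.ImportTarget
(
    KeyPart1 INT          NOT NULL,
    KeyPart2 INT          NOT NULL,
    Payload  VARCHAR(100) NULL,
    CONSTRAINT PK_ImportTarget
        PRIMARY KEY (KeyPart1, KeyPart2)
        WITH (IGNORE_DUP_KEY = ON)
);

-- For an existing table, rebuild the index backing the PK to change the option:
-- ALTER INDEX PK_ImportTarget ON dbo.ImportTarget
--     REBUILD WITH (IGNORE_DUP_KEY = ON);
```

Note that the duplicates are silently discarded (with a warning), so only use this if you genuinely don't care which of the duplicate rows survives.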
You can raise FastLoadMaxInsertCommitSize to, say, 5k; this will speed up your inserts greatly. Then set the Error Output of the OLE DB Destination to redirect rows, so that any 5k batch containing an error row is sent on to a second Destination. (This next bit is from memory!) If you configure that second Destination not to use fast load, it will insert the good rows individually, and you can pass its error output to an error table or something like a Row Count task, as sketched below.
You can play with the FastLoadMaxInsertCommitSize figures until you find something that works well for you.
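As a rough sketch of the landing spot for that second error path (the table name and payload columns are hypothetical), the redirected rows could be written to a simple error table; ErrorCode and ErrorColumn are the two columns SSIS adds to an error output:

```sql
-- Hypothetical error table for rows redirected by the second
-- (non-fast-load) OLE DB Destination's error output.
CREATE TABLE dbo.ImportTarget_Errors
(
    KeyPart1    INT          NULL,
    KeyPart2    INT          NULL,
    Payload     VARCHAR(100) NULL,
    ErrorCode   INT          NULL,   -- populated by the SSIS error output
    ErrorColumn INT          NULL,   -- populated by the SSIS error output
    LoadedAt    DATETIME     NOT NULL DEFAULT (GETDATE())
);
```

This keeps the fast path fast while still giving you a record of which rows were rejected and why.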