Bulk Insert into a HEAP vs CLUSTERED index where minimal logging is not an option (SQL Server 2008)
The tool currently used is Informatica, and we have bookend stored procedures that drop the clustered indexes before the load and add them back afterwards. In the stored procedure that re-creates the clustered indexes, the DDL for the indexes is hard-coded (we don't generate it from the sys tables for fear that Microsoft will change them and the regenerated index will be wrong or the script will fail). This causes issues when people create clustered indexes but don't think to update the stored procedure, so the next time the bulk load occurs those indexes are gone. We previously did this for all indexes but switched the nonclustered indexes to a disable/rebuild approach. That is not an option for the clustered index, though, because disabling the clustered index makes the table itself unusable for inserts: the clustered index essentially is the table.
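For context, the disable/rebuild pattern used for the nonclustered indexes looks roughly like this (table and index names are placeholders, not from the actual system):

```sql
-- Disable a nonclustered index before the bulk load; the index metadata
-- stays in the catalog, so no DDL needs to be hard-coded anywhere.
ALTER INDEX IX_MyTable_SomeColumn ON dbo.MyTable DISABLE;

-- ... bulk load runs here ...

-- Rebuild re-creates the index from its catalog definition.
ALTER INDEX IX_MyTable_SomeColumn ON dbo.MyTable REBUILD;
```

Doing the same to the clustered index is what's off the table: a disabled clustered index makes the whole table inaccessible until it is rebuilt.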
Performance is important but not everything. Good performance with easy maintainability trumps great performance with complex maintainability.
After reading many sites, it is almost universally agreed that when performing a bulk insert on data that is not ordered the same as your primary key, inserting into a heap and applying the PK afterwards is faster ( http://msdn.microsoft.com/en-us/library/ms177445.aspx , http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx ). Most of these sites make assumptions that I cannot rely on at my organization and with my toolset.
Due to our standards policies we have to use the FULL recovery model, so minimal logging will not occur no matter which option I choose with regard to heap vs. clustered index.
According to our Informatica admins, specifying TABLOCK or ORDER hints for bcp is not possible through the UI, and our organization is averse to customization beyond the UI for maintainability reasons.
So the question, after all of this: with all the factors above, would you recommend that we continue with our somewhat unreliable stored procedures, insert directly into a clustered index, or is there some third, far superior solution? I also realize there are other Stack questions similar to this one, but they do not address bulk loading specifically and/or make assumptions in their answers that don't hold here.
My suggestion would be to bulk load into a staging table (a heap, or CI matching the file order), (re-)build the clustered index there matching the destination table, and then insert straight from the staging table. To reduce blocking, escalation, log use etc. you could do this in batches of 10000 rows at a time, committing and/or checkpointing every so often.
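A hypothetical sketch of the batched move from staging to destination (table and column names are placeholders; the staging table is assumed to already have a clustered index matching the destination):

```sql
-- Move rows from staging into the destination 10,000 at a time to limit
-- lock escalation and transaction size. DELETE ... OUTPUT INTO moves and
-- removes each batch in a single statement.
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (10000) s
        OUTPUT deleted.Col1, deleted.Col2
        INTO dbo.Destination (Col1, Col2)
    FROM dbo.Staging AS s;

    SET @rows = @@ROWCOUNT;

    COMMIT TRANSACTION;
    -- Note: under FULL recovery, checkpoints alone do not free log space;
    -- frequent log backups during the load are what keep the log in check.
END;
```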
You might also consider using a pre-processor (C# perhaps) that takes the log file and builds a new one with the proper sort order.
Also, I think you are safer using sys.indexes etc. than hard-coding the index structures in the code. Microsoft is far less likely to change a column name in sys.indexes than someone at your shop (no offense intended) is to change an index but forget to update the hard-coded definition in the procedure.
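As a starting point, the key columns of every clustered index can be read from the catalog views like this (a sketch assuming simple, non-partitioned indexes; included columns and options would need additional handling):

```sql
-- List each user table's clustered index and its key columns in order,
-- from which CREATE INDEX / ALTER TABLE ... ADD CONSTRAINT DDL can be built.
SELECT  OBJECT_NAME(i.object_id)  AS table_name,
        i.name                    AS index_name,
        c.name                    AS column_name,
        ic.key_ordinal,
        ic.is_descending_key
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
      ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.type = 1                               -- 1 = clustered
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, ic.key_ordinal;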