
Optimizing massive insert performance...?

Given: SQL Server 2008 R2. Quite fast data discs; the log discs are lagging behind.

Required: LOTS LOTS LOTS of inserts. Like 10,000 to 30,000 rows per second into a simple table with two indices. Inserts have an intrinsic order and will not repeat, so the order of inserts need not be maintained in the short term (i.e. multiple parallel inserts are OK).

So far: data is accumulated into a queue. Regularly, an async thread pool drains up to 1024 entries into a work item, which gets queued. The thread pool (a custom class) has up to 32 threads and opens 32 connections.
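For concreteness, here is a minimal sketch of that accumulate-and-drain pattern (class, table and connection-string names like LogEntry and dbo.ArchiveTable are made up, not the real code). The per-row INSERT inside the batch is the likely reason the log waits dominate, since every autocommitted row forces its own log flush:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    class LogEntry { public long Id; public string Payload; }

    class QueueWriter
    {
        private readonly ConcurrentQueue<LogEntry> _queue = new ConcurrentQueue<LogEntry>();
        private const string ConnStr = "Data Source=.;Initial Catalog=Archive;Integrated Security=True";

        public void Enqueue(LogEntry e) { _queue.Enqueue(e); }

        // Called regularly; drains up to 1024 entries and hands them to a worker.
        public void DrainBatch()
        {
            var batch = new List<LogEntry>(1024);
            LogEntry e;
            while (batch.Count < 1024 && _queue.TryDequeue(out e))
                batch.Add(e);
            if (batch.Count == 0) return;

            Task.Run(() => WriteBatch(batch));   // stands in for the custom thread pool
        }

        private static void WriteBatch(List<LogEntry> batch)
        {
            using (var conn = new SqlConnection(ConnStr))
            {
                conn.Open();
                foreach (var entry in batch)      // one round trip and one log flush per row:
                {                                 // this is where the throughput is lost
                    using (var cmd = new SqlCommand(
                        "INSERT INTO dbo.ArchiveTable (Id, Payload) VALUES (@id, @p)", conn))
                    {
                        cmd.Parameters.AddWithValue("@id", entry.Id);
                        cmd.Parameters.AddWithValue("@p", entry.Payload);
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
    }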

Problem: performance is off by a factor of roughly 300: only about 100 to 150 rows are inserted per second. Log wait time is up to 40%-45% of processing time (ms per second) in SQL Server. Server CPU load is low (4% to 5% or so).

Not usable: bulk insert. The data must be written to disc as close to real time as possible. This is pretty much an archival process of data running through the system, but there are queries which need access to the data regularly. I could try dumping it to disc and using bulk upload 1-2 times per second... will give this a try.

Does anyone have a smart idea? My next step is moving the log to a fast disc set (a modern 128 GB SSD) to see what happens. The significant performance boost will probably change things considerably, but even then the question remains what is feasible.

So, please fire away with the smart ideas.


OK, answering myself. Going to give SqlBulkCopy a try, batching up to 65,536 entries and flushing them out every second in an async fashion. Will report on the gains.
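A hedged sketch of what that flush could look like (table, column names and the connection string are assumptions, not the actual schema):

    using System.Data;
    using System.Data.SqlClient;

    static class BulkFlusher
    {
        private const string ConnStr = "Data Source=.;Initial Catalog=Archive;Integrated Security=True";

        // Build the in-memory batch with the same shape as the target table.
        public static DataTable NewBatchTable()
        {
            var t = new DataTable();
            t.Columns.Add("Id", typeof(long));
            t.Columns.Add("Payload", typeof(string));
            return t;
        }

        // Called roughly once per second; streams up to ~65,536 buffered rows in one go.
        public static void Flush(DataTable batch)
        {
            if (batch.Rows.Count == 0) return;

            using (var conn = new SqlConnection(ConnStr))
            {
                conn.Open();
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = "dbo.ArchiveTable";
                    bulk.BatchSize = batch.Rows.Count;
                    bulk.WriteToServer(batch);
                }
            }
            batch.Clear();   // reuse the same DataTable for the next flush
        }
    }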


I'm going through the exact same issue here, so I'll go through the steps I'm taking to improve my performance.

  • Separate the log and the dbf file onto different spindle sets
  • Use the simple recovery model
  • You didn't mention any indexing requirements beyond the fact that insert order isn't important; in that case, don't use a clustered index on anything other than an identity column.
  • Start scaling concurrency again from 1 and stop when throughput flattens out; anything beyond that will likely hurt performance.
  • Rather than dropping to disk and using bcp, and since you are on SQL Server 2008, consider inserting multiple rows at a time; this statement inserts three rows in a single SQL call:

    INSERT INTO table VALUES ( 1,2,3 ), ( 4,5,6 ), ( 7,8,9 )

I was topping out at ~500 distinct inserts per second from a single thread. After ruling out the network and CPU (0 on both client and server), I assumed that disk I/O on the server was to blame; however, inserting in batches of three got me 1500 inserts per second, which rules out disk I/O.
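For what it's worth, a client-side sketch of that kind of multi-row batching (table and column names are made up; note that SQL Server caps a VALUES list at 1000 rows and a command at 2100 parameters):

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Text;

    class Row { public long Id; public string Payload; }

    static class MultiRowInsert
    {
        // Builds one parameterized INSERT ... VALUES (...), (...), ... statement,
        // so a whole batch costs a single round trip and a single commit.
        public static void InsertBatch(SqlConnection conn, IList<Row> rows)
        {
            var sql = new StringBuilder("INSERT INTO dbo.ArchiveTable (Id, Payload) VALUES ");
            using (var cmd = new SqlCommand { Connection = conn })
            {
                for (int i = 0; i < rows.Count; i++)
                {
                    if (i > 0) sql.Append(", ");
                    sql.AppendFormat("(@id{0}, @p{0})", i);
                    cmd.Parameters.AddWithValue("@id" + i, rows[i].Id);
                    cmd.Parameters.AddWithValue("@p" + i, rows[i].Payload);
                }
                cmd.CommandText = sql.ToString();
                cmd.ExecuteNonQuery();
            }
        }
    }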

It's clear that the MS client library has an upper limit baked into it (and a dive into Reflector shows some hairy async completion code).

Batching in this way, waiting for x events to be received before calling insert, has me now inserting at ~2700 inserts per second from a single thread, which appears to be the upper limit for my configuration.

Note: if you don't have a constant stream of events arriving at all times, you might consider adding a timer that flushes your inserts after a certain period (so that you see the last event of the day!)
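Something like this is probably enough for that timer (the flush delegate stands in for whatever batching mechanism you already have):

    using System;
    using System.Threading;

    // Periodically fires a flush so buffered rows don't linger when the event
    // stream goes quiet (e.g. the last events of the day).
    class PeriodicFlusher : IDisposable
    {
        private readonly Timer _timer;

        public PeriodicFlusher(Action flush, TimeSpan interval)
        {
            _timer = new Timer(_ => flush(), null, interval, interval);
        }

        public void Dispose() { _timer.Dispose(); }
    }

    // Usage (hypothetical): new PeriodicFlusher(FlushPendingInserts, TimeSpan.FromSeconds(1));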


Some suggestions for increasing insert performance:

  • Increase the ADO.NET BatchSize
  • Choose the target table's clustered index wisely, so that inserts won't lead to clustered index page splits (e.g. an auto-increment column)
  • Insert into a temporary heap table first, then issue one big "insert-by-select" statement to push all the staging table data into the actual target table (see the sketch below)
  • Apply SqlBulkCopy
  • Choose the "Bulk Logged" recovery model instead of the "Full" recovery model
  • Place a table lock before inserting (if your business scenario allows for it)

Taken from Tips For Lightning-Fast Insert Performance On SqlServer
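To illustrate the staging-table, table-lock and recovery-model points together, a hedged sketch (all object names are placeholders; the recovery-model change is a one-time administrative step, not something to run per batch):

    using System.Data.SqlClient;

    static class StagingMover
    {
        // Moves everything collected in a staging heap into the real target table
        // with one set-based statement, taking a table lock on the target.
        public static void MoveStagedRows(SqlConnection conn)
        {
            const string sql = @"
                INSERT INTO dbo.ArchiveTable WITH (TABLOCK) (Id, Payload)
                SELECT Id, Payload FROM dbo.ArchiveStaging;
                TRUNCATE TABLE dbo.ArchiveStaging;";

            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }

    // One-time setting, run separately, if full point-in-time recovery is not required:
    //   ALTER DATABASE Archive SET RECOVERY BULK_LOGGED;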
