Writing code to process 25,000 records (C#, T-SQL): performance is key

What would be the most efficient way to loop through 25,000 records and, based on some prewritten VB logic that won't ever change (99% sure), update the Result column in a table to a value of 1, 2, or 3?

Performance and reliability are most important here. This will most likely be called from a client-server app on the network, but it would be nice to be able to call it from a web app as well. I am thinking about three different ways to do it with T-SQL and C#.

a. Write an object that executes a stored procedure to get the 25,000 records, use a foreach loop to go through each record, and, based on some C# logic, call an object for each record that executes a stored procedure to update that row. This would call the object 25,000 times (and I assume the proc would just reuse the execution plan).

or

b. Write a stored procedure that gets the 25,000 records, use the forbidden cursor to go through each record, and, based on some T-SQL logic, update that row within the stored procedure.

or

UPDATED: MY SOLUTION IS THIS: For what it's worth, I am going with persisted computed columns, and breaking the loop into smaller UPDATE statements to update the column (all wrapped in a transaction). See the article below. I think it will be really fast compared to a loop.

http://technet.microsoft.com/en-us/library/cc917696.aspx
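A minimal sketch of the batched-update half of that approach, assuming a table dbo.Records with an int primary key Id, a numeric Amount column driving the logic, and placeholder thresholds (every name and value here is a hypothetical stand-in for the real schema and rules):

    using System.Data.SqlClient;

    class BatchedUpdate
    {
        static void Main()
        {
            const int batchSize = 1000;   // hypothetical batch size

            // Each pass updates a bounded slice of rows, keyed on the primary key.
            const string sql = @"
                UPDATE dbo.Records
                SET    Result = CASE WHEN Amount > 100 THEN 1
                                     WHEN Amount > 50  THEN 2
                                     ELSE 3 END          -- placeholder logic
                WHERE  Id BETWEEN @lo AND @hi;";

            using (var conn = new SqlConnection("your-connection-string"))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    for (int lo = 1; lo <= 25000; lo += batchSize)
                    {
                        using (var cmd = new SqlCommand(sql, conn, tx))
                        {
                            cmd.Parameters.AddWithValue("@lo", lo);
                            cmd.Parameters.AddWithValue("@hi", lo + batchSize - 1);
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();
                }
            }
        }
    }

Each statement stays small, so locks are short-lived, while the surrounding transaction keeps the whole run atomic.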


You obviously have some condition that determines whether the value should be 1, 2, or 3. You could just do three UPDATE queries, each updating the records that match the condition for that value. Don't pull all the data down to your machine if you can help it.
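A sketch of that idea from C#, assuming the conditions can be expressed in T-SQL; the table, columns, and WHERE clauses below are hypothetical placeholders for the real logic:

    using System.Data.SqlClient;

    class ThreeUpdates
    {
        static void Main()
        {
            // One set-based UPDATE per target value; no rows travel to the client.
            // The WHERE clauses are placeholders for the real conditions.
            string[] updates =
            {
                "UPDATE dbo.Records SET Result = 1 WHERE Amount > 100",
                "UPDATE dbo.Records SET Result = 2 WHERE Amount BETWEEN 50 AND 100",
                "UPDATE dbo.Records SET Result = 3 WHERE Amount < 50"
            };

            using (var conn = new SqlConnection("your-connection-string"))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    foreach (string sql in updates)
                    {
                        using (var cmd = new SqlCommand(sql, conn, tx))
                        {
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();
                }
            }
        }
    }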


My first choice would be to do it all in SQL if I could, i.e. UPDATE xxx SET col = 1 WHERE (your logic here); UPDATE xxx SET col = 2 WHERE (logic); etc.

If you need to do the logic in the VB client, either in a web app or client-server, my choice would be to use a DataReader to pass through the records (pulling down only the columns that are required, not the whole row) and then execute either a T-SQL UPDATE or a stored procedure call to update those records that need it, one at a time.

The DataReader will give you the best performance; the SP should perform at least as well as, if not better than, a T-SQL update (but probably not by much).

EDIT: Avoid server-side cursors at (almost) any cost...they are true hogs.
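A sketch of that pass-through pattern, again with hypothetical names (dbo.Records, Id, Amount, Result) and placeholder logic standing in for the real VB rules:

    using System.Data;
    using System.Data.SqlClient;

    class ReaderPass
    {
        static void Main()
        {
            // The per-row updates need their own connection while the reader
            // is open (or MARS enabled on a single connection).
            using (var readConn = new SqlConnection("your-connection-string"))
            using (var writeConn = new SqlConnection("your-connection-string"))
            using (var readCmd = new SqlCommand(
                "SELECT Id, Amount FROM dbo.Records", readConn))
            using (var updateCmd = new SqlCommand(
                "UPDATE dbo.Records SET Result = @result WHERE Id = @id", writeConn))
            {
                readConn.Open();
                writeConn.Open();

                var pResult = updateCmd.Parameters.Add("@result", SqlDbType.Int);
                var pId = updateCmd.Parameters.Add("@id", SqlDbType.Int);

                using (var reader = readCmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        int id = reader.GetInt32(0);
                        decimal amount = reader.GetDecimal(1);

                        // Placeholder for the real client-side logic.
                        int result = amount > 100 ? 1 : amount > 50 ? 2 : 3;

                        pId.Value = id;
                        pResult.Value = result;
                        updateCmd.ExecuteNonQuery(); // the plan is cached and reused
                    }
                }
            }
        }
    }

Note that the SELECT pulls only the two columns the logic needs, as the answer advises.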


Solving this without involving C# is actually the best option if performance is key. Run your queries outside C#. If it's really necessary, use DataReaders.


I would not go with option (b). In my experience, using cursors is extremely slow.

c. Use a DataReader and update the records with ExecuteNonQuery calls.


How about option (c): a stored procedure that updates the table using set-based logic rather than a cursor:

    ...
    UPDATE x
    SET    col = f(x)
    FROM   x
    ...
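One way that skeleton might look when fleshed out and called from the client; the proc name, table, and CASE expression are all hypothetical:

    using System.Data;
    using System.Data.SqlClient;

    class SetBasedProc
    {
        // Hypothetical proc body, deployed once on the server:
        //   CREATE PROCEDURE dbo.UpdateResults AS
        //   UPDATE dbo.Records
        //   SET    Result = CASE WHEN Amount > 100 THEN 1
        //                        WHEN Amount > 50  THEN 2
        //                        ELSE 3 END;       -- placeholder logic
        static void Main()
        {
            using (var conn = new SqlConnection("your-connection-string"))
            using (var cmd = new SqlCommand("dbo.UpdateResults", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                conn.Open();
                cmd.ExecuteNonQuery();   // one round trip, one set-based statement
            }
        }
    }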


Depending on how the updates work, you have a couple of options.

  1. Have a computed column where the results are persisted. That way, when a record changes, the result is updated in one place (see the sketch after this list).

  2. Instead of running 25,000 update queries, just use a bulk load (SqlBulkCopy).

  3. (and this is my preference) Have your app send SQL Server parameters describing what to update. In this case I'd lean towards using a static cursor, as it would be a bit faster, as long as one record doesn't necessarily affect the next one.
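For option 1, a sketch of the one-time DDL executed from C#, assuming Result does not already exist as a regular column and that a hypothetical Amount column drives the value:

    using System.Data.SqlClient;

    class PersistedColumn
    {
        static void Main()
        {
            // One-time DDL: Result is derived from Amount and stored on disk,
            // so SQL Server keeps it current whenever the row changes.
            const string ddl = @"
                ALTER TABLE dbo.Records
                ADD Result AS (CASE WHEN Amount > 100 THEN 1
                                    WHEN Amount > 50  THEN 2
                                    ELSE 3 END) PERSISTED;";

            using (var conn = new SqlConnection("your-connection-string"))
            using (var cmd = new SqlCommand(ddl, conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }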


You can either:

  1. Go with the 3 separate UPDATEs suggested by @Andrew

  2. Pull the records into a temporary table and loop through them in batches of maybe 1,000 records at a time in a WHILE loop for the UPDATE statement (so, 25 loops / UPDATEs)

  3. Or, if you are using SQL Server 2008 (or newer) and the algorithm that determines the change is complex, you can pull the 25,000 rows into a collection on the .NET side and stream the changes back into a proc that has a Table-Valued Parameter, doing a single UPDATE (a sketch follows below). You can find an example of this at:
    http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/

In each case, you want to avoid 25,000 UPDATE calls.
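A condensed sketch of the TVP route from item 3, assuming a user-defined table type and proc like the ones in the comment have been created on the server (all names hypothetical):

    using System.Data;
    using System.Data.SqlClient;

    class TvpUpdate
    {
        // Hypothetical server-side objects, created once:
        //   CREATE TYPE dbo.ResultChange AS TABLE (Id int PRIMARY KEY, Result tinyint);
        //   CREATE PROCEDURE dbo.ApplyResultChanges @changes dbo.ResultChange READONLY AS
        //     UPDATE r SET r.Result = c.Result
        //     FROM dbo.Records r JOIN @changes c ON c.Id = r.Id;
        static void Apply(DataTable changes) // columns: Id (int), Result (byte)
        {
            using (var conn = new SqlConnection("your-connection-string"))
            using (var cmd = new SqlCommand("dbo.ApplyResultChanges", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                var p = cmd.Parameters.AddWithValue("@changes", changes);
                p.SqlDbType = SqlDbType.Structured;   // stream the rows as a TVP
                p.TypeName = "dbo.ResultChange";
                conn.Open();
                cmd.ExecuteNonQuery();                // one round trip, one set-based UPDATE
            }
        }
    }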


I had a similar situation. Actually, I had more than 10,000,000 records. The business logic was rather complex, and there was old code written purely in SQL. Managers told me that with the old code, it took 15+ hours per 1,000,000 records. With my solution, it took only 5 minutes, literally! I did this in a loop with 3 steps per iteration, each iteration processing one batch of records:

  1. Bulk load a batch of invoices. I don't remember the batch size; I think it was a few thousand.
  2. Perform the business logic on the loaded records.
  3. Bulk insert the results. Because it was a bulk operation, it couldn't be an update, so the rows were bulk inserted into a temporary table with almost the same structure as the original table, and then the original table was updated by key. The temporary table was emptied/dropped after every bulk insert. It is much faster than a standard update.
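A sketch of that bulk-stage-then-update pattern (table and column names hypothetical); the batch is bulk copied into a temp table on the same connection, then applied with one keyed UPDATE:

    using System.Data;
    using System.Data.SqlClient;

    class BulkStageAndUpdate
    {
        static void Apply(DataTable staged) // columns: Id (int), Result (byte)
        {
            using (var conn = new SqlConnection("your-connection-string"))
            {
                conn.Open();

                // The temp table lives for the life of this connection.
                using (var create = new SqlCommand(
                    "CREATE TABLE #Staging (Id int PRIMARY KEY, Result tinyint);", conn))
                {
                    create.ExecuteNonQuery();
                }

                // Bulk copy is far faster than thousands of individual statements.
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = "#Staging";
                    bulk.WriteToServer(staged);
                }

                // One keyed, set-based UPDATE applies the whole batch.
                using (var update = new SqlCommand(@"
                    UPDATE r SET r.Result = s.Result
                    FROM dbo.Records r
                    JOIN #Staging s ON s.Id = r.Id;", conn))
                {
                    update.ExecuteNonQuery();
                }
            }
        }
    }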