
Faster way to update 250k rows with SQL

I need to update about 250k rows in a table, and each field to update will have a different value depending on the row itself (not calculated from the row id or the key, but externally).

I tried with a parametrized query, but it turns out to be slow. (I could still try a table-valued parameter, SqlDbType.Structured, in SQL Server 2008, but I'd like a general way to do it on several databases, including MySql, Oracle and Firebird.)

Concatenating the individual updates into one huge batch is also slow (but about twice as fast as making thousands of individual calls - round trips! - using parametrized queries).

What about creating a temp table and running an UPDATE that joins my table and the temp one? Would that be faster?


How slow is "slow"?

The main problem with this is that it would create an enormous entry in the database's log file (in case there's a power failure half-way through the update, the database needs to log each action so that it can roll back in the event of failure). This is most likely where the "slowness" is coming from, more than anything else. Obviously, with such a large number of rows there are other ways to make the thing inefficient (e.g. doing one DB round trip per update would be unbearably slow); I'm just saying that once you eliminate the obvious things, you'll still find it's pretty slow.

There are a few ways you can do it more efficiently. One would be to do the update in chunks, say 1,000 rows at a time. That way, the database writes lots of small log entries rather than one really huge one.
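
On SQL Server, a chunked version might look something like the sketch below. It assumes the new values have already been staged in a table (dbo.NewPrices here; all table and column names are placeholders), since they come from outside the database:

    // Rough sketch only (C# + SQL Server): apply staged values 1,000 rows at a
    // time so each auto-committed batch writes a small log entry instead of one
    // huge one. dbo.Products / dbo.NewPrices / ProductId / Price are placeholders.
    using System.Data.SqlClient;

    class ChunkedUpdate
    {
        static void Run(string connectionString)
        {
            const string batchSql = @"
                UPDATE TOP (1000) p
                SET    p.Price = n.Price
                FROM   dbo.Products  AS p
                JOIN   dbo.NewPrices AS n ON n.ProductId = p.ProductId
                WHERE  p.Price <> n.Price;  -- 'still pending' test; assumes Price is not nullable";

            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                int affected;
                do
                {
                    using (var cmd = new SqlCommand(batchSql, conn) { CommandTimeout = 0 })
                        affected = cmd.ExecuteNonQuery();  // each batch commits on its own
                } while (affected > 0);                    // stop when nothing is left to change
            }
        }
    }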

Another way would be to turn off - or turn "down" - the database's logging for the duration of the update. In SQL Server, for example, you can set the recovery model to "simple" or "bulk-logged", which would speed it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).
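
In SQL Server that is just an ALTER DATABASE statement; a minimal sketch (the database name "MyDb" and the connection string are placeholders, and you need the appropriate permissions):

    // Sketch only, SQL Server syntax: relax logging around the big operation,
    // then switch back. Remember this affects your backup/restore strategy.
    using System.Data.SqlClient;

    class RecoveryModelSwitch
    {
        static void Run(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var cmd = new SqlCommand("ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;", conn))
                    cmd.ExecuteNonQuery();

                // ... run the bulk load / big update here ...

                using (var cmd = new SqlCommand("ALTER DATABASE MyDb SET RECOVERY FULL;", conn))
                    cmd.ExecuteNonQuery();
            }
        }
    }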

Edit: Just to expand a little more, probably the most efficient way to actually execute the queries in the first place would be to do a BULK INSERT of all the new rows into a temporary table and then do a single UPDATE of the existing table from that (or to do the UPDATE in chunks of 1,000 as described above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow...
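
That final statement is a single set-based UPDATE, something along these lines on SQL Server (schema names are placeholders, and #NewPrices is assumed to have been created and filled on the same connection, e.g. with BULK INSERT or SqlBulkCopy as in the answers further down):

    // Sketch of the single set-based UPDATE from a temp/staging table.
    using System.Data.SqlClient;

    class UpdateFromStagingTable
    {
        // Pass the connection that already holds the #NewPrices temp table.
        static void Run(SqlConnection conn)
        {
            const string sql = @"
                UPDATE p
                SET    p.Price = n.Price
                FROM   dbo.Products AS p
                JOIN   #NewPrices   AS n ON n.ProductId = p.ProductId;";

            using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
                cmd.ExecuteNonQuery();   // one statement, one round trip
        }
    }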


Call a stored procedure if possible.


If the columns being updated are part of indexes, you could:

  • drop these indexes
  • do the update
  • re-create the indexes.

If you need these indexes to retrieve the data, well, it doesn't help.
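
On SQL Server you can also disable and rebuild a nonclustered index instead of dropping and re-creating it, which saves restating the index definition. A rough sketch (index and table names are placeholders):

    // Sketch only, SQL Server syntax. Don't disable the clustered index - that
    // would make the whole table inaccessible.
    using System.Data.SqlClient;

    class IndexDisableRebuild
    {
        static void Run(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                Exec(conn, "ALTER INDEX IX_Products_Price ON dbo.Products DISABLE;");

                // ... run the big UPDATE here ...

                Exec(conn, "ALTER INDEX IX_Products_Price ON dbo.Products REBUILD;");
            }
        }

        static void Exec(SqlConnection conn, string sql)
        {
            using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
                cmd.ExecuteNonQuery();
        }
    }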


You should use SqlBulkCopy with the KeepIdentity option set.

As part of a SqlTransaction, run a query to SELECT all the records that need updating and then DELETE them, returning the selected (and now removed) records. Read them into C# in a single batch. Update the records on the C# side in memory, now that you've narrowed the selection, and then SqlBulkCopy those updated records back, keys and all. And don't forget to commit the transaction. It's more work, but it's very fast.
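
Roughly like this with System.Data.SqlClient on SQL Server; the schema, the NeedsUpdate filter and ComputeNewPrice are hypothetical stand-ins for however you identify the rows and compute the new values:

    // Sketch only: delete the affected rows (reading them back via OUTPUT),
    // change them in memory, then bulk copy them back inside one transaction.
    using System.Data;
    using System.Data.SqlClient;

    class DeleteAndBulkCopyBack
    {
        static void Run(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    // 1. Delete the rows that need updating and read them back in one go.
                    var rows = new DataTable();
                    using (var cmd = new SqlCommand(
                        "DELETE FROM dbo.Products OUTPUT deleted.* WHERE NeedsUpdate = 1;",
                        conn, tx))
                    using (var reader = cmd.ExecuteReader())
                        rows.Load(reader);

                    // 2. Apply the externally computed changes in memory.
                    foreach (DataRow row in rows.Rows)
                        row["Price"] = ComputeNewPrice(row);   // hypothetical helper

                    // 3. Bulk copy the modified rows back, keeping the original key values.
                    using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, tx))
                    {
                        bulk.DestinationTableName = "dbo.Products";
                        bulk.WriteToServer(rows);
                    }

                    tx.Commit();   // don't forget this, or everything rolls back
                }
            }
        }

        // Stand-in for whatever external calculation produces the new value.
        static decimal ComputeNewPrice(DataRow row) => 0m;
    }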


Here's what I would do:

  1. Retrieve the entire table, that is, the columns you need in order to calculate/retrieve/find/produce the changes externally
  2. Calculate/produce those changes
  3. Run a bulk insert to a temporary table, uploading the information you need server-side in order to do the changes. This would require the key information + new values for all the rows you intend to change.
  4. Run SQL on the server to copy new values from the temporary table into the production table.
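
A minimal sketch of steps 3 and 4 with SqlBulkCopy on SQL Server (the ProductId/Price schema and the newPrices input are hypothetical; the final statement is the same UPDATE ... FROM ... JOIN shape sketched in the first answer):

    // Sketch only: stage "key + new value" via SqlBulkCopy, then one set-based UPDATE.
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    class StageThenUpdate
    {
        static void Apply(string connectionString, IDictionary<int, decimal> newPrices)
        {
            // Output of step 2: the externally computed changes, key + new value only.
            var staging = new DataTable();
            staging.Columns.Add("ProductId", typeof(int));
            staging.Columns.Add("Price", typeof(decimal));
            foreach (var kv in newPrices)
                staging.Rows.Add(kv.Key, kv.Value);

            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Step 3: bulk insert into a session-scoped temp table.
                using (var create = new SqlCommand(
                    "CREATE TABLE #NewPrices (ProductId int PRIMARY KEY, Price decimal(18,2));",
                    conn))
                    create.ExecuteNonQuery();

                using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#NewPrices" })
                    bulk.WriteToServer(staging);

                // Step 4: copy the new values across in one server-side statement.
                using (var update = new SqlCommand(@"
                    UPDATE p
                    SET    p.Price = n.Price
                    FROM   dbo.Products AS p
                    JOIN   #NewPrices   AS n ON n.ProductId = p.ProductId;",
                    conn) { CommandTimeout = 0 })
                    update.ExecuteNonQuery();
            }
        }
    }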

Pros:

  • Running the final step server-side is faster than running tons and tons of individual SQL statements, so you're going to lock the table in question for a shorter time
  • Bulk insert like this is fast

Cons:

  • Requires extra space in your database for the temporary table
  • Produces more log data, logging both the bulk insert and the changes to the production table


Here are the things that can make your updates slow:

  • executing the updates one by one through a parametrized query
    • solution: do the update in one statement
  • a large transaction creates a big log entry
    • see codeka's answer
  • updating indexes (the RDBMS updates each index after every row; changing an indexed column can be very costly on a large table)
    • if you can, drop the indexes before the update and re-create them afterwards
  • updating a field that has a foreign key constraint - for each modified record the RDBMS will go and look up the appropriate key
    • if you can, disable the foreign key constraints before the update and re-enable them afterwards (see the sketch below)
  • triggers and row-level checks
    • if you can, disable the triggers before the update and re-enable them afterwards
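
For the last two points, a sketch in SQL Server syntax of what disabling and re-enabling looks like; the constraint, trigger and table names are hypothetical, and MySql, Oracle and Firebird each have their own equivalents:

    // Sketch only, SQL Server syntax: skip FK checks and triggers for the big
    // update, then turn them back on. WITH CHECK re-validates the existing rows
    // so the constraint remains trusted.
    using System.Data.SqlClient;

    class DisableChecksAroundUpdate
    {
        static void Run(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                Exec(conn, "ALTER TABLE dbo.Orders NOCHECK CONSTRAINT FK_Orders_Products;");
                Exec(conn, "DISABLE TRIGGER trg_Orders_Audit ON dbo.Orders;");

                // ... run the big UPDATE here ...

                Exec(conn, "ENABLE TRIGGER trg_Orders_Audit ON dbo.Orders;");
                Exec(conn, "ALTER TABLE dbo.Orders WITH CHECK CHECK CONSTRAINT FK_Orders_Products;");
            }
        }

        static void Exec(SqlConnection conn, string sql)
        {
            using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
                cmd.ExecuteNonQuery();
        }
    }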