updlock vs for update cursor
I need to update a column of all rows of a table and I need to use UPDLOCK
to do it.
For example:
UPDATE table (UPDLock)
SET column_name = ‘123’
Another alternative is to use an for update cursor and update each row. The advantage with the second approach is that the lock is not held till the end of the transaction and concurrent updates of the same rows can happen sooner. At the same time update cursors are said to have bad performance. Which is a better approach?
EDIT:
Assume the column is updated with a value th开发者_Python百科at is derived from another column in the table. In other words, column_name = f(column_name_1)
You cannot give an UPDLOCK hint to a write operation, like UPDATE statement. It will be ignored, since all writes (INSERT/UPDATE/DELETE) take the same lock, an exclusive lock on the row being updated. You can quickly validate this yourself:
create table heap (a int);
go
insert into heap (a) values (1)
go
begin transaction
update heap
--with (UPDLOCK)
set a=2
select * from sys.dm_tran_locks
rollback
If you remove the comment --
on the with (UPDLOCK)
you'll see that you get excatly the same locks (an X lock on the physical row). You can do the same experiment with a B-Tree instead of a heap:
create table btree (a int not null identity(1,1) primary key, b int)
go
insert into btree (b) values (1)
go
begin transaction
update btree
--with (UPDLOCK)
set b=2
select * from sys.dm_tran_locks
rollback
Again, the locks acquired will be identical with or w/o the hint (an exclusive lock on the row key).
Now back to your question, can this whole table update be done in batches? (since this is basically what you're asking). Yes, if the table has a primary key (to be precise what's required is an unique index to batch on, preferable the clustered index to avoid tipping point issues). Here is an example how:
create table btree (id int not null identity(1,1) primary key, b int, c int);
go
set nocount on;
insert into btree (b) values (rand()*1000);
go 1000
declare @id int = null, @rc int;
declare @inserted table (id int);
begin transaction;
-- first batch has no WHERE clause
with cte as (
select top(10) id, b, c
from btree
order by id)
update cte
set c = b+1
output INSERTED.id into @inserted (id);
set @rc = @@rowcount;
commit;
select @id = max(id) from @inserted;
delete from @inserted;
raiserror (N'Updated %d rows, up to id %d', 0,0,@rc, @id);
begin transaction;
while (1=1)
begin
-- update the next batch of 10 rows, now it has where clause
with cte as (
select top(10) id, b, c
from btree
where id > @id
order by id)
update cte
set c = b+1
output INSERTED.id into @inserted (id);
set @rc = @@rowcount;
if (0 = @rc)
break;
commit;
begin transaction;
select @id = max(id) from @inserted;
delete from @inserted;
raiserror (N'Updated %d rows, up to id %d', 0,0,@rc, @id);
end
commit
go
If your table doesn't have a unique clustered index then it becomes really tricky to do this, you would need to do the same thing a cursor has to do. While from a logical point of view the index is not required, not having it would cause each batch to do a whole-table-scan, which would be pretty much disastrous.
In case you wonder what happens if someone inserts a value behind the current @id, then the answer is very simple: the exactly same thing that would happen if someone inserts a value after the whole processing is complete.
Personally I think the single UPDATE will be much better. There are very few cases where a cursor will be better overall, regardless of concurrent activity. In fact the only one that comes to mind is a very complex running totals query - I don't think I've ever seen better overall performance from a cursor that is not read only, only SELECT queries. Of course, you have much better means of testing which is "a better approach" - you have your hardware, your schema, your data, and your usage patterns right in front of you. All you have to do is perform some tests.
That all said, what is the point in the first place of updating that column so that every single row has the same value? I suspect that if the value in that column has no bearing to the rest of the row, it can be stored elsewhere - perhaps a related table or a single-row table. Maybe the value in that column should be NULL (in which case you get it from the other table) unless it is overriden for a specific row. It seems to me like there is a better solution here than touching every single row in the table every time.
精彩评论