Deleting many rows without locking them
In PostgreSQL I have a query like the following, which will delete 250k rows from a 1M-row table:
DELETE FROM table WHERE key = 'needle';
The query takes over an hour to execute, and during that time the affected rows are locked for writing. That is not good, because it means that a lot of update queries have to wait for the big delete query to complete (and then they will fail because the rows disappeared from under them, but that is OK). I need a way to segment this big query into multiple parts so that they cause as little interference with the update queries as possible. For example, if the delete query could be split up into chunks of 1000 rows each, then the other update queries would at most have to wait for a delete query involving 1000 rows.
DELETE FROM table WHERE key = 'needle' LIMIT 10000;
That query would work nicely, but alas DELETE with a LIMIT does not exist in Postgres.
Try a subselect and use a unique condition:
DELETE FROM table
WHERE id IN (SELECT id FROM table WHERE key = 'needle' LIMIT 10000);
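To keep the locks short-lived, that chunked delete can be repeated from the client until no rows are left, committing after each batch. A minimal sketch, assuming Python with psycopg2; mytable, id, and key stand in for the real table and columns, and the connection string is hypothetical:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

while True:
    cur.execute(
        "DELETE FROM mytable"
        " WHERE id IN (SELECT id FROM mytable WHERE key = %s LIMIT 10000)",
        ('needle',))
    deleted = cur.rowcount   # rows removed in this batch
    conn.commit()            # commit so the row locks are released
    if deleted == 0:
        break                # nothing left to delete

cur.close()
conn.close()

Each iteration locks at most 10000 rows and releases them at commit, so concurrent updates wait for one batch at most.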
Frak's answer is good, but this can be faster; it requires 8.4 because of window function support (pseudocode):
result = query('select id from (
                  select id, row_number() over (order by id) as row_number
                  from mytable where key = ?
                ) as _
                where row_number % 8192 = 0
                order by id', 'needle');
// result contains the id of every 8192nd row where key = 'needle'
last_id = 0;
result.append(MAX_INT); // guard so the last partial batch is deleted too
for (row in result) {
    query('delete from mytable
           where id <= ? and id > ? and key = ?', row.id, last_id, 'needle');
    // last_id is used to hint the query planner
    // that there will be no matching rows with a smaller id,
    // so it is less likely to use a full table scan
    last_id = row.id;
}
This is premature optimization, an evil thing. Beware.
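For reference, here is a runnable version of the pseudocode above, again assuming Python with psycopg2, a table mytable with an integer primary key id, and a hypothetical connection string; committing after each range keeps the locks short:

import sys
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

# Fetch the id of every 8192nd matching row; the literal % is doubled
# because psycopg2 uses %s placeholders.
cur.execute(
    "SELECT id FROM ("
    "  SELECT id, row_number() OVER (ORDER BY id) AS rn"
    "  FROM mytable WHERE key = %s"
    ") AS numbered"
    " WHERE rn %% 8192 = 0 ORDER BY id",
    ('needle',))
boundaries = [r[0] for r in cur.fetchall()]
boundaries.append(sys.maxsize)  # guard so the last partial batch is covered

last_id = 0
for boundary in boundaries:
    cur.execute(
        "DELETE FROM mytable WHERE id > %s AND id <= %s AND key = %s",
        (last_id, boundary, 'needle'))
    conn.commit()  # release the locks between ranges
    last_id = boundary

cur.close()
conn.close()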
Set the lock level for your delete and updates to a more granular lock mode. Note that your transactions will now be slower.
http://www.postgresql.org/docs/current/static/sql-lock.html
http://www.postgresql.org/docs/current/static/explicit-locking.html