SQL optimization question (Oracle)

Edit: Please answer one of the two questions I ask below. I know there are other options that would be better in a different case. These other potential options (partitioning the table, running it as one large delete statement w/o committing in batches, etc.) are NOT options in my case, due to things outside my control.

I have several very large tables to delete from. All have the same foreign key that is indexed. I need to delete certain records from all tables.

table source
  id --primary_key
  import_source --used for choosing the ids to delete

table t1
  id --foreign key
  --other fields

table t2
  id --foreign key
  --different other fields
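For concreteness, a minimal sketch of that schema (the column types and index name here are assumptions, not from the original):

CREATE TABLE source (
  id            INTEGER PRIMARY KEY,
  import_source VARCHAR2(255)   -- used for choosing the ids to delete
);

CREATE TABLE t1 (
  id INTEGER   -- foreign key to source.id
  -- other fields
);

CREATE INDEX t1_id_idx ON t1 (id);   -- the indexed foreign key; t2 is analogous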

Usually when doing a delete like this, I'll put together a loop to step through all the ids:

declare
  my_counter integer := 0;
begin
  for cur in (
    select id from source where import_source = 'bad.txt'
  ) loop
    delete from source where id = cur.id;
    delete from t1 where id = cur.id;
    delete from t2 where id = cur.id;
    my_counter := my_counter + 1;
    if my_counter > 500 then
      my_counter := 0;
      commit;
    end if;
  end loop;
  commit;
end;

However, in some code I saw elsewhere, it was put together in separate loops, one for each delete.

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  my_count integer := 0;
begin
  select id bulk collect into my_import_ids
    from source where import_source = 'bad.txt';

  for h in 1..my_import_ids.count loop
    delete from t1 where id = my_import_ids(h);
    --do commit check
  end loop;
  for h in 1..my_import_ids.count loop
    delete from t2 where id = my_import_ids(h);
    --do commit check
  end loop;
end;

--do commit check will be replaced with the same chunk that commits every 500 rows, as in the first block above
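For concreteness, that chunk would look much like the counter logic in the first block (using the my_count variable declared above):

    my_count := my_count + 1;
    if my_count > 500 then
      my_count := 0;
      commit;
    end if;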

So I need one of the following answered:

1) Which of these is better?

2) How can I find out which is better for my particular case? (i.e., whether it depends on how many tables I have, how big they are, etc.)

Edit:

I must do this in a loop due to the size of these tables. I will be deleting thousands of records from tables with hundreds of millions of records. This is happening on a system that can't afford to have the tables locked for that long.

EDIT:

NOTE: I am required to commit in batches. The amount of data is too large to do it in one batch; the rollback segments will blow out and crash our database.

If there is a way to commit in batches other than looping, I'd be willing to hear it. Otherwise, don't bother saying that I shouldn't use a loop...


Why loop at all?

delete from t1 where id in (select id from source where import_source = 'bad.txt');
delete from t2 where id in (select id from source where import_source = 'bad.txt');
delete from source where import_source = 'bad.txt';

That's using standard SQL. I don't know Oracle specifically, but many DBMSes also feature multi-table, JOIN-based DELETEs that would let you do the whole thing in a single statement.
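For illustration, a sketch of what such a multi-table DELETE looks like in MySQL (this syntax is not valid in Oracle):

DELETE t1, t2
FROM source
LEFT JOIN t1 ON t1.id = source.id
LEFT JOIN t2 ON t2.id = source.id
WHERE source.import_source = 'bad.txt';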


David, if you insist on committing, you can use the following code:

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  cursor c is select id from source where import_source = 'bad.txt';
begin
  open c;
  loop
    fetch c bulk collect into my_import_ids limit 500;
    forall h in 1..my_import_ids.count
      delete from t1 where id = my_import_ids(h);
    forall h in 1..my_import_ids.count
      delete from t2 where id = my_import_ids(h);
    commit;
    exit when c%notfound;
  end loop;
  close c;
end;

This program fetches ids in batches of 500 rows, deleting and committing each batch. It should be much faster than row-by-row processing, because bulk collect and forall work as a single operation (a single round-trip to and from the database), minimizing the number of context switches. See Bulk Binds, Forall, Bulk Collect for details.


First of all, you shouldn't commit in the loop - it is not efficient (it generates lots of redo) and if some error occurs, you can't roll back.

As mentioned in previous answers, you should issue single deletes, or, if you are deleting most of the records, it could be more optimal to create new tables with the remaining rows, drop the old ones, and rename the new ones to the old names.

Something like this:

CREATE TABLE new_table AS
  SELECT * FROM old_table WHERE <filter: only the remaining rows>;

-- index new_table
-- grant on new_table
-- add constraints on new_table
-- etc. on new_table

DROP TABLE old_table;
RENAME new_table TO old_table;
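Applied to the question's tables, a hedged sketch for t1 (the new table and index names are hypothetical):

CREATE TABLE t1_new AS
  SELECT * FROM t1
   WHERE id NOT IN (SELECT id FROM source
                     WHERE import_source = 'bad.txt');

CREATE INDEX t1_new_id_idx ON t1_new (id);

DROP TABLE t1;
RENAME t1_new TO t1;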

See also Ask Tom


Larry Lustig is right that you don't need a loop. Nonetheless there may be some benefit in doing the delete in smaller chunks. Here PL/SQL bulk binds can improve speed greatly:

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
begin
  select id bulk collect into my_import_ids
    from source where import_source = 'bad.txt';

  forall h in 1..my_import_ids.count
    delete from t1 where id = my_import_ids(h);
  forall h in 1..my_import_ids.count
    delete from t2 where id = my_import_ids(h);
end;

The way I wrote it, it does it all at once; in that case, yes, the single SQL statement is better. But you can change your loop conditions to break it into chunks. The key points are:

  • don't commit on every row. If anything, commit only every N rows.
  • When using chunks of N, don't run the delete in an ordinary loop. Use forall to run the delete as a bulk bind, which is much faster.

The reason, aside from the overhead of commits, is that each time you execute an SQL statement inside PL/SQL code it essentially does a context switch. Bulk binds avoid that.
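A hedged sketch of one way to do that chunking, slicing the collected ids into batches of 500 (variable names are illustrative):

declare
  type import_ids is table of integer index by pls_integer;
  my_import_ids import_ids;
  l_start pls_integer;
  l_end   pls_integer;
begin
  select id bulk collect into my_import_ids
    from source where import_source = 'bad.txt';

  l_start := 1;
  while l_start <= my_import_ids.count loop
    l_end := least(l_start + 499, my_import_ids.count);
    forall h in l_start..l_end
      delete from t1 where id = my_import_ids(h);
    forall h in l_start..l_end
      delete from t2 where id = my_import_ids(h);
    commit;   -- one commit per batch of 500, not per row
    l_start := l_end + 1;
  end loop;
end;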


You may try partitioning anyway, to use parallel execution, not just to drop one partition. The Oracle documentation may prove useful in setting this up. Each partition would use its own rollback segment in this case.
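For illustration, a hedged sketch of a parallel delete (assuming the table has been partitioned, which older Oracle releases require for parallel DELETE; the degree of 4 is arbitrary):

ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(t1, 4) */ FROM t1
 WHERE id IN (SELECT id FROM source WHERE import_source = 'bad.txt');

COMMIT;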


If you are doing the delete from the source before the t1/t2 deletes, that suggests you don't have referential integrity constraints (as otherwise you'd get errors saying child records exist).

I'd go for creating the constraint with ON DELETE CASCADE. Then a simple

DECLARE
  v_cnt NUMBER := 1;
BEGIN
  WHILE v_cnt > 0 LOOP
   DELETE FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
   v_cnt := SQL%ROWCOUNT;
   COMMIT;
  END LOOP;
END;

The child records would get deleted automatically.
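For completeness, a sketch of one such constraint (the constraint name is hypothetical; t2 would get an analogous one):

ALTER TABLE t1
  ADD CONSTRAINT t1_source_fk
  FOREIGN KEY (id) REFERENCES source (id)
  ON DELETE CASCADE;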

If you can't have the ON DELETE CASCADE, I'd go with a GLOBAL TEMPORARY TABLE with ON COMMIT DELETE ROWS.
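The temp table referenced below would be created once, up front; a minimal sketch:

CREATE GLOBAL TEMPORARY TABLE temp (
  id INTEGER
) ON COMMIT DELETE ROWS;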

DECLARE
  v_cnt NUMBER := 1;
BEGIN
  WHILE v_cnt > 0 LOOP
   INSERT INTO temp (id)
   SELECT id FROM source WHERE import_source = 'bad.txt' and rownum < 5000;
   v_cnt := SQL%ROWCOUNT;
   DELETE FROM t1 WHERE id IN (SELECT id FROM temp);
   DELETE FROM t2 WHERE id IN (SELECT id FROM temp);
   DELETE FROM source WHERE id IN (SELECT id FROM temp);
   COMMIT;
  END LOOP;
END;

I'd also go for the largest chunk your DBA will allow. I'd expect each transaction to last for at least a minute. More frequent commits would be a waste.

This is happening on a system that can't afford to have the tables locked for that long.

Oracle doesn't lock tables, only rows. I'm assuming no one else will be locking the rows you are deleting (or at least not for long), so locking is not an issue.
