Performance of updating one big table using values from one small table

First, I know that the SQL statement to update table_a using values from table_b takes the form:

Oracle:

UPDATE table_a 
  SET (col1, col2) = (SELECT cola, colb 
                        FROM table_b 
                       WHERE table_a.key = table_b.key) 
WHERE EXISTS (SELECT * 
                FROM table_b 
               WHERE table_a.key = table_b.key)

MySQL:

UPDATE table_a 
INNER JOIN table_b ON table_a.key = table_b.key 
SET table_a.col1 = table_b.cola, 
    table_a.col2 = table_b.colb

What I understand is the database engine will go through records in table_a and update them with values from matching records in table_b.

So, if I have 10 million records in table_a and only 10 records in table_b:

  1. Does that mean that the engine will do 10 million iterations through table_a just to update 10 records? Are Oracle/MySQL/etc. smart enough to do only 10 iterations through table_b?

  2. Is there a way to force the engine to actually iterate through records in table_b instead of table_a to do the update? Is there an alternative syntax for the SQL statement?

Assume that table_a.key and table_b.key are indexed.


Either engine should be smart enough to optimize the query based on the fact that there are only ten rows in table_b. How the engine decides what to do is based on factors like indexes and statistics.

If the "key" column is the primary key and/or is indexed, the engine will have to do very little work to run this query. It can take each of the ten rows in table_b and use the index to look up the matching rows in table_a very quickly. It won't have to "iterate" over table_a at all.

If there is no index on the key column, the engine will have to do a "table scan" (roughly the equivalent of "iterate") to find the right values and match them up. This means it will have to scan through 10 million rows.
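As a rough sketch of what "indexes and statistics" means in practice, the statements below create an index on the key column and refresh optimizer statistics. The index name table_a_key_idx is made up for illustration, and the question already assumes the index exists, so treat this as a reminder rather than a required step.

```sql
-- Oracle: index the join column, then gather statistics so the
-- optimizer knows table_a is large and table_b is tiny.
CREATE INDEX table_a_key_idx ON table_a (key);

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_A');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_B');
END;
/

-- MySQL: KEY is a reserved word, so an unqualified column named
-- "key" needs backticks here.
CREATE INDEX table_a_key_idx ON table_a (`key`);

ANALYZE TABLE table_a, table_b;
```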

Do a little reading on what's called an Execution Plan. This is basically an explanation of what work the engine had to do in order to run your query (some databases show it in text only, some have the option of seeing it graphically). Learning how to interpret an Execution Plan will give you great insight into adding indexes to your tables and optimizing your queries.

Look these up if they don't work (it's been a while), but it's something like:

  • In MySQL, put the word "EXPLAIN" in front of your SELECT statement
  • In Oracle, run "SET AUTOTRACE ON" before you run your SELECT statement
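For example, the two UPDATE statements from the question could be inspected roughly like this. This is a sketch, not output from a real system: it assumes MySQL 5.6 or later (older versions only accept EXPLAIN on SELECT) and, on the Oracle side, uses EXPLAIN PLAN with DBMS_XPLAN as an alternative to AUTOTRACE that also works outside SQL*Plus.

```sql
-- MySQL: EXPLAIN shows the join order and which index is used
-- for each table, without running the update.
EXPLAIN
UPDATE table_a
INNER JOIN table_b ON table_a.key = table_b.key
SET table_a.col1 = table_b.cola,
    table_a.col2 = table_b.colb;

-- Oracle: store the plan for the statement, then display it.
EXPLAIN PLAN FOR
UPDATE table_a
   SET (col1, col2) = (SELECT cola, colb
                         FROM table_b
                        WHERE table_a.key = table_b.key)
 WHERE EXISTS (SELECT *
                 FROM table_b
                WHERE table_a.key = table_b.key);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In either plan, the thing to look for is whether table_a is reached through an index lookup or through a full scan.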

I think the first (Oracle) query would be better written with a JOIN instead of a WHERE EXISTS. The engine may be smart enough to optimize it properly either way. Once you get the hang of interpreting an execution plan, you can run it both ways and see for yourself. :)
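One way the JOIN form might look in Oracle is an update through an inline join view. This is only a sketch: the aliases a and b are introduced here, and Oracle accepts the statement only if table_b.key has a primary key or unique constraint (so that table_a is "key-preserved"); otherwise it raises ORA-01779.

```sql
-- Oracle: update table_a through a join view.
-- Requires a PRIMARY KEY or UNIQUE constraint on table_b.key.
UPDATE (SELECT a.col1, a.col2, b.cola, b.colb
          FROM table_a a
          JOIN table_b b ON a.key = b.key)
   SET col1 = cola,
       col2 = colb;
```

Only rows with a match in table_b appear in the join, so no separate WHERE EXISTS is needed.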


Okay, I know answering my own question is usually frowned upon, but I already accepted another answer and won't unaccept it, so meh, here it is ..

I've discovered a much better alternative that I'd like to share with anyone who encounters the same scenario: the MERGE statement.

Apparently, newer Oracle versions introduced this MERGE statement, which simply blew me away! Not only is the performance much better in most cases, the syntax is so simple and makes so much sense that I feel stupid for having used the UPDATE statement! Here it comes ..

MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
  table_a.col1 = table_b.cola,
  table_a.col2 = table_b.colb;

What's more, I can also extend the statement to include an INSERT action for records in table_b that have no matching record in table_a:

MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
  table_a.col1 = table_b.cola,
  table_a.col2 = table_b.colb
WHEN NOT MATCHED THEN INSERT
  (key, col1, col2)
  VALUES (table_b.key, table_b.cola, table_b.colb);

This new statement type made my day the day I discovered it :)
