开发者

copy subset of rows from one table to another, filtering on two columns

I have the following MySql table containing my raw event data (about 1.5 million rows)

userId  | pathId  | other stuff....

I have an index on userId, pathId (approx 50,000 unique combinations)

During my processing, I identify 30,000 userId, pathId values that I don't want, but I do want to keep the original raw table. So I want to copy all rows into a processed event table, except the rows that match this 30,000 userId, pathId values.

An approach I'm considering is to write the 30,000 userId,PathId values of开发者_开发技巧 the rows I do not want into a temp_table, and then doing something like this:

[create table processed_table ...]
insert into processed_table 
   select * from raw_table r 
   where not exists (
       select * from temp_table t where r.userId=t.userid and r.pathId=t.pathId
   )

For info, processed_table generally ends up being half the size of raw_table.

Anyway, this seems to work but my SQL skills are limited, so my question (finally) is - is this the most efficient way to do this?


No, it's not the most efficient. Source

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

Here's an example with NOT IN:

INSERT INTO processed_table 
SELECT *
FROM raw_table 
WHERE (userId, pathId) NOT IN (
    SELECT userId, pathId FROM temp_table
)

And LEFT JOIN ... IS NULL:

INSERT INTO processed_table 
SELECT *
FROM raw_table r
LEFT JOIN temp_table t
ON r.userId = t.userid AND r.pathId = t.pathId
WHERE t.userId IS NULL

However, since your table is very small and has only 50,000 rows, your original query is probably fast enough.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜