开发者

Matching two big SQLITE3 tables

I have two tables:

t1: f1, f2, f3, f4, rowid_t2, sts

t2: f1, f2, f3, f4, sts

with different amounts of records that exceeds 10 millions.

I need to match them using f1, f2 and f3开发者_如何学Python of each table as the keys, the relation is

that one record of t1 can match with one record in t2 or many records of t1 can match with one record in t2, the matching depends on the conditions indicated in the three operations below.


I need to perform the next matching operations:

a) If t1.f1 = t2.f1 and t1.f2 = t2.f2 and t1.f3 = t2.f3 then I must update the rowid of t2 into t1.rowid_t2 and save t1.sts=1, t2.sts=1 in the records matched.

b) If t1.f1 = t2.f1 and t1.f2 = t2.f2 and t1.f3 <> t2.f3 then I must update the rowid of t2 into rowid_t2 and save t1.sts=2, t2.sts=2 in the records matched.

c) If t1.f1 = t2.f1 and t1.f2 <> t2.f2 and t1.f3 <> t2.f3 then I must update the rowid of T2 into rowid_t2 and save t1.sts=3, t2.sts=3 in the records matched.


I have 2 questions:

  1. Can I solve the problem using UPDATE? If yes, it would be nice if you can show me the solution only for a)

  2. How many indexes should I create to optimize the necessary UPDATEs and SELECTs for the three operations?

Many thanks!!


I have no experience with SQLite, but I gave your problem a try.

As you already said, it is not possible to updated more than one table at once (the solution provided by Tristran only works for MySQL as far as I know).

First I update t1 and set sts=1 for every row (if only a few rows are affected, it might be more efficient to add a WHERE-clause to get the relevant rows), and rowid_t2 to t2.rowid where all relevant columns match.

UPDATE t1
SET
  sts = 1, 
  rowid_t2 = (
    SELECT rowid FROM t2
    WHERE t2.f1 = t1.f1 AND t2.f2 = t1.f2 AND t2.f3 = t1.f3
  );

Then I do the same with sts=2 and sts=3 but only where rowid_t2 has not already been set.

UPDATE t1
SET
  sts = 2, 
  rowid_t2 = (
    SELECT rowid FROM t2
    WHERE t2.f1 = t1.f1 AND t2.f2 = t1.f2
  )
WHERE t1.rowid_t2 IS NULL;

UPDATE t1
SET
  sts = 3, 
  rowid_t2 = (
    SELECT rowid FROM t2
    WHERE t2.f1 = t1.f1
  )
WHERE t1.rowid_t2 IS NULL;

Then I reset t1.sts that have been set to 3 but are not actually valid:

UPDATE t1
SET sts = NULL
WHERE rowid_t2 IS NULL;

And finally I update sts in t2 to the "lowest" method that matched in t1. So if a row in t2 has one row in t1 that matches for all criterias, and one that matches only for f1, I still set sts=1.

UPDATE t2
SET sts = (
  SELECT MIN(sts)
  FROM t1
  WHERE t1.rowid_t2 = t2.rowid
)

I have not tried with indexes, but I think you should have one for t2.f1, t2.f2 and t2.f3 for the first three updates (might need separate indexes with SQLite, not sure), and another one at t1.rowid_t2 for the other two updates and for t1.rowid_t2 IS NULL.

Try with some representative test data first, to see if the result is as expected and performance is ok.

Good luck :)


Won't this work for a)?

update t1,t2
  set t1.rowidt2 = t2.rowid, t1.sts= 1, t2.sts = 1
where t1.f1 = t2.f1 and t1.f2 = t2.f2 and t1.f3 = t2.f3 

and similar with b) and c)?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜