Matching two big SQLITE3 tables
I have two tables:
t1: f1, f2, f3, f4, rowid_t2, sts
t2: f1, f2, f3, f4, sts
with different amounts of records that exceeds 10 millions.
I need to match them using f1
, f2
and f3开发者_如何学Python
of each table as the keys, the relation is
I need to perform the next matching operations:
a) If t1.f1 = t2.f1
and t1.f2 = t2.f2
and t1.f3 = t2.f3
then I must update the rowid of t2
into t1.rowid_t2
and save t1.sts=1
, t2.sts=1
in the records matched.
b) If t1.f1 = t2.f1
and t1.f2 = t2.f2
and t1.f3 <> t2.f3
then I must update the rowid of t2
into rowid_t2
and save t1.sts=2
, t2.sts=2
in the records matched.
c) If t1.f1 = t2.f1
and t1.f2 <> t2.f2
and t1.f3 <> t2.f3
then I must update the rowid of T2
into rowid_t2 and save t1.sts=3
, t2.sts=3
in the records matched.
I have 2 questions:
Can I solve the problem using
UPDATE
? If yes, it would be nice if you can show me the solution only for a)How many indexes should I create to optimize the necessary UPDATEs and SELECTs for the three operations?
Many thanks!!
I have no experience with SQLite
, but I gave your problem a try.
As you already said, it is not possible to updated more than one table at once (the solution provided by Tristran only works for MySQL
as far as I know).
First I update t1
and set sts=1
for every row (if only a few rows are affected, it might be more efficient to add a WHERE
-clause to get the relevant rows), and rowid_t2
to t2.rowid
where all relevant columns match.
UPDATE t1
SET
sts = 1,
rowid_t2 = (
SELECT rowid FROM t2
WHERE t2.f1 = t1.f1 AND t2.f2 = t1.f2 AND t2.f3 = t1.f3
);
Then I do the same with sts=2
and sts=3
but only where rowid_t2
has not already been set.
UPDATE t1
SET
sts = 2,
rowid_t2 = (
SELECT rowid FROM t2
WHERE t2.f1 = t1.f1 AND t2.f2 = t1.f2
)
WHERE t1.rowid_t2 IS NULL;
UPDATE t1
SET
sts = 3,
rowid_t2 = (
SELECT rowid FROM t2
WHERE t2.f1 = t1.f1
)
WHERE t1.rowid_t2 IS NULL;
Then I reset t1.sts
that have been set to 3
but are not actually valid:
UPDATE t1
SET sts = NULL
WHERE rowid_t2 IS NULL;
And finally I update sts
in t2
to the "lowest" method that matched in t1
. So if a row in t2
has one row in t1
that matches for all criterias, and one that matches only for f1
, I still set sts=1
.
UPDATE t2
SET sts = (
SELECT MIN(sts)
FROM t1
WHERE t1.rowid_t2 = t2.rowid
)
I have not tried with indexes, but I think you should have one for t2.f1
, t2.f2
and t2.f3
for the first three updates (might need separate indexes with SQLite
, not sure), and another one at t1.rowid_t2
for the other two updates and for t1.rowid_t2 IS NULL
.
Try with some representative test data first, to see if the result is as expected and performance is ok.
Good luck :)
Won't this work for a)?
update t1,t2
set t1.rowidt2 = t2.rowid, t1.sts= 1, t2.sts = 1
where t1.f1 = t2.f1 and t1.f2 = t2.f2 and t1.f3 = t2.f3
and similar with b) and c)?
精彩评论