merging tables which consist of 17 million records
I have 3 tables in which 2 tables have 200 000 records and another table of 1 800 000 records. I do merge these 3 tables using 2 contraints that is OCN and TIMESTAMP(month,year)
. first two tables has columns for month and year as Monthx (which includes both month,date and year). and other table as seperate columns for ea开发者_开发问答ch month and year. I gave the query as,
mysql--> insert into trail
select * from A,B,C
where A.OCN=B.OCN
and B.OCN=C.OCN
and C.OCN=A.OCN
and date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b')
and date_format(A.Monthx,'%b')=C.IMonth
and date_format(B.Monthx,'%b')=C.month
and year(A.Monthx)=year(B.Monthx)
and year(B.Monthx)=C.Iyear
and year(A.Monthx)=C.Iyear
I gave this query 4days before its still running.could u tell me whether this query is correct or wrong and provide me a exact query..(i gave tat '%b'
because my C table has a column which has months in the form JAN,MAR).
Please don't use implicit where joins, bury it in 1989, where it belongs. Use explicit joins instead
select * from a inner join b on (a.ocn = b.ocn and
date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b') ....
This select part of the query (had to rewrite it because I refuse to deal with '89 syntax)
select * from A
inner join B on (
A.OCN=B.OCN
and date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b')
and year(A.Monthx)=year(B.Monthx)
)
inner join C on (
C.OCN=A.OCN
and date_format(A.Monthx,'%b')=C.IMonth
and date_format(B.Monthx,'%b')=C.month
and year(B.Monthx)=C.Iyear
and year(A.Monthx)=C.Iyear
)
Has a lot of problems.
- using a function on a field will kill any opportunity to use an index on that field.
- you are doing a lot of duplicate test. if
(A = B)
and(B = C)
then it logically follows that(A = C)
- the translations of the date fields take a lot of time
I would suggest you rewrite your tables to use fields that don't need translating (using functions), but can be compared directly.
A field like yearmonth : char(6)
e.g. 201006
can be indexed and compared much faster.
If the table A,B,C have a field called ym
for short than your query can be:
INSERT INTO TRAIL
SELECT a.*, b.*, c.* FROM a
INNER JOIN b ON (
a.ocn = b.ocn
AND a.ym = b.ym
)
INNER JOIN c ON (
a.ocn = c.ocn
AND a.ym = c.ym
);
If you put indexes on ocn
(primary index probably) and ym
the query should run about a million rows a second (or more).
To test if your query is ok, import a small subset of records from A, B and C to a temporary database and test it their.
You have redundancies in your implicit JOIN because you are joining A.OCN with B.OCN, B.OCN with C.OCN and then C.OCN to A.OCN, on of those can be deleted. If A.OCN = B.OCN and B.CON = C.OCN, A.OCN = C.OCN is implied. Further, I guess you have redundancies in your date comparisons.
精彩评论