Populating a table with a large amount of data from a temporary table - MySQL
Ok I have a temporary MySQL table with 135,000 rows, from this temporary table I want to populate several other tables.
First of all this is the structure of the temporary table
CREATE TEMPORARY TABLE TVTEMPTABLE (
PROGTITLE TEXT, SUBTITLE TEXT,
EPISODE TEXT, YR YEAR, DIRECTOR TEXT,
PERFORMERS TEXT, PREMIERE BOOL, FILM BOOL, RPEAT BOOL,
SUBTITLES BOOL, WIDESCREEN BOOL, NEWSERIES BOOL, DEAFSIGNED BOOL,
BNW BOOL, STARRATING TINYINT, CERTIFICATE VARCHAR(5), GENRE VARCHAR(50),
DESCRIPTION TEXT, CHOICE BOOL, PROGDATE DATE, STARTIME TIME, ENDTIME TIME,
DURATION INT, CHANNELID INT NOT NULL)
And this is the structure of one of the tables I plan to populate from this.
CREATE TABLE PROGRAMME (
PROGRAMMEID INT NOT NULL AUTO_INCREMENT, GENREID INT NOT NULL, PROGTITLE VARCHAR(50), YR YEAR, DIRECTOR VARCHAR(50), PERFORMERS TEXT, FILM BOOL, WIDESCREEN BOOL, BNW BOOL, CERTIFICATE VARCHAR(5), DESCRIPTION TEXT, PRIMARY KEY(PROGRAMMEID), INDEX (GENREID), FOREIGN KEY (GENREID) REFERENCES GENRE(GENREID)
) ENGINE=INNODB;
And this is how I do my insert to the programme table
INSE开发者_JS百科RT INTO PROGRAMME (
GENREID, PROGTITLE, YR, DIRECTOR, PERFORMERS, FILM, WIDESCREEN, BNW, CERTIFICATE, DESCRIPTION) SELECT G.GENREID, T.PROGTITLE, T.YR, T.DIRECTOR, T.PERFORMERS, T.FILM, T.WIDESCREEN, T.BNW, T.CERTIFICATE, T.DESCRIPTION FROM TVTEMPTABLE T,GENRE G WHERE G.GENRENAME = T.GENRE AND NOT EXISTS (
SELECT * FROM PROGRAMME P WHERE P.PROGTITLE = T.PROGTITLE )
This is however taking a very very long time to do, how should I approach this?
Thanks, Paul
Ok thanks guys still having a few issues with this I'm trying the left join example, I've found however that if the table I'm inserting into is empty to begin with though, then it inserts duplicates. Here's a simple example
CREATE TEMPORARY TABLE TEMP(
GENRENAME TEXT);
CREATE TABLE GENRE(
GENREID INT NOT NULL AUTO_INCREMENT,
GENRENAME TEXT, PRIMARY KEY(GENREID)
) ENGINE=INNODB;
INSERT INTO TEMP(
GENRENAME)
VALUES("news");
INSERT INTO TEMP(
GENRENAME)
VALUES("news");
This inserts the "news" genre into the temporary table twice. Now if I run this SQL command
INSERT INTO GENRE(
GENRENAME)
SELECT
T.GENRENAME
FROM
TEMP T
LEFT JOIN
GENRE G ON G.GENRENAME=T.GENRENAME
WHERE
G.GENRENAME IS NULL;
It inserts "news" twice into the genre table which is wrong. If I run the same command again, it correctly doesn't insert any new rows.
How about this:
INSERT INTO PROGRAMME ( GENREID, PROGTITLE, YR, DIRECTOR, PERFORMERS, FILM, WIDESCREEN, BNW, CERTIFICATE, DESCRIPTION)
SELECT G.GENREID, T.PROGTITLE, T.YR, T.DIRECTOR, T.PERFORMERS, T.FILM, T.WIDESCREEN, T.BNW, T.CERTIFICATE, T.DESCRIPTION
FROM TVTEMPTABLE T,GENRE G
WHERE G.GENRENAME = T.GENRE
AND T.PROGTITLE NOT IN ( SELECT DISTINCT P.PROGTITLE FROM PROGRAMME P)
I believe your NOT EXISTS
has to execute for each selected row. Replace it with a single static subquery and check NOT IN
on it.
You are doing a (possibly large) sub-select for each row.
I would suggest doing a LEFT JOIN against PROGRAMME, and then only inserting rows where the join result is NULL, as follows:
INSERT INTO PROGRAMME (
GENREID, PROGTITLE, YR, DIRECTOR,
PERFORMERS, FILM, WIDESCREEN, BNW,
CERTIFICATE, DESCRIPTION)
SELECT
G.GENREID, T.PROGTITLE, T.YR, T.DIRECTOR,
T.PERFORMERS, T.FILM, T.WIDESCREEN, T.BNW,
T.CERTIFICATE, T.DESCRIPTION
FROM
TVTEMPTABLE T
INNER JOIN GENRE G ON G.GENRENAME=T.GENRE
LEFT JOIN PROGRAMME P ON P.PROGTITLE=T.PROGTITLE
WHERE
P.PROGTITLE IS NULL
As an aside: have you considered (a) pretty-printing your code to be more readable and (b) not using all-caps field names?
i would first look at the performance of the embedded select statement in your insert, especially the NOT EXISTS clause. Make sure you have good indexes there.
another thought is to break it into smaller chunks so you do not have rollback space problems. So see if you can insert only 1000 or 10,000 rows at a time, then commit, then run again. etc.
wrap your insert statement in a transaction.
start transaction;
insert into programme (...) select ... from tvtemptable ...
commit;
if it's still slow then post the explain plan of the select portion of your insert statement so we can see what's going on :P
This was a combination of using left/inner join and the distinct keyword.
Create an index for P.PROGTITLE
and T.PROGTITLE
.
精彩评论