mysql query select keys and insert
I have two tables: Articles
that stores information about Articles, and PageLinks
that stores hyperlinks between pages. The schema is as below.
CREATE TABLE `Articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`label` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `PageLinks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_id` int(11) NOT NULL,
`to_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index4` (`to_id`,`from_id`),
KEY `fk_PageLinks_1` (`from_id`),
KEY `fk_PageLinks_2` (`to_id`),
CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
I have several million slug pairs that indicate hyperlinks between corresponding pages. I am trying to load the PageLinks table from these slug pairs.
Currently, i have a python program that issues select id
queries for each slug to convert the slug pairs to Article id pairs. The id pairs are then written to a file and loaded using load data infile
. Additionally if a slug does not exist in the Articles table, the program inserts a dummy row without label and then uses id of that row.
I am trying to optimize the program to load entries faster (I have around 18GB of slug pairs to load). I believe some speed up can be achieved if it is possible to do slug->id resolution and page link insert开发者_如何学运维ion together in bulk (thus avoiding the per SELECT overhead). What is the best possible way to do this in mysql?
Making a separate SELECT
for each slug is inefficient indeed.
You should load your slug pairs into a table:
CREATE TABLE pairs
(
slug1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
slug2 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
);
, load it with your pairs, then issue the following statements:
INSERT IGNORE
INTO Articles (slug)
SELECT slug1
FROM pairs;
INSERT IGNORE
INTO Articles (slug)
SELECT slug2
FROM pairs;
INSERT
INTO pairs (from_id, to_id)
SELECT a1.id, a2.id
FROM pairs
JOIN articles a1
ON a1.slug = slug1
JOIN articles a2
ON a2.slug = slug2;
精彩评论