Merging SQL Records
Hey, I'm not sure if this is the right place to ask, but I'm having trouble with a script of mine. I'm trying to do a one-time database merge with PHP. What I'm doing is grabbing a list of all the unique book titles, then looping (foreach) through those titles to find books with that title grouped by type, then looping through that result to find books with that title and type, and repeating the process for copyrights and publishers. The main issue is that this takes so long that the MySQL server times out. Is there a better way to do this, possibly with SQL?
An example of the data to merge would be this:
INSERT INTO books VALUES (1, 'Some Book', 'Penguin Publishing', '2005', 1);
INSERT INTO books VALUES (2, 'Some Book', 'Penguin Publishing', '2005', 1);
INSERT INTO books VALUES (3, 'Some Book 2', 'Penguin Publishing', '2005', 1);
INSERT INTO books VALUES (4, 'Some Book 2', 'Lion Publishing', '2005', 1);
INSERT INTO books VALUES (5, 'Some Book 2', 'Penguin Publishing', '2005', 2);
INSERT INTO books VALUES (6, 'Some Book 2', 'Penguin Publishing', '2005', 2);
INSERT INTO books VALUES (7, 'Somebody', 'Lion Publishing', '2005', 1);
INSERT INTO books VALUES (8, 'Somebody', 'Lion Publishing', '2007', 1);
INSERT INTO books VALUES (9, 'Somebody', 'Penguin Publishing', '2005', 1);
IDs 1 and 2 should be merged because they have 1) the same title, 2) the same material_type_id, 3) the same copyright, and 4) the same publisher.
Is there any way to achieve this with pure SQL, or with only a little PHP?
Here is my data structure:
CREATE TABLE books (
  id int(11) NOT NULL AUTO_INCREMENT,
  title varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  publisher varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  copyright varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  material_type_id int(11) DEFAULT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY title (title)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
EDIT: I forgot to mention something really important. I can't just use a GROUP BY query because I need the ids. I have another table called 'Items' that has a book_id field. If I just GROUP the records, I'll end up with orphaned items. What I need is a way to break up an ordered query so that I end up with a result like this:
id | title         | publisher            | copyright | material_type_id
------------------------------------------------------------------------
1  | 'Some Book'   | 'Penguin Publishing' | '2005'    | 1
2  | 'Some Book'   | 'Penguin Publishing' | '2005'    | 1
------------------------------------------------------------------------
5  | 'Some Book 2' | 'Penguin Publishing' | '2005'    | 2
6  | 'Some Book 2' | 'Penguin Publishing' | '2005'    | 2
------------------------------------------------------------------------
3  | 'Some Book 2' | 'Penguin Publishing' | '2005'    | 1
------------------------------------------------------------------------
Then I could run a query like this on the first group:
UPDATE items SET book_id = 1 WHERE book_id IN (1, 2)
I hope that makes sense. It's really hard for me to explain. Thanks for your help and patience.
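To make the "groups with their ids" idea concrete, here is a minimal sketch of listing each duplicate group together with the ids it contains. It is an illustration only: it runs against an in-memory SQLite database from Python rather than MySQL, and it reuses the sample rows from the question. GROUP_CONCAT exists in both engines, but the surrounding syntax may need adjusting for MySQL.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `books` table (assumption: same columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, publisher TEXT, "
    "copyright TEXT, material_type_id INTEGER)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Some Book", "Penguin Publishing", "2005", 1),
        (2, "Some Book", "Penguin Publishing", "2005", 1),
        (3, "Some Book 2", "Penguin Publishing", "2005", 1),
        (4, "Some Book 2", "Lion Publishing", "2005", 1),
        (5, "Some Book 2", "Penguin Publishing", "2005", 2),
        (6, "Some Book 2", "Penguin Publishing", "2005", 2),
        (7, "Somebody", "Lion Publishing", "2005", 1),
        (8, "Somebody", "Lion Publishing", "2007", 1),
        (9, "Somebody", "Penguin Publishing", "2005", 1),
    ],
)

# One row per duplicate group: the id to keep plus every id in the group.
rows = conn.execute(
    """
    SELECT MIN(id) AS keep, GROUP_CONCAT(id) AS ids
    FROM books
    GROUP BY title, publisher, copyright, material_type_id
    HAVING COUNT(*) > 1
    ORDER BY keep
    """
).fetchall()

# GROUP_CONCAT does not guarantee an order here, so sort the ids in Python.
groups = {keep: sorted(int(i) for i in ids.split(",")) for keep, ids in rows}
print(groups)
```

Each entry maps the id that would survive to the ids that belong to its group, which is the information needed for an UPDATE like the one shown above.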
This query deletes all but one copy of each book (keeping the row with the lowest id), given your duplicate rules.
DELETE FROM A
USING books A
JOIN (SELECT title, publisher, material_type_id, copyright, MIN(id) AS keep
      FROM books
      GROUP BY title, publisher, material_type_id, copyright
      HAVING COUNT(*) > 1) B
  ON A.title = B.title
 AND A.publisher = B.publisher
 AND A.material_type_id = B.material_type_id
 AND A.copyright = B.copyright
 AND A.id <> B.keep;
BEFORE running the DELETE above, repoint the item records to the surviving book ids first (not formally tested):
UPDATE items
JOIN books A ON A.id = items.book_id
JOIN (SELECT title, publisher, material_type_id, copyright, MIN(id) AS keep
      FROM books
      GROUP BY title, publisher, material_type_id, copyright
      HAVING COUNT(*) > 1) B
  ON A.title = B.title
 AND A.publisher = B.publisher
 AND A.material_type_id = B.material_type_id
 AND A.copyright = B.copyright
 AND A.id <> B.keep
SET items.book_id = B.keep;
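If you want to rehearse the order of operations before touching the real database, the following is a rough sketch of the same two steps (repoint items, then delete duplicates) against an in-memory SQLite database in Python. Several things here are assumptions for illustration: the items table is reduced to a hypothetical (id, book_id) pair, the item rows are made up, and book_map is a helper temp table that is not part of the queries above. SQLite does not support MySQL's UPDATE ... JOIN, so the sketch uses subqueries instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, publisher TEXT,
                        copyright TEXT, material_type_id INTEGER);
    -- Hypothetical minimal items table; the real one likely has more columns.
    CREATE TABLE items (id INTEGER PRIMARY KEY, book_id INTEGER);

    INSERT INTO books VALUES
        (1, 'Some Book',   'Penguin Publishing', '2005', 1),
        (2, 'Some Book',   'Penguin Publishing', '2005', 1),
        (3, 'Some Book 2', 'Penguin Publishing', '2005', 1),
        (4, 'Some Book 2', 'Lion Publishing',    '2005', 1),
        (5, 'Some Book 2', 'Penguin Publishing', '2005', 2),
        (6, 'Some Book 2', 'Penguin Publishing', '2005', 2),
        (7, 'Somebody',    'Lion Publishing',    '2005', 1),
        (8, 'Somebody',    'Lion Publishing',    '2007', 1),
        (9, 'Somebody',    'Penguin Publishing', '2005', 1);

    -- Made-up items pointing at both kept and duplicate books.
    INSERT INTO items VALUES (1, 1), (2, 2), (3, 5), (4, 6), (5, 3);

    -- Helper table (an assumption of this sketch): every book id and the
    -- lowest id that shares its title, publisher, copyright and type.
    CREATE TEMP TABLE book_map AS
    SELECT b.id AS old_id, k.keep AS keep
    FROM books b
    JOIN (SELECT title, publisher, copyright, material_type_id, MIN(id) AS keep
          FROM books
          GROUP BY title, publisher, copyright, material_type_id) k
      ON b.title = k.title
     AND b.publisher = k.publisher
     AND b.copyright = k.copyright
     AND b.material_type_id = k.material_type_id;
    """
)

# Step 1: repoint items that reference a duplicate book to the kept book.
conn.execute(
    """
    UPDATE items
    SET book_id = (SELECT keep FROM book_map WHERE old_id = items.book_id)
    WHERE book_id IN (SELECT old_id FROM book_map WHERE old_id <> keep)
    """
)

# Step 2: only now delete the duplicate books.
conn.execute(
    "DELETE FROM books WHERE id IN (SELECT old_id FROM book_map WHERE old_id <> keep)"
)
conn.commit()

item_books = [r[0] for r in conn.execute("SELECT book_id FROM items ORDER BY id")]
remaining = [r[0] for r in conn.execute("SELECT id FROM books ORDER BY id")]
orphans = conn.execute(
    "SELECT COUNT(*) FROM items WHERE book_id NOT IN (SELECT id FROM books)"
).fetchone()[0]
print(item_books, remaining, orphans)
```

Building the mapping once and reusing it for both steps means the UPDATE and the DELETE always agree on which id survives. One caveat that applies to the MySQL queries as well: the columns are nullable, and a plain `=` comparison never matches NULL, so rows with a NULL title, publisher, copyright or material_type_id will not be treated as duplicates of each other.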