
Merging SQL Records

Hey, I'm not sure if this is the right place to ask, but I'm having trouble with a script of mine. I'm trying to do a one-time database merge with PHP. I grab a list of all the unique book titles, then loop over those titles to find the books with each title, grouped by type. I then loop over that result to find books with that title and type, and repeat the process for copyrights and publishers. The main issue is that this takes so long that the MySQL server times out. Is there a better way to do this, possibly in SQL?

An example of the data to merge:

INSERT INTO books VALUES (1, 'Some Book', 'Penguin Publishing', '2005', 1);
INSERT INTO books VALUES (2, 'Some Book', 'Penguin Publishing', '2005', 1);
INSERT INTO books VALUES (3, 'Some Book 2', 'Penguin Publishing', '2005', 1);
INSERT INTO books VALUES (4, 'Some Book 2', 'Lion Publishing', '2005', 1);
INSERT INTO books VALUES (5, 'Some Book 2', 'Penguin Publishing', '2005', 2);
INSERT INTO books VALUES (6, 'Some Book 2', 'Penguin Publishing', '2005', 2);
INSERT INTO books VALUES (7, 'Somebody', 'Lion Publishing', '2005', 1);
INSERT INTO books VALUES (8, 'Somebody', 'Lion Publishing', '2007', 1);
INSERT INTO books VALUES (9, 'Somebody', 'Penguin Publishing', '2005', 1);

IDs 1 and 2 should be merged because they have 1) the same title, 2) the same material_type_id, 3) the same copyright, and 4) the same publisher.

Is there any way to achieve this with pure SQL, or with minimal PHP?

Here is my data structure:

CREATE TABLE books (
  id int(11) NOT NULL AUTO_INCREMENT,
  title varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  publisher varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  copyright varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  material_type_id int(11) DEFAULT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY title (title)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

EDIT: I forgot to mention something really important. I can't just use a GROUP BY query because I need the ids. I have another table called 'Items' that has a book_id field. If I just group the records, I'll end up with orphaned items. What I need is a way to break up an ordered query so that I end up with a result like this:

id | title | publisher | copyright | material_type_id
-----------------------------------------------------
1  'Some Book' 'Penguin Publishing' '2005' 1
2  'Some Book' 'Penguin Publishing' '2005' 1
-----------------------------------------------------
5  'Some Book 2' 'Penguin Publishing' '2005' 2
6  'Some Book 2' 'Penguin Publishing' '2005' 2
-----------------------------------------------------
3  'Some Book 2' 'Penguin Publishing' '2005' 1
-----------------------------------------------------

Because then I can run a query like this on the first group:

UPDATE items SET book_id = 1 WHERE book_id IN (1, 2)

I hope that makes sense. It's really hard for me to explain. Thanks for your help and patience.


This query deletes all but one copy of each book (keeping the one with the lowest id), given the duplicate rules.

DELETE FROM A
using books A
join (select title,publisher,material_type_id,copyright, MIN(id) keep
    from books
    group by title,publisher,material_type_id,copyright
    having count(*) > 1) B
    on A.title=B.title
       and A.publisher=B.publisher
       and A.material_type_id=B.material_type_id
       and A.copyright=B.copyright
       and A.id <> B.keep;
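
To see which rows the DELETE would touch before running it, the same GROUP BY can be turned into a preview that lists the ids in each duplicate group. The sketch below is only a stand-in under stated assumptions: it uses Python's sqlite3 module with an in-memory SQLite table instead of MySQL, and loads the sample rows from the question. GROUP_CONCAT is available in both SQLite and MySQL, so the SELECT itself should carry over.

```python
import sqlite3

# Sample rows from the question: (id, title, publisher, copyright, material_type_id)
SAMPLE_BOOKS = [
    (1, "Some Book", "Penguin Publishing", "2005", 1),
    (2, "Some Book", "Penguin Publishing", "2005", 1),
    (3, "Some Book 2", "Penguin Publishing", "2005", 1),
    (4, "Some Book 2", "Lion Publishing", "2005", 1),
    (5, "Some Book 2", "Penguin Publishing", "2005", 2),
    (6, "Some Book 2", "Penguin Publishing", "2005", 2),
    (7, "Somebody", "Lion Publishing", "2005", 1),
    (8, "Somebody", "Lion Publishing", "2007", 1),
    (9, "Somebody", "Penguin Publishing", "2005", 1),
]

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE books (
           id INTEGER PRIMARY KEY,
           title TEXT,
           publisher TEXT,
           copyright TEXT,
           material_type_id INTEGER
       )"""
)
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", SAMPLE_BOOKS)

# One row per duplicate group: the id that would be kept, plus all ids in the group.
rows = conn.execute(
    """SELECT MIN(id) AS keep, GROUP_CONCAT(id) AS ids
       FROM books
       GROUP BY title, publisher, material_type_id, copyright
       HAVING COUNT(*) > 1"""
).fetchall()

# GROUP_CONCAT does not guarantee an order, so sort the ids after splitting.
groups = {keep: sorted(int(i) for i in ids.split(",")) for keep, ids in rows}

for keep in sorted(groups):
    print("keep", keep, "group", groups[keep])
```

For the sample data this finds two groups: ids 1 and 2 (keeping 1) and ids 5 and 6 (keeping 5). Every other row is unique under the four-column rule.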

BEFORE running the above, re-point the item records to the copy that will be kept (not formally tested):

UPDATE items
join books A on A.id = items.book_id
join (select title,publisher,material_type_id,copyright, MIN(id) keep
    from books
    group by title,publisher,material_type_id,copyright
    having count(*) > 1) B
    on A.title=B.title
       and A.publisher=B.publisher
       and A.material_type_id=B.material_type_id
       and A.copyright=B.copyright
       and A.id <> B.keep
set items.book_id = B.keep;
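
The two steps (re-point the items, then delete the duplicates) can be rehearsed end to end on a throwaway copy. Below is a minimal sketch, again using Python's sqlite3 with an in-memory database rather than MySQL. The items table layout (id, book_id) and its rows are made up for illustration; only the book_id field is known from the question. SQLite does not support MySQL's UPDATE ... JOIN or DELETE ... USING forms, so the sketch swaps in correlated subqueries, and it uses SQLite's IS operator so that NULL columns compare as equal.

```python
import sqlite3

# Sample rows from the question: (id, title, publisher, copyright, material_type_id)
SAMPLE_BOOKS = [
    (1, "Some Book", "Penguin Publishing", "2005", 1),
    (2, "Some Book", "Penguin Publishing", "2005", 1),
    (3, "Some Book 2", "Penguin Publishing", "2005", 1),
    (4, "Some Book 2", "Lion Publishing", "2005", 1),
    (5, "Some Book 2", "Penguin Publishing", "2005", 2),
    (6, "Some Book 2", "Penguin Publishing", "2005", 2),
    (7, "Somebody", "Lion Publishing", "2005", 1),
    (8, "Somebody", "Lion Publishing", "2007", 1),
    (9, "Somebody", "Penguin Publishing", "2005", 1),
]
# Hypothetical items: (id, book_id). Only book_id is known from the question.
SAMPLE_ITEMS = [(1, 1), (2, 2), (3, 5), (4, 6), (5, 3)]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        publisher TEXT,
        copyright TEXT,
        material_type_id INTEGER
    );
    CREATE TABLE items (id INTEGER PRIMARY KEY, book_id INTEGER);
    """
)
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", SAMPLE_BOOKS)
conn.executemany("INSERT INTO items VALUES (?, ?)", SAMPLE_ITEMS)

# Step 1: point every item at the lowest id within its book's duplicate group.
conn.execute(
    """UPDATE items
       SET book_id = (
           SELECT MIN(b2.id)
           FROM books b1
           JOIN books b2
             ON b2.title IS b1.title
            AND b2.publisher IS b1.publisher
            AND b2.material_type_id IS b1.material_type_id
            AND b2.copyright IS b1.copyright
           WHERE b1.id = items.book_id
       )
       WHERE book_id IN (SELECT id FROM books)"""
)

# Step 2: delete every book that is not the lowest id of its group.
conn.execute(
    """DELETE FROM books
       WHERE id NOT IN (
           SELECT MIN(id)
           FROM books
           GROUP BY title, publisher, material_type_id, copyright
       )"""
)
conn.commit()

book_ids = [r[0] for r in conn.execute("SELECT id FROM books ORDER BY id")]
item_refs = [r[0] for r in conn.execute("SELECT book_id FROM items ORDER BY id")]
orphans = conn.execute(
    "SELECT COUNT(*) FROM items WHERE book_id NOT IN (SELECT id FROM books)"
).fetchone()[0]

print("books left:", book_ids)
print("items now point at:", item_refs)
print("orphaned items:", orphans)
```

The order matters: if the DELETE ran first, the items pointing at ids 2 and 6 would already be orphaned and there would be nothing left to join them against. Note that the subquery-style DELETE is accepted by SQLite, but MySQL rejects a DELETE whose subquery reads the same table (error 1093), which is why the MySQL statement above joins against a derived table instead.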
