开发者

MySQL::Eliminating redundant elements from a table?

I have a table like this:

+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| v1    | int(11) | YES  | MUL | NULL    |       开发者_如何学Go| 
| v2    | int(11) | YES  | MUL | NULL    |       | 
+-------+---------+------+-----+---------+-------+

There is a tremendous amount of duplication in this table. For instance, elements like the following:

+------+------+
| v1   | v2   |
+------+------+
|    1 |    2 | 
|    1 |    3 | 
|    1 |    4 | 
|    1 |    5 | 
|    1 |    6 | 
|    1 |    7 | 
|    1 |    8 | 
|    1 |    9 | 
|    2 |    1 | 
|    4 |    1 | 
|    5 |    1 | 
|    6 |    1 | 
|    7 |    1 | 
|    8 |    1 | 
|    9 |    1 | 
+------+------+

The table is large with 1540000 entries. To remove the redundant entries (i.e. to get a table having only (1,9) and no (9,1) entries), I was thinking of doing it with a subquery but is there a better way of doing this?


Actually, @Mark's approach will work too. I just figured out another way of doing it and was wondering if I can some feedback on this as well. I tested it and it seems to work fast.

SELECT v1,v2 FROM table WHERE v1<v2 UNION SELECT v2,v1 FROM table WHERE v1>v2;

In the case where this is right, you can always create a new table:

CREATE TABLE newtable AS SELECT v1,v2 FROM edges WHERE v1<v2 UNION SELECT v2,v1 FROM edges WHERE v1>v2;


Warning: these commands modify your database. Make sure you have a backup copy so that you can restore the data again if necessary.

You can add the requirement that v1 must be less than v2 which will cut your storage requirement roughly in half. You can make sure all the rows in the database satisfy this condition and reorder those that don't and delete one of the rows when you have both.

This query will insert any missing rows where you have for example (5, 1) but not (1, 5):

INSERT INTO table1
SELECT T1.v2, T1.v1
FROM table1 T1
LEFT JOIN table1 T2
ON T1.v1 = T2.v2 AND T1.v2 = T2.v1
WHERE T1.v1 > T1.v2 AND T2.v1 IS NULL

Then this query deletes the rows you don't want, like (5, 1):

DELETE table1 WHERE v1 > v2

You might need to change other places in your code that were programmed before this constraint was added.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜