Is there a shortcut to normalizing a table where the columns=rows?
Suppose you had the mySQL table describing if you can mix two substances
Product A B C
---------------------
A y n y
B n y y
C y y y
The first step would be to transform it like
P1 P2 ?
-----------
A A y
A B n
A C y
B A y
B B y
B C n
C A y
C B n
C C y
But then you have duplicate information. (eg. If A can mix with B, then B can mix with A), so, you c开发者_运维知识库an remove several rows to get
P1 P2 ?
-----------
A A y
A B n
A C y
B B y
B C n
C C y
While the last step was pretty easy with a small table, doing it manually would take forever on a larger table. How would one go about automating the removal of rows with duplicate MEANING, but not identical content?
Thanks, I hope my question makes sense as I am still learning databases
If it's safe to assume that you're starting with all relationships doubled up, e.g.
If A B
is in the table, then B A
is guaranteed to be in the table.
Then all you have to do is remove all rows where P2 < P1;
DELETE FROM `table_name` WHERE `P2` < `P1`;
If this isn't the case, you can make it the case by going through the table and inserting all the duplicate rows if they don't already exist, then running this.
I don't think it's necessary in your situation, but as an intellectual exercise, you could build on Jamie Wong's solution and prevent non-duplicated columns from being removed with an EXISTS clause. Something like this:
DELETE FROM `table_name` AS t1
WHERE `P2` < `P1`
AND EXISTS (SELECT NULL FROM `table_name` AS t2
WHERE t1.`P1` = t2.`P2` AND t1.`P2` = t2.`P1`);
It pretty much just makes sure that there's a duplicate before deleting anything.
(My MySQL syntax might be a little off; it's been a while.)
Step 1 (as you've already done): Transform to Table2
P1 P2 ?
-----------
A A y
A B n
A C y
B A y
B B y
B C n
C A y
C B n
C C y
Step 2: ReOrder Columns, Select Distinct
SELECT DISTINCT
IF P1<P2 THEN P1 ELSE P2 END as P1, -- this puts the smallest value in P1
IF P1>P2 THEN P1 ELSE P2 END as P2 -- this puts the largest value in P2
FROM Table2
WHERE NOT P1=P2 --(Assuming records like A, A, y are not interesting)
I'm not a mySQL guy, so you might need to check the if/then syntax, but this seems conceptually ok anyway.
精彩评论