开发者

Is there a shortcut to normalizing a table where the columns=rows?

Suppose you had the mySQL table describing if you can mix two substances

Product   A    B    C
---------------------
A         y    n    y
B         n    y    y
C         y    y    y

The first step would be to transform it like

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    A    y
B    B    y
B    C    n
C    A    y
C    B    n
C    C    y

But then you have duplicate information. (eg. If A can mix with B, then B can mix with A), so, you c开发者_运维知识库an remove several rows to get

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    B    y
B    C    n
C    C    y

While the last step was pretty easy with a small table, doing it manually would take forever on a larger table. How would one go about automating the removal of rows with duplicate MEANING, but not identical content?

Thanks, I hope my question makes sense as I am still learning databases


If it's safe to assume that you're starting with all relationships doubled up, e.g.

If A B is in the table, then B A is guaranteed to be in the table.

Then all you have to do is remove all rows where P2 < P1;

DELETE FROM `table_name` WHERE `P2` < `P1`;

If this isn't the case, you can make it the case by going through the table and inserting all the duplicate rows if they don't already exist, then running this.


I don't think it's necessary in your situation, but as an intellectual exercise, you could build on Jamie Wong's solution and prevent non-duplicated columns from being removed with an EXISTS clause. Something like this:

DELETE FROM `table_name` AS t1
  WHERE `P2` < `P1`
    AND EXISTS (SELECT NULL FROM `table_name` AS t2
      WHERE t1.`P1` = t2.`P2` AND t1.`P2` = t2.`P1`);

It pretty much just makes sure that there's a duplicate before deleting anything.

(My MySQL syntax might be a little off; it's been a while.)


Step 1 (as you've already done): Transform to Table2

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    A    y
B    B    y
B    C    n
C    A    y
C    B    n
C    C    y

Step 2: ReOrder Columns, Select Distinct

SELECT DISTINCT
   IF P1<P2 THEN P1 ELSE P2 END as P1, -- this puts the smallest value in P1
   IF P1>P2 THEN P1 ELSE P2 END as P2 -- this puts the largest value in P2
FROM Table2
WHERE NOT P1=P2  --(Assuming records like A, A, y are not interesting)

I'm not a mySQL guy, so you might need to check the if/then syntax, but this seems conceptually ok anyway.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜