Removing redundant data from MySQL

2023-04-10 06:51

I have a table in MySQL that looks like this:

        1   1w10    U   UROK_HUMAN  IIGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLISPCWVISATHCFID...   
        2   1w11    U   UROK_HUMAN  IIGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLISPCWVISATHCFID...   
        3   1w12    U   UROK_HUMAN  IIGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLISPCWVISATHCFID...   
        4   1w13    U   UROK_HUMAN  IIGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLISPCWVISATHCFID...   
        5   1w14    U   UROK_HUMAN  IIGGEFTTIENQPWFAAIYRRHRGGSVTYVCGGSLISPCWVISATHCFID...
        6   1w15    A   SYT4_RAT    GSPGISGGGGGIPSGRGELLVSLCYQSTTNTLTVVVLKARHLPKSDVSGL...   
        7   1w16    A   SYT4_RAT    GSPGISGGGGGIPSGRGELLVSLCYQSTTNTLTVVVLKARHLPKSDVSGL...   
        8   1w17    B   PDAA_BACSU  MKWMCSICCAAVLLAGGAAQAEAVPNEPINWGFKRSVNHQPPDAGKQLNS...   
        10  1w18    B   SACB_ACEDI  AGVPGFPLPSIHTQQAYDPQSDFTARWTRADALQIKAHSDATVAAGQNSL...   
        12  1w18    E   SACB_ACEDI  MKGGAGVPDLPSLDASGVRLAIVASSWHGKICDALLDGARKVAAGCGLDD...

I want to remove duplicate entries while keeping one of them. For instance, I want to keep the first row but remove rows 2, 3, 4 and 5. In short, I want to remove rows that have the same column 4 value (here, UROK_HUMAN) but different col2 and col3 values (here 1w10, 1w11, etc. for col2 and U, A, B for col3).

However, I do not want to remove the entries for 1w18 (chains B and E in col3), which share the same col2 value and the same col4 value (SACB_ACEDI).

How can I write an SQL statement to delete those rows? I tried the following, but it did not work:

SELECT pdb, chain, unp, sekans, COUNT(*) AS ct
FROM protein
JOIN (SELECT DISTINCT(unp)
          FROM protein 
          GROUP by pdb) protein2 ON protein2.unp = protein.unp;

Thank you very much for your help.

Consider an alternative route instead: select the unique rows and insert them into a temp table, then drop the old table and rename the new one. This circumvents the limitations on deleting from a table you select from, and it makes it far easier to test whether the results are correct.

INSERT INTO newtable SELECT MIN(pdb), chain, unp, sekans
FROM protein
GROUP BY chain, unp, sekans;

Note that if you have other columns that could hold different values in different rows of the same group (like pdb), you should wrap them in an aggregate function (such as MIN, MAX, SUM or GROUP_CONCAT); otherwise the value MySQL uses for the new row is undefined.
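To try the copy-and-swap idea without touching a real MySQL table, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The `id` column name, the shortened sequences and the `dedupe_by_copy` helper are assumptions made for illustration, and the table swap syntax in MySQL may differ slightly (for example `RENAME TABLE`).

```python
import sqlite3

# Sample rows modelled on the question; sequences are shortened placeholders
# and the "id" column name is an assumption.
ROWS = [
    (1, "1w10", "U", "UROK_HUMAN", "IIGG"),
    (2, "1w11", "U", "UROK_HUMAN", "IIGG"),
    (3, "1w12", "U", "UROK_HUMAN", "IIGG"),
    (6, "1w15", "A", "SYT4_RAT", "GSPG"),
    (7, "1w16", "A", "SYT4_RAT", "GSPG"),
    (10, "1w18", "B", "SACB_ACEDI", "AGVP"),
    (12, "1w18", "E", "SACB_ACEDI", "MKGG"),
]


def dedupe_by_copy(conn):
    """Copy one row per (chain, unp, sekans) group into a new table, then swap."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE newtable (pdb TEXT, chain TEXT, unp TEXT, sekans TEXT)"
    )
    # MIN(pdb) makes the choice of the surviving pdb deterministic.
    cur.execute(
        "INSERT INTO newtable (pdb, chain, unp, sekans) "
        "SELECT MIN(pdb), chain, unp, sekans FROM protein "
        "GROUP BY chain, unp, sekans"
    )
    cur.execute("DROP TABLE protein")
    cur.execute("ALTER TABLE newtable RENAME TO protein")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein "
    "(id INTEGER, pdb TEXT, chain TEXT, unp TEXT, sekans TEXT)"
)
conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?, ?)", ROWS)

dedupe_by_copy(conn)
result = conn.execute(
    "SELECT pdb, chain, unp FROM protein ORDER BY pdb, chain"
).fetchall()
print(result)
```

On this sample data the UROK_HUMAN and SYT4_RAT groups collapse to one row each, while both 1w18 rows survive because their chain and sequence differ. Inspecting `newtable` before the drop is the step that makes this route easy to verify.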

I'm not writing it for you, but I'll tell you how to do it. First write a SELECT query that returns all the IDs (I guess that is the first column, right?) you want to delete.

Then write a DELETE statement that would delete all of the rows with those IDs. Something like:

DELETE FROM protein WHERE pdb IN
(SELECT pdb FROM protein /* here goes the query I'm not writing */);

So, in short, you first get all the IDs you want to delete and then you tell the DBMS to delete those IDs. That's all.

EDIT: Just adding a possible SQL query to get all the duplicated rows but one. Not tested.

SELECT pdb FROM protein WHERE
pdb NOT IN (
    SELECT pdb FROM (
        -- MIN() picks one pdb per sequence; a bare pdb is rejected under ONLY_FULL_GROUP_BY
        SELECT MIN(pdb) AS pdb FROM protein
        GROUP BY sekans)
    AS T);
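The select-then-delete approach can be tried out the same way. The sketch below again uses Python's built-in sqlite3 module as a stand-in for MySQL; the `id` column name, the shortened sequences and the `delete_duplicates` helper are assumptions for illustration. It keys on the assumed unique `id` rather than `pdb`, and it keeps the inner SELECT wrapped in a derived table, which is the usual way to avoid MySQL's complaint about selecting from the table being deleted from.

```python
import sqlite3

# Sample rows modelled on the question; sequences are shortened placeholders
# and the "id" column name is an assumption.
ROWS = [
    (1, "1w10", "U", "UROK_HUMAN", "IIGG"),
    (2, "1w11", "U", "UROK_HUMAN", "IIGG"),
    (3, "1w12", "U", "UROK_HUMAN", "IIGG"),
    (6, "1w15", "A", "SYT4_RAT", "GSPG"),
    (7, "1w16", "A", "SYT4_RAT", "GSPG"),
    (10, "1w18", "B", "SACB_ACEDI", "AGVP"),
    (12, "1w18", "E", "SACB_ACEDI", "MKGG"),
]


def delete_duplicates(conn):
    """Delete every row whose id is not the smallest id for its sequence."""
    cur = conn.execute(
        "DELETE FROM protein WHERE id NOT IN ("
        "  SELECT keep_id FROM ("
        "    SELECT MIN(id) AS keep_id FROM protein GROUP BY sekans"
        "  ) AS t"
        ")"
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein "
    "(id INTEGER, pdb TEXT, chain TEXT, unp TEXT, sekans TEXT)"
)
conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?, ?)", ROWS)

deleted = delete_duplicates(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM protein ORDER BY id")]
print(deleted, remaining)
```

Running the inner SELECT on its own first, as suggested above, shows exactly which ids would be kept before anything is deleted.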
