Deleting redundant entries from MySQL table because of overlapping dates

2023-03-06 19:37 问答作者：

I have a MyISAM table of affiliations between organizations and individuals. Each record has a start and an end date. These records are added while processing large text files, so I don't do a lot of processing and cleaning as they are added in order to speed up the text parsing. However, some of the records are redundant or potentially redundant because they contain date ranges that overlap.

For instance, I could have the following:

aff_id  aff_e1_id  aff_e1_type  aff_e2_id  aff_e2_type  aff_start    aff_end
------  ---------  -----------  ---------  -----------  -----------  ----------
01       172        org            131       indiv      1997-01-22   1998-03-31
02       172        org            131       indiv      1997-01-22   1999-04-03
03       100        org            127       indiv      1995-01-02   2000-01-05
04       100        org            127       indiv      1994-01-24   1999-03-04

What I would like to do is combine the records which are redundant relationships and modify the date range to include any overlaps. For example, the first two and the last two records, respectively, could be combined and the dates modified to include both dates.

Is there a way to do this entirely within MySQL?

Edited: In response to comments below, the 2, 3, 4, 5 columns need t开发者_JAVA技巧o be identical, and then to check if the dates overlap (if they don't overlap at all, can just leave them alone).

A stored procedure would be great but is there a faster way than using a cursor to cycle through all the records and compare them one-on-one?

You can solve it with a series of delete/update statements:

Delete all ranges that are completely within another range
Update any ranges that have an end-date >= another range's start-date
Repeat (assuming you could have a series of rows that overlap for the same id) until your update statement doesn't update any rows

I think you could just keep doing the update over and do the delete once at the end, but depending on how much data and how many overlaps, that may not be ideal anyway.

Delete Statement:

DELETE sub
FROM tab AS sub 
INNER JOIN tab AS sup
  ON  sub.aff_e1_type = sup.aff_e1_type
  AND sub.aff_e2_type = sup.aff_e2_type
  AND sub.aff_e1_id = sup.aff_e1_id
  AND sub.aff_e2_id = sup.aff_e2_id
  AND ( ( sub.aff_start = sup.aff_start
     AND  sub.aff_end = sup.aff_end
     AND  sub.aff_id < sup.aff_id)
     OR ( sub.aff_start > sup.aff_start
     AND  sub.aff_end <= sup.aff_end
     AND  sub.aff_id <> sup.aff_id)
     OR ( sub.aff_start >= sup.aff_start
     AND  sub.aff_end < sup.aff_end
     AND  sub.aff_id <> sup.aff_id)
   )

Update Statement:

UPDATE tab AS row1 
INNER JOIN tab AS row2
  ON  row1.aff_e1_type = row2.aff_e1_type
  AND row1.aff_e2_type = row2.aff_e2_type
  AND row1.aff_e1_id = row2.aff_e1_id
  AND row1.aff_e2_id = row2.aff_e2_id
  AND row1.aff_end >= row2.aff_start
  AND row1.aff_start < row2.aff_start
  AND row1.aff_id <> row2.aff_id
SET    row1.aff_end = row2.aff_end

One way to do this is to create a new copy of the table, copying the data over with the new groupings you want, then rename the tables to replace the old table with the new table. If the table is very large you may be better off dumping the data to disk using SELECT ... INTO OUTFILE and then loading it into the new table using LOAD DATA INFILE.

Here's an example of the first approach I described:

CREATE TABLE your_table_new LIKE your_table;

INSERT INTO your_table_new(aff_id, aff_e1_id, aff_e1_type, aff_e2_id, aff_e2_type, 
  aff_start, aff_end)
SELECT NULL as aff_id, aff_e1_id, aff_e1_type, aff_e2_id, aff_e2_type, 
  MIN(aff_start), MAX(aff_end)
FROM your_table
GROUP BY aff_e1_id, aff_e1_type, aff_e2_id, aff_e2_type;

RENAME TABLE your_table TO your_table_old, 
  your_table_new TO your_table;

继续阅读：date sql

Deleting redundant entries from MySQL table because of overlapping dates

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？