Merge records in SQL with subsequent dates?

2023-02-12 22:33

I want to merge the following data using sql:

no  code  area  rate  startdate  enddate
1   0101  EU    0.1%  20050101   20051231
2   0101  EU    0.1%  20060101   20061231
3   0101  EU    0.1%  20080101   20081231
4   0101  EFTA  0.2%  20050101   20051231
5   0101  EFTA  0.1%  20060101   20061231

In the above example no 1 and 2 should be merged as they have the same code, same rate, have subsequent dates and are from the same area.
No 3 should not be merged as the dates are not subsequent.
4 and 5 should not be merged as they have different rates.

The end result should look as follows:

no  code  area  rate  startdate  enddate
1   0101  EU    0.1%  20050101   20061231
3   0101  EU    0.1%  20080101   20081231
4   0101  EFTA  0.2%  20050101   20051231
5   0101  EFTA  0.1%  20060101   20061231

Is there a way to do this using only sql? I am using postgres.

Thanks

Assuming this is a one-off thing you want to do, and isn't part of some ongoing production solution, I think the most straightforward way to do this is with a cursor.

I don't know Postgres, but it seems to have cursors which work in a similar way to those in MS SQL Server: http://www.postgresql.org/docs/current/static/plpgsql-cursors.html

Cursors allow you to "loop through" a select statement, putting the values into temporary variables where you can then work on them. It's more akin to procedural programming.

Performance isn't great, which is why it's OK as a one-off thing, but there may be a more complex (but more performant) solution if you have performance issues.

I would start by creating a cursor to loop through the table, putting the code, area, rate, startdate and enddate into variables. (If you sort the select statement by code, area, rate and startdate, you'll guarantee that all your records appear in the right order.) For each record you can compare the current code, area and rate to the values in the variables; if they're the same and the start date directly follows the end date held in the variable, update the end date variable.

Use a second table (or a table variable) for the results. If the code, area or rate isn't the same as in the variables, or the dates don't follow on, then everything in the variables is a finished row. Write it out, then update the variables with the new data and continue.

After the loop, write out the row still held in the variables. In one pass you'll have created your new table. You can then either delete the original one and replace it, or store the result elsewhere.
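
A minimal PL/pgSQL sketch of the loop described above, under some assumptions: the table is called rates (a made-up name), startdate and enddate are of type date, and two rows are merged only when code, area and rate all match and the start date is exactly one day after the stored end date. It uses a FOR ... IN SELECT loop, which PL/pgSQL runs through an implicit cursor, and writes into a hypothetical result table rates_merged.

```sql
-- Assumption: rates(no, code, area, rate, startdate date, enddate date).
-- The table names rates and rates_merged are hypothetical.
CREATE TABLE rates_merged (LIKE rates);

DO $$
DECLARE
    r        rates%ROWTYPE;      -- row currently read by the loop
    acc      rates%ROWTYPE;      -- merged row being built up
    have_acc boolean := false;   -- false until the first row has been read
BEGIN
    FOR r IN
        SELECT * FROM rates ORDER BY code, area, rate, startdate
    LOOP
        IF have_acc
           AND r.code = acc.code
           AND r.area = acc.area
           AND r.rate = acc.rate
           AND r.startdate = acc.enddate + 1 THEN
            -- Same code/area/rate and directly following: extend the end date.
            acc.enddate := r.enddate;
        ELSE
            -- New group or a gap in the dates: write out the finished row.
            IF have_acc THEN
                INSERT INTO rates_merged VALUES (acc.*);
            END IF;
            acc := r;
            have_acc := true;
        END IF;
    END LOOP;

    -- Write out the last row still held in the variables.
    IF have_acc THEN
        INSERT INTO rates_merged VALUES (acc.*);
    END IF;
END
$$;
```

If the dates are stored as integers or text such as 20050101, they would first need converting (for example with to_date) before the "+ 1" day comparison makes sense.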

Maybe I'm missing something, but to me it seems that you want:

SELECT DISTINCT ON (code, area, rate, startdate, enddate)
       no, 
       code, 
       area, 
       rate, 
       startdate, 
       enddate  
FROM your_table

I have faced the same problem. The approach I use is to divide the ordered list into 'groups' of rows with subsequent dates. I mark the first ('original') row of each group, which is the one that is not subsequent to its predecessor, and then find the last row of each group. The end date of that last row is the 'aggregated' end date.

Create an additional table with a new column Subsequent (int). Copy all rows from the original table into it, filling the new column as follows: if the start date of the row is the day after the end date of the previous row, then Subsequent = 1 (this row is subsequent to the previous one); otherwise Subsequent = 0 (this row starts a new sequence). To find the end date of the previous row, use the lag() function partitioned by code, area and rate and ordered by start date. In the example above, row no. 2 will have Subsequent = 1 and all others Subsequent = 0.
Write a query which selects from that table only the rows where Subsequent = 0, i.e. the 'original' ones, with their code, area and rate, and then finds, for each original row selected, the maximum end date among all subsequent rows that follow it.
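
The first step could look roughly like this. It is only a sketch: the table names rates and rates_flagged are made up, startdate and enddate are assumed to be of type date, "subsequent" is taken to mean "starts the day after the previous row ends", and rate is included in the partition so that rows 4 and 5 of the example stay separate.

```sql
-- Assumption: rates(no, code, area, rate, startdate date, enddate date).
-- rates and rates_flagged are hypothetical table names.
CREATE TABLE rates_flagged AS
SELECT r.*,
       CASE
           -- lag() yields NULL for the first row of a partition,
           -- so that row falls through to ELSE and is flagged 0.
           WHEN r.startdate = (lag(r.enddate) OVER w) + 1 THEN 1
           ELSE 0
       END AS subsequent
FROM rates AS r
WINDOW w AS (PARTITION BY r.code, r.area, r.rate ORDER BY r.startdate);
```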

The maximum end date can be found using a subquery like this:

(A) Try to find it among subsequent rows following this original one:

select maximum end date from all rows which meet criteria:

subsequent is 1, i.e. only subsequent rows
code equals the code of the original row
area equals the area of the original row
rate equals the rate of the original row
start date greater than the start date of the original row
end date less than the start date of the next original row; if there is no next original row, take the maximum of all end dates for this combination of code, area and rate

The next original row can be found by another subquery: select the minimum start date among all rows where

subsequent = 0
start date > start date of the original row
code, area and rate are the same as in the original row.

(B) If above returned null, i.e. the original row has no subsequent rows, then end date for this row is its own end date.

The COALESCE function chooses between (A) and (B) nicely.

The result of the query should be the list you want in your example.
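
Put together, the query described above might look something like the sketch below. The assumptions are mine: a table called rates (hypothetical name), date-typed startdate and enddate, "subsequent" meaning "starts the day after the previous row ends", and the flag computed in a CTE instead of a separate table so the statement is self-contained. The "no next original row" case is handled with 'infinity'::date as an open upper bound.

```sql
-- Assumption: rates(no, code, area, rate, startdate date, enddate date).
WITH flagged AS (
    SELECT r.*,
           CASE WHEN r.startdate = (lag(r.enddate) OVER w) + 1
                THEN 1 ELSE 0 END AS subsequent
    FROM rates AS r
    WINDOW w AS (PARTITION BY r.code, r.area, r.rate ORDER BY r.startdate)
)
SELECT o.no, o.code, o.area, o.rate, o.startdate,
       COALESCE(
           -- (A) latest end date among the subsequent rows of this group
           (SELECT max(s.enddate)
            FROM flagged AS s
            WHERE s.subsequent = 1
              AND s.code = o.code
              AND s.area = o.area
              AND s.rate = o.rate
              AND s.startdate > o.startdate
              AND s.enddate < COALESCE(
                      -- start date of the next original row, if any
                      (SELECT min(n.startdate)
                       FROM flagged AS n
                       WHERE n.subsequent = 0
                         AND n.code = o.code
                         AND n.area = o.area
                         AND n.rate = o.rate
                         AND n.startdate > o.startdate),
                      'infinity'::date)),
           -- (B) no subsequent rows: keep the row's own end date
           o.enddate) AS enddate
FROM flagged AS o
WHERE o.subsequent = 0
ORDER BY o.no;
```

With the five sample rows from the question loaded as dates, this should return rows 1, 3, 4 and 5, with row 1 ending on 2006-12-31.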
