Tricky add consecutive date row based on category in Pandas
I would like to add a new consecutive date row at the end of a new category based on a column. Ex. based on the Area column.
Data
Start End Area ID Stat
2/1/2022 3/1/2022 NY 222 Y
3/1/2022 4/1/2022 NY 111 Y
2/1/2022 3/1/2022 CA 333 Y
3/1/2022 4/1/2022 CA 100 Y
开发者_如何学Python
Desired
Start End Area ID Stat
2/1/2022 3/1/2022 NY 222 Y
3/1/2022 4/1/2022 NY 111 Y
4/1/2022 5/1/2022 NY
2/1/2022 3/1/2022 CA 333 Y
3/1/2022 4/1/2022 CA 100 Y
4/1/2022 5/1/2022 CA
Doing
I am thinking I can use the append method:
df1 = df.append(new_row, ignore_index=True)
Any suggestion is appreciated. I am currently researching this.
Annotated code
# Convert the cols to datetime
c = ['Start', 'End']
df[c] = df[c].apply(pd.to_datetime, dayfirst=True)
# drop the duplicates rows by Area while keeping only the last row
rows = df[[*c, 'Area']].drop_duplicates('Area', keep='last')
# Add a dateoffset of 1 day
rows[c] += pd.DateOffset(days=1)
# Concat the rows and sort index to maintain order
pd.concat([df, rows]).sort_index(ignore_index=True)
Result
Start End Area ID Stat
0 2022-01-02 2022-01-03 NY 222.0 Y
1 2022-01-03 2022-01-04 NY 111.0 Y
2 2022-01-04 2022-01-05 NY NaN NaN
3 2022-01-02 2022-01-03 CA 333.0 Y
4 2022-01-03 2022-01-04 CA 100.0 Y
5 2022-01-04 2022-01-05 CA NaN NaN
精彩评论