Merge rows that have time overlapping - Pandas

Clash Royale CLAN TAG#URR8PPP
Merge rows that have time overlapping - Pandas
Supposedly, I have dataframe like below.
Date Time_Start Time_End Reason
0 2018-08-05 2018-08-05 10:10:00 2018-08-05 13:35:00 blah1
1 2018-08-05 2018-08-05 12:50:00 2018-08-05 14:26:00 blah2
2 2018-08-05 2018-08-05 16:40:00 2018-08-05 17:30:00 blah3
3 2018-08-05 2018-08-05 17:00:00 2018-08-05 17:10:00 blah4
4 2018-08-06 2018-08-06 09:40:00 2018-08-06 11:23:00 blah5
I would like to know if pandas is capable of merging rows that have time overlapped e.g. take "Time_Start" of row one and merge with "Time_End" of second row and append the reason to a list. Another case is to remove the forth row as it is a sub time window of the third row and appending the reasons to a list. So, my expected output would look like this
Date Time_Start Time_End Reason
0 2018-08-05 2018-08-05 10:10:00 2018-08-05 14:26:00 [blah1, blah2]
1 2018-08-05 2018-08-05 16:40:00 2018-08-05 17:30:00 [blah3, blah4]
3 2018-08-06 2018-08-06 09:40:00 2018-08-06 11:23:00 [blah5]
Note that there can be more than two rows of data which have time overlapping.
1 Answer
1
You can create a help key here
df['newid']=(df['Time_Start']-df['Time_End'].shift()).dt.total_seconds().gt(0).cumsum()
newdf=df.groupby(['Date','newid']).agg('Time_Start':'min','Time_End':'max','Reason':lambda x : [x.values.tolist()]).reset_index(level=1)
newdf.assign(Reason=newdf.Reason.str[0])
Out[592]:
newid Time_Start Reason Time_End
Date
2018-08-05 0 2018-08-05 10:10:00 [blah1, blah2] 2018-08-05 14:26:00
2018-08-05 1 2018-08-05 16:40:00 [blah3, blah4] 2018-08-05 17:30:00
2018-08-06 2 2018-08-06 09:40:00 [blah5] 2018-08-06 11:23:00
@bensw updated :-)
– Wen
Aug 8 at 4:20
Very splendid. You are the fast man alive. Thank you ;)
– bensw
Aug 8 at 4:24
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Thank you for your attempt. It looks very promising. However, the "Time_End" of the second row in the result dataframe should be "2018-08-05 17:30:00".
– bensw
Aug 8 at 4:18