Pandas first date condition is met while another condition is active

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



Pandas first date condition is met while another condition is active



I have a dataframe with a time series of scores. My goal is to detect when the score is larger than a certain threshold th and then to find when the score goes back to 0. Is quite easy to find each condition separately


th


dates_1 = score > th
dates_2 = np.sign(score[1:]) == np.sign(score.shift(1).dropna())



However, I don't know what's the most pythonic way to override dates_2 so that only dates when an 'active' date_1 has been observed



Perhaps using an auxiliary column 'active' set to 1 whenever score > th is True and set it to False when the condition for dates_2 is met. That way I can ask for the change in sign AND active == True. However, that approach requires iteration and I'm wondering if there's a vectorized solution to my problem


score > th


active == True



Any thoughts on how to improve my approach?



Sample data:


date score
2010-01-04 0.0
2010-01-05 -0.3667779798467592
2010-01-06 -1.9641427199568868
2010-01-07 -0.49976215445519134
2010-01-08 -0.7069108074548405
2010-01-11 -1.4624766212523337
2010-01-12 -0.9132777669357441
2010-01-13 0.16204588193577152
2010-01-14 0.958085568609925
2010-01-15 1.4683022129399834
2010-01-19 3.036016680985081
2010-01-20 2.2357911432637345
2010-01-21 2.8827438241030707
2010-01-22 -3.395977874791837



if th = 0.94


date active
2010-01-04 False
2010-01-05 False
2010-01-06 False
2010-01-07 False
2010-01-08 False
2010-01-11 False
2010-01-12 False
2010-01-13 False
2010-01-14 True
2010-01-15 True
2010-01-19 True
2010-01-20 True
2010-01-21 True
2010-01-22 False





What is the expected outcome given the data in your example?
– piRSquared
Aug 8 at 19:27




2 Answers
2


def alt_cond(s, th):
active = False
for x in s:
active = [x >= th, x > 0][int(active)]
yield active

df.assign(A=[*alt_cond(df.score, 0.94)])

date score A
0 2010-01-04 0.000000 False
1 2010-01-05 -0.366778 False
2 2010-01-06 -1.964143 False
3 2010-01-07 -0.499762 False
4 2010-01-08 -0.706911 False
5 2010-01-11 -1.462477 False
6 2010-01-12 -0.913278 False
7 2010-01-13 0.162046 False
8 2010-01-14 0.958086 True
9 2010-01-15 1.468302 True
10 2010-01-19 3.036017 True
11 2010-01-20 2.235791 True
12 2010-01-21 2.882744 True
13 2010-01-22 -3.395978 False



I used Numba to really speed things up. Still a loop but should be very fast if you can install numba


numba


from numba import njit

@njit
def alt_cond(s, th):
active = False
out = np.zeros(len(s), dtype=np.bool8)
for i, x in enumerate(s):
if active:
if x <= 0:
active = False
else:
if x >= th:
active = True
out[i] = active
return out

df.assign(A=alt_cond(df.score.values, .94))



You can have a dictionary of column names and threshold values and iterate


th = 'score': 0.94

df.join(pd.DataFrame(
np.column_stack([[*alt_cond(df[k], v)] for k, v in th.items()]),
df.index, [f"k_A" for k in th]
))


date score score_A
0 2010-01-04 0.000000 False
1 2010-01-05 -0.366778 False
2 2010-01-06 -1.964143 False
3 2010-01-07 -0.499762 False
4 2010-01-08 -0.706911 False
5 2010-01-11 -1.462477 False
6 2010-01-12 -0.913278 False
7 2010-01-13 0.162046 False
8 2010-01-14 0.958086 True
9 2010-01-15 1.468302 True
10 2010-01-19 3.036017 True
11 2010-01-20 2.235791 True
12 2010-01-21 2.882744 True
13 2010-01-22 -3.395978 False





Tried the non vectorized, works great!
– Yuca
Aug 8 at 20:36





If ever in NY, beers on me :)
– Yuca
Aug 8 at 20:41





(-: no worries. Glad you found it.
– piRSquared
Aug 9 at 13:27





Cleaning up comments.
– piRSquared
Aug 9 at 13:27





I updated the post with one way to do it.
– piRSquared
Aug 14 at 19:21



I'm assuming your data is in a pandas dataframe, and 'date' is your index column. Then this would be the way I'd do it:


th = 0.94 # Threshold value
i = df[df.score>th].index[0] # Check the index for the first condition

df[i:][df.score<0].index[0] # Check the index for the second condition, after the index of the first condition



So use conditional indexing to find the index for the first condition ([df.score>th]), then check for the second condition ([df.score<0]), but begin to look from the index found for the first condition ([i:])


[df.score>th]


[df.score<0]


[i:]





I ran your code and it doesn't do anything to the original input a, can you include your results?
– Yuca
Aug 8 at 19:49





This can be due to the random data. If none of the data yields a value greater than the threshold value, no output is generated
– Jeroen
Aug 8 at 19:50





I get FutureWarning; 'summary' is deprecated (no idea why this is getting triggered tbh)
– Yuca
Aug 8 at 19:50





I'm running it multiple times, it still not doing anything. Also why generate random data vs loading the dataframe from the clipboard?
– Yuca
Aug 8 at 19:51





As you should! pd.read_clipboard()
– Yuca
Aug 8 at 20:20






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard