Pandas first date condition is met while another condition is active

I have a dataframe with a time series of scores. My goal is to detect when the score is larger than a certain threshold th and then to find when the score goes back to 0. Is quite easy to find each condition separately

th

dates_1 = score > th dates_2 = np.sign(score[1:]) == np.sign(score.shift(1).dropna())

However, I don't know what's the most pythonic way to override dates_2 so that only dates when an 'active' date_1 has been observed

Perhaps using an auxiliary column 'active' set to 1 whenever score > th is True and set it to False when the condition for dates_2 is met. That way I can ask for the change in sign AND active == True. However, that approach requires iteration and I'm wondering if there's a vectorized solution to my problem

score > th

active == True

Any thoughts on how to improve my approach?

Sample data:

date score 2010-01-04 0.0 2010-01-05 -0.3667779798467592 2010-01-06 -1.9641427199568868 2010-01-07 -0.49976215445519134 2010-01-08 -0.7069108074548405 2010-01-11 -1.4624766212523337 2010-01-12 -0.9132777669357441 2010-01-13 0.16204588193577152 2010-01-14 0.958085568609925 2010-01-15 1.4683022129399834 2010-01-19 3.036016680985081 2010-01-20 2.2357911432637345 2010-01-21 2.8827438241030707 2010-01-22 -3.395977874791837

if th = 0.94

date active 2010-01-04 False 2010-01-05 False 2010-01-06 False 2010-01-07 False 2010-01-08 False 2010-01-11 False 2010-01-12 False 2010-01-13 False 2010-01-14 True 2010-01-15 True 2010-01-19 True 2010-01-20 True 2010-01-21 True 2010-01-22 False

What is the expected outcome given the data in your example?
– piRSquared
Aug 8 at 19:27

2 Answers
2

def alt_cond(s, th): active = False for x in s: active = [x >= th, x > 0][int(active)] yield active df.assign(A=[*alt_cond(df.score, 0.94)]) date score A 0 2010-01-04 0.000000 False 1 2010-01-05 -0.366778 False 2 2010-01-06 -1.964143 False 3 2010-01-07 -0.499762 False 4 2010-01-08 -0.706911 False 5 2010-01-11 -1.462477 False 6 2010-01-12 -0.913278 False 7 2010-01-13 0.162046 False 8 2010-01-14 0.958086 True 9 2010-01-15 1.468302 True 10 2010-01-19 3.036017 True 11 2010-01-20 2.235791 True 12 2010-01-21 2.882744 True 13 2010-01-22 -3.395978 False

I used Numba to really speed things up. Still a loop but should be very fast if you can install numba

numba

from numba import njit @njit def alt_cond(s, th): active = False out = np.zeros(len(s), dtype=np.bool8) for i, x in enumerate(s): if active: if x <= 0: active = False else: if x >= th: active = True out[i] = active return out df.assign(A=alt_cond(df.score.values, .94))

You can have a dictionary of column names and threshold values and iterate

th = 'score': 0.94 df.join(pd.DataFrame( np.column_stack([[*alt_cond(df[k], v)] for k, v in th.items()]), df.index, [f"k_A" for k in th] )) date score score_A 0 2010-01-04 0.000000 False 1 2010-01-05 -0.366778 False 2 2010-01-06 -1.964143 False 3 2010-01-07 -0.499762 False 4 2010-01-08 -0.706911 False 5 2010-01-11 -1.462477 False 6 2010-01-12 -0.913278 False 7 2010-01-13 0.162046 False 8 2010-01-14 0.958086 True 9 2010-01-15 1.468302 True 10 2010-01-19 3.036017 True 11 2010-01-20 2.235791 True 12 2010-01-21 2.882744 True 13 2010-01-22 -3.395978 False

Tried the non vectorized, works great!
– Yuca
Aug 8 at 20:36

If ever in NY, beers on me :)
– Yuca
Aug 8 at 20:41

(-: no worries. Glad you found it.
– piRSquared
Aug 9 at 13:27

Cleaning up comments.
– piRSquared
Aug 9 at 13:27

I updated the post with one way to do it.
– piRSquared
Aug 14 at 19:21

I'm assuming your data is in a pandas dataframe, and 'date' is your index column. Then this would be the way I'd do it:

th = 0.94 # Threshold value i = df[df.score>th].index[0] # Check the index for the first condition df[i:][df.score<0].index[0] # Check the index for the second condition, after the index of the first condition

So use conditional indexing to find the index for the first condition ([df.score>th]), then check for the second condition ([df.score<0]), but begin to look from the index found for the first condition ([i:])

[df.score>th]

[df.score<0]

[i:]

I ran your code and it doesn't do anything to the original input a, can you include your results?
– Yuca
Aug 8 at 19:49

This can be due to the random data. If none of the data yields a value greater than the threshold value, no output is generated
– Jeroen
Aug 8 at 19:50

I get FutureWarning; 'summary' is deprecated (no idea why this is getting triggered tbh)
– Yuca
Aug 8 at 19:50

I'm running it multiple times, it still not doing anything. Also why generate random data vs loading the dataframe from the clipboard?
– Yuca
Aug 8 at 19:51

As you should! pd.read_clipboard()
– Yuca
Aug 8 at 20:20

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Sfyjdyy

Pandas first date condition is met while another condition is active

Pandas first date condition is met while another condition is active

2 Answers
2

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

Pandas first date condition is met while another condition is active

Pandas first date condition is met while another condition is active

2 Answers 2

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

2 Answers
2