Computing sum of consecutive values in a vector that are greater than a constant number?


I couldn't summarize my question in the title very well. I'm writing some code, and in one part of it I need to compute the following:



Let's say we have a vector (e.g. a numpy array):


a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]



We want to turn any number greater than 5 to 5:


a = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]



Then we compute the sum of consecutive 5s and the number that follows them and replace all these elements with the resulting sum:


a = [3.2, 4, 5+ 2, 5+ 5+ 5+ 1.7, 2, 5+ 5+ 1, 3]



so the resulting array would be:


a = [3.2, 4, 7, 16.7, 2, 11, 3]



I can do this using a for loop like this:


indx = np.where(a > 5)[0]
a[indx] = 5
counter = 0
c = []
while counter < len(a):
    elem = a[counter]
    if elem != 5:
        c.append(elem)
    else:
        temp = 0
        # note: raises IndexError if the array ends with a 5
        while elem == 5:
            temp += elem
            counter += 1
            elem = a[counter]
        temp += elem
        c.append(temp)
    counter += 1



Is there a way to avoid using the for loop? Perhaps by using the indx variable?



I have a vague idea if we turn it into a string:
a = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]'
and then change anywhere we have ' 5,' with ' 5+' and then use eval(a). However, is there an efficient way to find all indices containing a sub-string? How about the fact that strings are immutable?







in numpy you can use np.where(a>=5) to get the indices
– apple apple
Aug 8 at 2:29







3 Answers



You can use pandas for this: build group labels with cumsum and shift so that each run of 5s shares a label with the value that follows it, then groupby those labels and aggregate with sum.




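import pandas as pd

df = pd.DataFrame(a, columns=['col1'])
df.loc[df.col1 > 5] = 5  # cap every value above 5
# a new group starts right after each value that is not 5
s = df.col1.groupby((df.col1 != 5).cumsum().shift().fillna(0)).sum()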

col1
0.0 3.2
1.0 4.0
2.0 7.0
3.0 16.7
4.0 2.0
5.0 11.0
6.0 3.0



To get a NumPy array back, just use .values




>>> s.values
array([ 3.2, 4. , 7. , 16.7, 2. , 11. , 3. ])
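To make the grouping step easier to follow, here is a small sketch that prints the intermediate group labels. It uses a plain Series instead of the DataFrame above, and the names col, ends_group and labels are only illustrative:

```python
import pandas as pd

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
col = pd.Series(a).clip(upper=5)  # cap every value above 5

# True wherever a value closes a group (anything that is not a capped 5)
ends_group = col != 5

# cumsum counts the closed groups so far; shift moves that count one step
# down so each closing value keeps the label of the 5s that precede it
labels = ends_group.cumsum().shift().fillna(0).astype(int)
print(labels.tolist())
# [0, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 6]

result = col.groupby(labels).sum().to_numpy()
print(result)
```

Positions that share a label are summed together, so label 3 covers 5, 5, 5 and 1.7.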





Thank you! Can we also use this method if "a" is a 2D array and we want to compute this for individual rows and fillna with np.inf?
– Joe
Aug 8 at 21:58



This is what you want (all in vectorized numpy):


import numpy as np

a = np.array([0, 3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3, 0]) # add a 0 at the beginning and the end
aa = np.where(a>5, 5, a) # clip values to 5, can use np.clip(a, None, 5) too...
c = np.cumsum(aa) # get cumulative sum
np.diff(c[aa < 5]) # only keep values where original array is less than 5, then diff again

array([ 3.2,  4. ,  7. , 16.7,  2. , 11. ,  3. ,  0. ])  # the trailing 0. comes from the padding zero at the end
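A possible variant that avoids the padding zeros altogether is to sum segments with np.add.reduceat. This is only a sketch: the helper name collapse_runs and its cap parameter are made up for illustration, and it assumes that a run of capped values at the very end of the array should be kept as its own sum.

```python
import numpy as np

def collapse_runs(a, cap=5):
    """Sum each run of values >= cap (clipped to cap) with the value after it."""
    a = np.asarray(a, dtype=float)
    if a.size == 0:
        return a
    clipped = np.minimum(a, cap)  # cap every value above the threshold
    in_run = a >= cap             # True for values that belong to a run
    # a new segment starts at index 0 and right after every value below cap
    starts = np.flatnonzero(np.r_[True, ~in_run[:-1]])
    # reduceat sums clipped[starts[i]:starts[i + 1]] for each segment
    return np.add.reduceat(clipped, starts)

print(collapse_runs([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]))
print(collapse_runs([1, 7, 8]))  # a trailing run is kept as its own sum
```

Because the segment starts are derived from the mask directly, no sentinel values are needed and nothing has to be trimmed from the result.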





Your answer fails for, e.g. arr = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3,5,2,5,5,5,5,5,5,5,5,5,5,10] or if arr finishes with 5
– RafaelC
Aug 8 at 2:27








Is this the only case that it fails? What if we also add a "0" at the end of the array?
– Joe
Aug 8 at 2:41





I would use numpy.clip to clip.
– apple apple
Aug 8 at 2:42








Yes just add a 0 at the end to cover that edge case..
– Julien
Aug 8 at 2:51






Note that the final 0 will show if your last value was <5, so this might need an extra check if you need it gone... Or maybe there is a cleaner less hacky way I didn't think of...
– Julien
Aug 8 at 3:10



I think you can do this in a single pass over the array:


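a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
result = []

current_sum = 0
for item in a:
    if item < 5:
        # close the current run: add the pending 5s to this item
        result.append(current_sum + item)
        current_sum = 0
    else:
        # values of 5 or more count as 5 and extend the run
        current_sum += 5

# keep a run of 5s that reaches the end of the list
if current_sum:
    result.append(current_sum)

>>> result
[3.2, 4, 7, 16.7, 2, 11, 3]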





Thanks but I'm not sure this is much different. I'm looking for a loop-free solution if it exists.
– Joe
Aug 8 at 2:21





I think any solution will require at least one pass over the array. Even if you were only looking to accomplish the first task, converting any number greater than 5 to 5, you would still need to evaluate each item once.
– chris
Aug 8 at 2:25





