Computing sum of consecutive values in a vector that are greater than a constant number?
I couldn't summarize my question very well in the title. I'm writing some code, and in one part of it I need to compute the following:
Let's say we have a vector (e.g. a numpy array):
a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
We want to turn any number greater than 5 to 5:
a = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]
Then we compute the sum of consecutive 5s and the number that follows them and replace all these elements with the resulting sum:
a = [3.2, 4, 5+ 2, 5+ 5+ 5+ 1.7, 2, 5+ 5+ 1, 3]
so the resulting array would be:
a = [3.2, 4, 7, 16.7, 2, 11, 3]
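I can do this using a loop like this:
import numpy as np

a = np.array([3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3])
indx = np.where(a > 5)[0]
a[indx] = 5  # cap every value greater than 5
counter = 0
c = []
while counter < len(a):
    elem = a[counter]
    if elem != 5:
        c.append(elem)
    else:
        # add up the run of 5s (assumes the array does not end with a 5)
        temp = 0
        while elem == 5:
            temp += elem
            counter += 1
            elem = a[counter]
        temp += elem  # include the number that follows the run
        c.append(temp)
    counter += 1
Is there a way to avoid using the loop? Perhaps by using the indx variable?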
I have a vague idea if we turn it into a string:
a = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]'
and then replace every ' 5,' with ' 5+' and then use eval(a). However, is there an efficient way to find all indices containing a sub-string? And what about the fact that strings are immutable?
3 Answers
You can use pandas for data manipulation, using cumsum and shift to build the groupby key that matches your logic, and then aggregating each group with sum:
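import pandas as pd

df = pd.DataFrame(a, columns=['col1'])
df.loc[df.col1 > 5] = 5  # cap every value greater than 5
# a new group starts right after every value that is not 5
s = df.col1.groupby((df.col1 != 5).cumsum().shift().fillna(0)).sum()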
col1
0.0 3.2
1.0 4.0
2.0 7.0
3.0 16.7
4.0 2.0
5.0 11.0
6.0 3.0
To get a numpy array back, just use .values:
>>> s.values
array([ 3.2, 4. , 7. , 16.7, 2. , 11. , 3. ])
Thank you! Can we also use this method if "a" is a 2D array and we want to compute this for individual rows and fillna with np.inf?
– Joe
Aug 8 at 21:58
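One possible reading of the 2D question, as a rough sketch rather than a definitive answer: apply the same cumsum/shift/groupby logic to each row, then pad the shorter results so they fit in one rectangular array. The helper names collapse_row and collapse_rows are made up for this illustration, and padding with np.inf is only my assumption about what "fillna with np.inf" was meant to do. Rows are still processed one at a time.

```python
import numpy as np
import pandas as pd

def collapse_row(row, cap=5):
    """Hypothetical helper: the groupby logic from the answer, for one row."""
    s = pd.Series(row, dtype=float).clip(upper=cap)
    # a new group starts right after every value that is not equal to cap
    groups = (s != cap).cumsum().shift().fillna(0)
    return s.groupby(groups).sum().to_numpy()

def collapse_rows(arr2d, cap=5, fill=np.inf):
    """Hypothetical helper: collapse each row, pad short rows with `fill`."""
    rows = [collapse_row(r, cap) for r in np.asarray(arr2d, dtype=float)]
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out

arr = [
    [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3],
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1],
]
out = collapse_rows(arr)
print(out.shape)
```

Since each row can collapse to a different length, some kind of padding (or a list of arrays instead of a 2D array) seems unavoidable here.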
This is what you want (all in vectorized numpy):
import numpy as np
a = np.array([0, 3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3, 0]) # add a 0 at the beginning and the end
aa = np.where(a>5, 5, a) # clip values to 5, can use np.clip(a, None, 5) too...
c = np.cumsum(aa) # get cumulative sum
np.diff(c[aa < 5]) # keep the cumulative sum only where the clipped value is below 5, then diff
array([ 3.2, 4. , 7. , 16.7, 2. , 11. , 3. , 0. ]) # the trailing 0. comes from the padding 0 at the end
Your answer fails for, e.g., arr = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3,5,2,5,5,5,5,5,5,5,5,5,5,10], or if arr finishes with 5.
– RafaelC
Aug 8 at 2:27
Is this the only case where it fails? What if we also add a "0" at the end of the array?
– Joe
Aug 8 at 2:41
I would use numpy.clip to clip.
– apple apple
Aug 8 at 2:42
Yes, just add a 0 at the end to cover that edge case.
– Julien
Aug 8 at 2:51
Note that the final 0 will show up in the result if your last value was < 5, so this might need an extra check if you need it gone. Or maybe there is a cleaner, less hacky way I didn't think of.
– Julien
Aug 8 at 3:10
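For what it's worth, here is a sketch of one way to avoid the padding zeros altogether, using np.add.reduceat instead of cumsum and diff. This is a different technique from the answer above, not a correction of it, and the function name sum_capped_runs is made up for this illustration. The idea: a group starts at index 0 and right after every element that is below the cap, and reduceat sums each group. Like the original loop, it treats a value exactly equal to 5 as if it had been capped.

```python
import numpy as np

def sum_capped_runs(values, cap=5):
    """Hypothetical helper: cap values at `cap`, then merge each run of
    capped values with the element that follows it."""
    clipped = np.minimum(np.asarray(values, dtype=float), cap)
    if clipped.size == 0:
        return clipped  # reduceat cannot handle an empty input
    # a group starts at index 0 and right after every element below the cap
    is_start = np.concatenate(([True], clipped[:-1] < cap))
    starts = np.flatnonzero(is_start)
    return np.add.reduceat(clipped, starts)

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
print(sum_capped_runs(a))
```

If the array ends with a run of capped values, that run simply becomes the last group on its own, and no extra 0 appears when the last value is below 5.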
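I think you can do this in a single pass. For each item: if it is less than 5, append it plus the running sum to the result and reset the running sum; otherwise add 5 to the running sum. If a running sum is left over at the end, append it as well.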
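a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
result = []
current_sum = 0  # running sum of the capped values seen so far
for item in a:
    if item < 5:
        # close the current run: add the item that follows it
        result.append(current_sum + item)
        current_sum = 0
    else:
        current_sum += 5  # values of 5 or more count as 5
if current_sum:
    # the input ended with a run of capped values
    result.append(current_sum)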
>>> result
[3.2, 4, 7, 16.7, 2, 11, 3]
Thanks but I'm not sure this is much different. I'm looking for a loop-free solution if it exists.
– Joe
Aug 8 at 2:21
I think any solution will require at least one pass over the array. Even if you were only looking to accomplish the first task, converting any number greater than 5 to 5, you would still need to evaluate each item once.
– chris
Aug 8 at 2:25
In numpy you can use np.where(a>=5) to get the indices.
– apple apple
Aug 8 at 2:29