How to efficiently determine the interval a given value is in?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



How to efficiently determine the interval a given value is in?



Consider the following zone definitions for a value x:


x



I am looking for an efficient way to determine the zone for given values of x. I came up with:


x


borders = (1, 10, 112)

tst_values = (.2, 2, 22, 222)
for x in tst_values:
z = next((i for b, i in zip(borders, 'ABC') if x < b), 'D')
print(f" * Value x:3g is in zone 'z'.")

# The output is:
# * Value 0.2 is in zone 'A'.
# * Value 2 is in zone 'B'.
# * Value 22 is in zone 'C'.
# * Value 222 is in zone 'D'.



What are the best practices for solving such a problem, especially if the number of zones, i.e., len(borders) is large. borders may contain an arbitrary list of (increasing) floats.


len(borders)


borders



Update
Rephrased question a little bit to address comments.





This is a simple logarithmic problem. Take the log of the number to get the digits..
– Mr. Polywhirl
Aug 10 at 14:17





@Mr.Polywhirl There's no pattern to the zones - the borders may be an arbitrary list of increasing numbers. I updated the question, to make that clear.
– Dietrich
Aug 10 at 14:21





"especially if the number borders is large" - Do you mean if you have many borders? Or do you mean the value of not so many borders is large?
– Leon
Aug 10 at 14:21





@Dietrich are the test values always increasing in nature ?
– Albin Paul
Aug 10 at 14:22





@Leon The number of zones are large. The value of the zones can be represented by normal floats.
– Dietrich
Aug 10 at 14:24




3 Answers
3



To effectively find the range you are in a long list of sorted borders, I would suggest using a binary search. Basically, you look in the middle of your array of borders, check if you are to big/small. Depending on that, you check the value at one/three quarters of the border-array's length. And so on.



This can easily be implemented using recursion by hand. Alternatively, you can use numpy's implementation numpy.digitize, which YSelf pointed out. The latter will almost certainly be faster than the former.



When using numpy.digitize, your code would become:


numpy.digitize


borders = (1, 10, 112)
values = (.2, 2, 22, 222)

for x in values:
z = 'ABCD'[numpy.digitize(x, borders)]
print(f" * Value x:3g is in zone 'z'.")





numpy.digitize implements this binary search.
– YSelf
Aug 10 at 14:30





Was not aware of np.digitize() - that solves my problem nicely.
– Dietrich
Aug 11 at 9:51


np.digitize()



An efficient way to do this would be to do a binary search through the borders. Python has a build in bisect library that will do this for you:


bisect


import bisect

borders = (1, 10, 112)
tst_values = (.2, 2, 22, 222)
for x in tst_values:
border_right = bisect.bisect_right(borders, x) # x < b
# or
border_left = bisect.bisect_left(borders, x) # x > b



This will give you the index into borders that the value x falls into. You can then convert that index into a letter if you like.





Based on the usage of < and <= in OP's example, we wants bisect_left, or just bisect, which is an alias for bisect_left.
– Felk
Aug 10 at 14:39



<


<=


bisect_left


bisect


bisect_left





Never used the bisect library before, nice pointer.
– Dietrich
Aug 11 at 9:52



If you want efficient way to solve this problem, use numpy vectorized operations using np.greater_than and np.less with outer, then taking index values with np.take


np.greater_than


np.less


outer


np.take


a = np.greater_equal.outer(tst_values, [-np.inf, 1,10,112]) &
np.less.outer(tst_values, [1, 10,112, np.inf])

np.take(list('ABCD'), np.argmax(a, axis=1))



This works pretty efficiently. For 5 MM rows, it runs in less than a sec


%timeit (...)
922 ms ± 8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)





Thanks for the benchmarks - it's the second choice after np.digitize() for me.
– Dietrich
Aug 11 at 9:53


np.digitize()






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

make 2 or more post in bootsrap

Store custom data using WC_Cart add_to_cart() method in Woocommerce 3

Firebase Auth - with Email and Password - Check user already registered