How to efficiently determine the interval a given value is in?

Clash Royale CLAN TAG#URR8PPP
How to efficiently determine the interval a given value is in?
Consider the following zone definitions for a value x:
x
I am looking for an efficient way to determine the zone for given values of x. I came up with:
x
borders = (1, 10, 112)
tst_values = (.2, 2, 22, 222)
for x in tst_values:
z = next((i for b, i in zip(borders, 'ABC') if x < b), 'D')
print(f" * Value x:3g is in zone 'z'.")
# The output is:
# * Value 0.2 is in zone 'A'.
# * Value 2 is in zone 'B'.
# * Value 22 is in zone 'C'.
# * Value 222 is in zone 'D'.
What are the best practices for solving such a problem, especially if the number of zones, i.e., len(borders) is large. borders may contain an arbitrary list of (increasing) floats.
len(borders)
borders
Update
Rephrased question a little bit to address comments.
@Mr.Polywhirl There's no pattern to the zones - the borders may be an arbitrary list of increasing numbers. I updated the question, to make that clear.
– Dietrich
Aug 10 at 14:21
"especially if the number borders is large" - Do you mean if you have many borders? Or do you mean the value of not so many borders is large?
– Leon
Aug 10 at 14:21
@Dietrich are the test values always increasing in nature ?
– Albin Paul
Aug 10 at 14:22
@Leon The number of zones are large. The value of the zones can be represented by normal floats.
– Dietrich
Aug 10 at 14:24
3 Answers
3
To effectively find the range you are in a long list of sorted borders, I would suggest using a binary search. Basically, you look in the middle of your array of borders, check if you are to big/small. Depending on that, you check the value at one/three quarters of the border-array's length. And so on.
This can easily be implemented using recursion by hand. Alternatively, you can use numpy's implementation numpy.digitize, which YSelf pointed out. The latter will almost certainly be faster than the former.
When using numpy.digitize, your code would become:
numpy.digitize
borders = (1, 10, 112)
values = (.2, 2, 22, 222)
for x in values:
z = 'ABCD'[numpy.digitize(x, borders)]
print(f" * Value x:3g is in zone 'z'.")
numpy.digitize implements this binary search.
– YSelf
Aug 10 at 14:30
Was not aware of
np.digitize() - that solves my problem nicely.– Dietrich
Aug 11 at 9:51
np.digitize()
An efficient way to do this would be to do a binary search through the borders. Python has a build in bisect library that will do this for you:
bisect
import bisect
borders = (1, 10, 112)
tst_values = (.2, 2, 22, 222)
for x in tst_values:
border_right = bisect.bisect_right(borders, x) # x < b
# or
border_left = bisect.bisect_left(borders, x) # x > b
This will give you the index into borders that the value x falls into. You can then convert that index into a letter if you like.
Based on the usage of
< and <= in OP's example, we wants bisect_left, or just bisect, which is an alias for bisect_left.– Felk
Aug 10 at 14:39
<
<=
bisect_left
bisect
bisect_left
Never used the bisect library before, nice pointer.
– Dietrich
Aug 11 at 9:52
If you want efficient way to solve this problem, use numpy vectorized operations using np.greater_than and np.less with outer, then taking index values with np.take
np.greater_than
np.less
outer
np.take
a = np.greater_equal.outer(tst_values, [-np.inf, 1,10,112]) &
np.less.outer(tst_values, [1, 10,112, np.inf])
np.take(list('ABCD'), np.argmax(a, axis=1))
This works pretty efficiently. For 5 MM rows, it runs in less than a sec
%timeit (...)
922 ms ± 8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Thanks for the benchmarks - it's the second choice after
np.digitize() for me.– Dietrich
Aug 11 at 9:53
np.digitize()
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
This is a simple logarithmic problem. Take the log of the number to get the digits..
– Mr. Polywhirl
Aug 10 at 14:17