Find most common occurrence of integer within nested lists
Clash Royale CLAN TAG#URR8PPP
Find most common occurrence of integer within nested lists
I want to find the integer that occurs most commonly within a nested list, and return the integer along with its number of occurrences (and multiple integers and their occurrences where multiple integers occur the same number of times). The data is currently in the following form:
list_of_lists =
[[11, 53],
[2, 137],
[2, 7, 31],
[2, 2, 7, 31],
[3, 3, 3, 29],
[2, 2, 2, 3, 137],
[2, 2, 7, 31],
[11, 53]]
Therefore, the desired output would be [[3, 3], [2, 3]]
, number 3 occurred 3 times in the fifth nested list, and number 2 occurred 3 times in the sixth nested list.
[[3, 3], [2, 3]]
Neither the list nor the lists within the list are of fixed length. Therefore a program that solves this for variable lengths would be greatly appreciated!
I couldn't find a similar question directly on point.
Thanks!
4 Answers
4
You can use collections.Counter
to count the occurrence of elements in each list and then, sort the resulting list based on occurrence in reverse order, then group the results (using itertools.groupby
) to get all results with same maximum value
collections.Counter
itertools.groupby
>>> from itertools import chain, groupby
>>> from collections import Counter
>>>
>>> ll = [[11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31], [3, 3, 3, 29], [2, 2, 2, 3, 137], [2, 2, 7, 31], [11, 53]]
>>>
>>> f = lambda t: t[1]
>>> list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])
[(3, 3), (2, 3)]
I used a slightly more complicated list for testing: some values repeated twice, some 3 times, appearing in the same and in different sublists.
We use a Counter in each sublist, and keep a dictionary of the highest count we see for each value. At the end, we build the output list, keeping only the values that were repeated the highest number of times in each line.
list_of_lists =[[11, 11, 53], # 11 is repeated 2 times,
[2, 137], # it shouldn't appear in the result
[2, 7, 31],
[2, 2, 7, 31],
[3, 3, 3, 4, 4, 4, 5, 5, 5, 29], # 3 times 3, 4 and 5
[2, 2, 2, 3, 137], # and 3 times 2
[2, 2, 7, 31],
[11, 53]]
from collections import Counter, defaultdict
def maxcount(list_of_lists):
out = defaultdict(int)
max_repetitions = 0
for sublist in list_of_lists:
for value, count in Counter(sublist).items():
if count > 1 and count > out[value]:
out[value] = count
if count > max_repetitions:
max_repetitions = count
return([[val, count] for val, count in out.items() if count == max_repetitions])
print(maxcount(list_of_lists))
# [[2, 3], [3, 3], [4, 3], [5, 3]]
I like itertools
, so I was curious to compare @Sunitha's solution with this one.
itertools
This solution:
*%timeit maxcount(list_of_lists)
# 65 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
@Sunitha's solution with more use of itertools:
from itertools import chain, groupby
from collections import Counter
def maxcount_with_itertools(ll):
f = lambda t: t[1]
return list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])
%timeit maxcount_with_itertools(list_of_lists)
# 70.9 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
which is just a little bit slower.
If you're interested in doing it using pure Python, then there is the following way:
list_of_lists = [[11, 53],[2, 137],[2, 7, 31],[2, 2, 7, 31],[3, 3, 3, 29],[2, 2, 2, 3, 137],[2, 2, 7, 31],[11, 53]]
maxOccurences = [max([[elem,sublist.count(elem),index] for elem in sublist], key=lambda i:sublist.count(i)) for index, sublist in enumerate(list_of_lists)]
maximum = max(maxOccurences, key=lambda i: i[1])
elements = [elem[:2] for elem in maxOccurences if elem[1]==maximum[1]]
print(elements)
Output:
[[3, 3], [2, 3]]
Another suggestion would be the following:
list_of_lists = [[11, 53],[2, 137],[2, 7, 31],[2, 2, 7, 31],[3, 3, 3, 29],[2, 2, 2, 3, 137],[2, 2, 7, 31],[11, 53]]
maximum = max([max([[elem,sublist.count(elem)] for elem in sublist], key=lambda i:sublist.count(i)) for sublist in list_of_lists], key=lambda i: i[1])
elements = [[elem,sublist.count(elem)] for sublist in list_of_lists for elem in set(sublist) if sublist.count(elem)==maximum[1]]
print(elements)
Output:
[[3, 3], [2, 3]]
You can use collections.Counter
, split into 3 steps:
collections.Counter
Counter
map
max
Counter
Here's a demo.
from collections import Counter
counters = list(map(Counter, list_of_lists))
most_common_count = max(i.most_common(1)[0][1] for i in counters)
res = [(k, v) for i in counters for k, v in i.items() if v == most_common_count]
print(res)
[(3, 3), (2, 3)]
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Improved formatting and punctuation
– Eugene Primako
Aug 12 at 10:09