Find most common occurrence of integer within nested lists

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



Find most common occurrence of integer within nested lists



I want to find the integer that occurs most commonly within a nested list, and return the integer along with its number of occurrences (and multiple integers and their occurrences where multiple integers occur the same number of times). The data is currently in the following form:


list_of_lists =
[[11, 53],
[2, 137],
[2, 7, 31],
[2, 2, 7, 31],
[3, 3, 3, 29],
[2, 2, 2, 3, 137],
[2, 2, 7, 31],
[11, 53]]



Therefore, the desired output would be [[3, 3], [2, 3]], number 3 occurred 3 times in the fifth nested list, and number 2 occurred 3 times in the sixth nested list.


[[3, 3], [2, 3]]



Neither the list nor the lists within the list are of fixed length. Therefore a program that solves this for variable lengths would be greatly appreciated!



I couldn't find a similar question directly on point.



Thanks!





Improved formatting and punctuation
– Eugene Primako
Aug 12 at 10:09




4 Answers
4



You can use collections.Counter to count the occurrence of elements in each list and then, sort the resulting list based on occurrence in reverse order, then group the results (using itertools.groupby) to get all results with same maximum value


collections.Counter


itertools.groupby


>>> from itertools import chain, groupby
>>> from collections import Counter
>>>
>>> ll = [[11, 53], [2, 137], [2, 7, 31], [2, 2, 7, 31], [3, 3, 3, 29], [2, 2, 2, 3, 137], [2, 2, 7, 31], [11, 53]]
>>>
>>> f = lambda t: t[1]
>>> list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])
[(3, 3), (2, 3)]



I used a slightly more complicated list for testing: some values repeated twice, some 3 times, appearing in the same and in different sublists.



We use a Counter in each sublist, and keep a dictionary of the highest count we see for each value. At the end, we build the output list, keeping only the values that were repeated the highest number of times in each line.


list_of_lists =[[11, 11, 53], # 11 is repeated 2 times,
[2, 137], # it shouldn't appear in the result
[2, 7, 31],
[2, 2, 7, 31],
[3, 3, 3, 4, 4, 4, 5, 5, 5, 29], # 3 times 3, 4 and 5
[2, 2, 2, 3, 137], # and 3 times 2
[2, 2, 7, 31],
[11, 53]]

from collections import Counter, defaultdict

def maxcount(list_of_lists):
out = defaultdict(int)
max_repetitions = 0
for sublist in list_of_lists:
for value, count in Counter(sublist).items():
if count > 1 and count > out[value]:
out[value] = count
if count > max_repetitions:
max_repetitions = count


return([[val, count] for val, count in out.items() if count == max_repetitions])

print(maxcount(list_of_lists))
# [[2, 3], [3, 3], [4, 3], [5, 3]]



I like itertools, so I was curious to compare @Sunitha's solution with this one.


itertools



This solution:


*%timeit maxcount(list_of_lists)
# 65 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)



@Sunitha's solution with more use of itertools:


from itertools import chain, groupby
from collections import Counter

def maxcount_with_itertools(ll):
f = lambda t: t[1]
return list(next(groupby(sorted(chain(*(Counter(l).items() for l in ll)), key=f, reverse=True), f))[1])

%timeit maxcount_with_itertools(list_of_lists)
# 70.9 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)



which is just a little bit slower.



If you're interested in doing it using pure Python, then there is the following way:


list_of_lists = [[11, 53],[2, 137],[2, 7, 31],[2, 2, 7, 31],[3, 3, 3, 29],[2, 2, 2, 3, 137],[2, 2, 7, 31],[11, 53]]

maxOccurences = [max([[elem,sublist.count(elem),index] for elem in sublist], key=lambda i:sublist.count(i)) for index, sublist in enumerate(list_of_lists)]
maximum = max(maxOccurences, key=lambda i: i[1])
elements = [elem[:2] for elem in maxOccurences if elem[1]==maximum[1]]
print(elements)



Output:


[[3, 3], [2, 3]]



Another suggestion would be the following:


list_of_lists = [[11, 53],[2, 137],[2, 7, 31],[2, 2, 7, 31],[3, 3, 3, 29],[2, 2, 2, 3, 137],[2, 2, 7, 31],[11, 53]]

maximum = max([max([[elem,sublist.count(elem)] for elem in sublist], key=lambda i:sublist.count(i)) for sublist in list_of_lists], key=lambda i: i[1])
elements = [[elem,sublist.count(elem)] for sublist in list_of_lists for elem in set(sublist) if sublist.count(elem)==maximum[1]]
print(elements)



Output:


[[3, 3], [2, 3]]



You can use collections.Counter, split into 3 steps:


collections.Counter


Counter


map


max


Counter



Here's a demo.


from collections import Counter

counters = list(map(Counter, list_of_lists))
most_common_count = max(i.most_common(1)[0][1] for i in counters)

res = [(k, v) for i in counters for k, v in i.items() if v == most_common_count]

print(res)

[(3, 3), (2, 3)]






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard