How to sort 10million number in python without using sort function? [closed]

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



How to sort 10million number in python without using sort function? [closed]



Sort the list of random generated 10 Million numbers between 1 and 100, in python without using inbuilt functions, Quicksort didnt worked for me here.



I have used quicksort code from the mentioned link:
http://interactivepython.org/runestone/static/pythonds/SortSearch/TheQuickSort.html



Error I got while implementing it :
for x in range (0, 100000):
listOfNumbers.append(random.randint(1, 100))



quickSort(listOfNumbers)
print(listOfNumbers)



RuntimeError: maximum recursion depth exceeded



This question appears to be off-topic. The users who voted to close gave this specific reason:





Are you sure you used quicksort correctly? Maybe start with bubble sort if you couldn't get quicksort. interactivepython.org/runestone/static/pythonds/SortSearch/…
– Mike
Aug 6 at 16:31






What do you mean quicksort didn't work for you? Your implementation didn't work?
– Easton Bornemeier
Aug 6 at 16:31





You can write something a lot simpler—and more efficient—than quicksort for this problem. As a hint: if you had a list of 100 counts, could you turn that into a list of 10 million numbers in sorted order?
– abarnert
Aug 6 at 16:35





Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. Minimal, complete, verifiable example applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described.
– Prune
Aug 6 at 16:38





@Easton, I tried quicksort. It worked fine for 1000 numbers. But with 10000 numbers, it showed error : Tree limit exceeded.
– Ashish
Aug 6 at 16:58




3 Answers
3



You can use any sort algorithm you want, as long as you implement it right. But the problem is calling out for a radix sort. In particular, the dumbest kind of radix sort, a bucket-counter.



You have N=10,000,000 total values and a range of M=100 distinct values. A bucket counter will take O(N+M) time, which is better than O(N*log N), and O(M) space,1 which is negligible—and. best of all, it's dead simple:


N=10,000,000


M=100


O(N+M)


O(N*log N)


O(M)


def bucketsorted100(xs):
buckets = [0] * 101
for x in xs:
buckets[x] += 1
for x, count in enumerate(buckets):
yield from [x] * count



You can obviously extend this to not be hardcoded for numbers from 1-100 (actually, I hardcoded it for numbers from 0-100, wasting 1% space, but who cares?). Or you can add support for a key function. Or you can make it even simpler by using a Counter instead of a list.


key


Counter


list



1. Technically, it's O(logN * M) space, because the counts range up to N, which takes logN bits, which the values only range up to 100, which takes a constant number of bits. But practically, all of the counts fit into a single 30-bit "digit" in CPython, so the logN factor never comes up.


O(logN * M)


N


logN


logN





Thanks Abanert for your answer. Just adding a bit modification of it. import random listOfNumbers= def bucketsorted100(xs): buckets = [0] * 101 for x in xs: buckets[x] += 1 for x, count in enumerate(buckets): if(count>0): yield [x] * count for x in range (0, 1000000): listOfNumbers.append(random.randint(1, 100)) #Calling function: tuple(bucketsorted100(listOfNumbers)) It worked for me(in less than 5 seconds)
– Ashish
Aug 6 at 18:59





@Ashish With your modification, this no longer produces an iterable of the sorted values, it now produces an iterable of lists of values, which you have to flatten. But if that's OK for your use, that's fine.
– abarnert
Aug 6 at 19:05



You can use the mighty Bogosort.


import random

def is_sorted(data):
for i in range(len(data) - 1):
if data[i] > data[i + 1]:
return False
return True

def bogosort(data):
while not is_sorted(data):
random.shuffle(data)
return data





But this uses the “inbuilt function” random.shuffle. Otherwise, of course, it would be perfect. :)
– abarnert
Aug 6 at 16:36


random.shuffle





@abarnert But random.shuffle is not listed as a built in function so I think we're good -- Unless you consider import statement to be a built in function...
– Gabriel
Aug 6 at 16:39


random.shuffle


import





@ gabriel Random.shuffle will not be considered as inbuilt function 👍🏼👍🏼👍🏼
– Ashish
Aug 6 at 16:56





Well, the import statement isn't a function—it does call a bunch of functions from importlib, but those are implemented in Python, just like shuffle is, so I don't think it makes much difference.
– abarnert
Aug 6 at 17:51


import


importlib


shuffle





@Ashish It works best if you're lucky.
– Gabriel
Aug 6 at 18:35



Maybe numpy will be faster... you can convert numbers to a numpy array then use numpy.sort


numpy


numpy array



Like this:


import numpy as np
mylist=[15,65,3,1,10,35,11,65,23,95,20,36,85,12,37,85,46,93] # ...
sorted_mylist=np.ndarray.tolist(np.sort(np.asarray(mylist)))
print sorted_mylist



which give you:


[1, 3, 10, 11, 12, 15, 20, 23, 35, 36, 37, 46, 65, 65, 85, 85, 93, 95]





Sometimes I feel numpy is the jQuery of Python.
– Gabriel
Aug 6 at 16:42





Since Numpy is implemented in C, it makes sense that we'd be seeing a huge improvement over our pure Python Sort
– A STEFANI
Aug 6 at 16:44





NumPy's hybrid introsort/heapsort implemented in C still isn't going to be anywhere near as fast as a radix sort implemented in C for this problem…
– abarnert
Aug 6 at 17:50





@Gabriel But can numpy generate freehand circles? Not very easy… unless you add in scipy.ndimage. :)
– abarnert
Aug 6 at 18:13


scipy.ndimage

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard