How to sort 10million number in python without using sort function? [closed]
Clash Royale CLAN TAG#URR8PPP
How to sort 10million number in python without using sort function? [closed]
Sort the list of random generated 10 Million numbers between 1 and 100, in python without using inbuilt functions, Quicksort didnt worked for me here.
I have used quicksort code from the mentioned link:
http://interactivepython.org/runestone/static/pythonds/SortSearch/TheQuickSort.html
Error I got while implementing it :
for x in range (0, 100000):
listOfNumbers.append(random.randint(1, 100))
quickSort(listOfNumbers)
print(listOfNumbers)
RuntimeError: maximum recursion depth exceeded
This question appears to be off-topic. The users who voted to close gave this specific reason:
What do you mean quicksort didn't work for you? Your implementation didn't work?
– Easton Bornemeier
Aug 6 at 16:31
You can write something a lot simpler—and more efficient—than quicksort for this problem. As a hint: if you had a list of 100 counts, could you turn that into a list of 10 million numbers in sorted order?
– abarnert
Aug 6 at 16:35
Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. Minimal, complete, verifiable example applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described.
– Prune
Aug 6 at 16:38
@Easton, I tried quicksort. It worked fine for 1000 numbers. But with 10000 numbers, it showed error : Tree limit exceeded.
– Ashish
Aug 6 at 16:58
3 Answers
3
You can use any sort algorithm you want, as long as you implement it right. But the problem is calling out for a radix sort. In particular, the dumbest kind of radix sort, a bucket-counter.
You have N=10,000,000
total values and a range of M=100
distinct values. A bucket counter will take O(N+M)
time, which is better than O(N*log N)
, and O(M)
space,1 which is negligible—and. best of all, it's dead simple:
N=10,000,000
M=100
O(N+M)
O(N*log N)
O(M)
def bucketsorted100(xs):
buckets = [0] * 101
for x in xs:
buckets[x] += 1
for x, count in enumerate(buckets):
yield from [x] * count
You can obviously extend this to not be hardcoded for numbers from 1-100 (actually, I hardcoded it for numbers from 0-100, wasting 1% space, but who cares?). Or you can add support for a key
function. Or you can make it even simpler by using a Counter
instead of a list
.
key
Counter
list
1. Technically, it's O(logN * M)
space, because the counts range up to N
, which takes logN
bits, which the values only range up to 100, which takes a constant number of bits. But practically, all of the counts fit into a single 30-bit "digit" in CPython, so the logN
factor never comes up.
O(logN * M)
N
logN
logN
Thanks Abanert for your answer. Just adding a bit modification of it. import random listOfNumbers= def bucketsorted100(xs): buckets = [0] * 101 for x in xs: buckets[x] += 1 for x, count in enumerate(buckets): if(count>0): yield [x] * count for x in range (0, 1000000): listOfNumbers.append(random.randint(1, 100)) #Calling function: tuple(bucketsorted100(listOfNumbers)) It worked for me(in less than 5 seconds)
– Ashish
Aug 6 at 18:59
@Ashish With your modification, this no longer produces an iterable of the sorted values, it now produces an iterable of lists of values, which you have to flatten. But if that's OK for your use, that's fine.
– abarnert
Aug 6 at 19:05
You can use the mighty Bogosort.
import random
def is_sorted(data):
for i in range(len(data) - 1):
if data[i] > data[i + 1]:
return False
return True
def bogosort(data):
while not is_sorted(data):
random.shuffle(data)
return data
But this uses the “inbuilt function”
random.shuffle
. Otherwise, of course, it would be perfect. :)– abarnert
Aug 6 at 16:36
random.shuffle
@abarnert But
random.shuffle
is not listed as a built in function so I think we're good -- Unless you consider import
statement to be a built in function...– Gabriel
Aug 6 at 16:39
random.shuffle
import
@ gabriel Random.shuffle will not be considered as inbuilt function 👍🏼👍🏼👍🏼
– Ashish
Aug 6 at 16:56
Well, the
import
statement isn't a function—it does call a bunch of functions from importlib
, but those are implemented in Python, just like shuffle
is, so I don't think it makes much difference.– abarnert
Aug 6 at 17:51
import
importlib
shuffle
@Ashish It works best if you're lucky.
– Gabriel
Aug 6 at 18:35
Maybe numpy
will be faster... you can convert numbers to a numpy array
then use numpy.sort
numpy
numpy array
Like this:
import numpy as np
mylist=[15,65,3,1,10,35,11,65,23,95,20,36,85,12,37,85,46,93] # ...
sorted_mylist=np.ndarray.tolist(np.sort(np.asarray(mylist)))
print sorted_mylist
which give you:
[1, 3, 10, 11, 12, 15, 20, 23, 35, 36, 37, 46, 65, 65, 85, 85, 93, 95]
Sometimes I feel numpy is the jQuery of Python.
– Gabriel
Aug 6 at 16:42
Since Numpy is implemented in C, it makes sense that we'd be seeing a huge improvement over our pure Python Sort
– A STEFANI
Aug 6 at 16:44
NumPy's hybrid introsort/heapsort implemented in C still isn't going to be anywhere near as fast as a radix sort implemented in C for this problem…
– abarnert
Aug 6 at 17:50
@Gabriel But can numpy generate freehand circles? Not very easy… unless you add in
scipy.ndimage
. :)– abarnert
Aug 6 at 18:13
scipy.ndimage
Are you sure you used quicksort correctly? Maybe start with bubble sort if you couldn't get quicksort. interactivepython.org/runestone/static/pythonds/SortSearch/…
– Mike
Aug 6 at 16:31