How to know size and number of buckets in bucket sort
Hi, how do I decide what range of values each bucket should cover and how many buckets to use with bucket sort?
E.g. from Wikipedia: [29,25,3,49,9,37,21,43]
They used 5 buckets of size 10
I know it's supposed to be a distribution sort (not a comparison sort), but I'm unsure how they determined the number of buckets and the size of each bucket. Any help appreciated.
Re: How to know size and number of buckets in bucket sort
The number of buckets and the range each bucket covers are arbitrary, provided that every possible value is covered and no value can belong to more than one bucket.
For the best possible performance, each bucket should end up holding roughly the same number of elements. You could try finding the min/max of the dataset, then deciding how many buckets you want and computing the range of each bucket, or vice versa.
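To make the min/max idea concrete, here is a rough Python sketch. The function name bucket_index and the choice of 5 buckets are just assumptions for illustration, and it assumes integer input. It maps each value to a bucket number by scaling its position between the min and the max:

```python
def bucket_index(value, lo, hi, num_buckets):
    """Map value in [lo, hi] to a bucket number in [0, num_buckets - 1].

    Hypothetical helper: assumes integer values with lo <= value <= hi.
    """
    if hi == lo:
        # Every value is identical, so one bucket is enough.
        return 0
    idx = (value - lo) * num_buckets // (hi - lo)
    # The maximum value would land one past the end, so clamp it.
    return min(idx, num_buckets - 1)


data = [29, 25, 3, 49, 9, 37, 21, 43]
lo, hi = min(data), max(data)
indices = [bucket_index(v, lo, hi, 5) for v in data]
print(indices)  # [2, 2, 0, 4, 0, 3, 1, 4]
```

Going the other way (fix the bucket range first, then derive the count) works just as well, as long as every value lands in exactly one bucket.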
Re: How to know size and number of buckets in bucket sort
I'm still confused and would appreciate further clarification, if it's not too much trouble. Let's use the example from my original post, the one from Wikipedia. Is there only one bucket array, rather than multiple arrays of size 10 (the range Wikipedia chose for each bucket)? And what about duplicates? I keep seeing implementations that use an array where each slot serves as a reference to the head node of a linked list, and values in that slot's range are inserted at the front of the list, so each insert is O(1). Any help appreciated.
Re: How to know size and number of buckets in bucket sort
How each bucket is implemented isn't terribly important; a bucket just needs to be able to hold values and let you sort them.
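As one possible illustration of the "array of list heads" layout, here is a small Python sketch. It uses collections.deque as a stand-in for a linked list (appendleft is O(1)); the function name scatter and the defaults of 5 buckets with range 10 are assumptions taken from the Wikipedia example, and it assumes non-negative integers below 50. Duplicates need no special handling, they simply sit in the same bucket:

```python
from collections import deque


def scatter(values, num_buckets=5, bucket_width=10):
    """Distribute values into buckets, inserting at the front of each one.

    Hypothetical helper: assumes 0 <= v < num_buckets * bucket_width.
    """
    buckets = [deque() for _ in range(num_buckets)]
    for v in values:
        # Front insertion is O(1), like pushing onto a linked list head.
        buckets[v // bucket_width].appendleft(v)
    return buckets


buckets = scatter([21, 25, 21, 3])
print([list(b) for b in buckets])  # [[3], [], [21, 25, 21], [], []]
```

A plain Python list with append would do the same job; the container only affects constant factors, not the idea.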
There are multiple buckets: 5 buckets, each covering a range of 10.
So all values in [0, 10) go into bucket 1, all values in [10, 20) go into bucket 2, and so on. There's no overlap and every possible value is covered.
After scattering all the values, each bucket can be sorted, then the buckets are gathered back together in order.
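Put together, the scatter, sort, and gather steps could look roughly like this in Python. This is only a sketch under the assumptions of the Wikipedia example (5 buckets, each covering a range of 10, non-negative integers below 50); the function name bucket_sort is made up, and the built-in sorted is used for the per-bucket sort, though any sort would do:

```python
def bucket_sort(values, num_buckets=5, bucket_width=10):
    """Sort integers in [0, num_buckets * bucket_width) by bucketing."""
    buckets = [[] for _ in range(num_buckets)]

    # Scatter: bucket 0 holds [0, 10), bucket 1 holds [10, 20), and so on.
    for v in values:
        idx = v // bucket_width
        if not 0 <= idx < num_buckets:
            raise ValueError(f"{v} is outside the covered range")
        buckets[idx].append(v)

    # Sort each bucket, then gather the buckets back in order.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


print(bucket_sort([29, 25, 3, 49, 9, 37, 21, 43]))
# [3, 9, 21, 25, 29, 37, 43, 49]
```

Note that the code numbers buckets from 0, so "bucket 1" in the description above is index 0 here.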