• March 3rd, 2010, 07:41 PM
Hello,

I've been reading about and experimenting with recursion for the past few weeks, and I was wondering about the Mergesort algorithm's performance (yes, I know it's O(n log n)). My question is: can Mergesort be run on two sub-arrays of a main array at the same time, using multithreading, to increase performance for large data sets (say, 1 million doubles)?

Meaning, say you have an int[100]: you can split it in two, int[1..50] and int[51..100]. Can you run one thread to sort the first half and another thread to sort the second half, and then, once each half is sorted, combine them?

Here's my recursive code:
Code :

```
// The "MergeSort" class.
// This class implements a recursive merge sort.
import java.util.*;

<sniped for clarity>

public static void merge(int[] a, int first, int middle, int last) {
    // List going from first to middle is sorted.
    // List going from middle + 1 to last is sorted.
    int[] temp = new int[last + 1];

    // Index of the front of the first sorted list.
    int firstListPos = first;
    // Index of the front of the second sorted list.
    int secondListPos = middle + 1;
    // Index of the front of the result array.
    int resultListPos = first;

    while (resultListPos <= last) {
        // Take from the first list while it still has elements and its
        // front element is no larger (<= keeps the sort stable).
        if ((firstListPos < middle + 1)
                && ((secondListPos > last)
                    || (a[firstListPos] <= a[secondListPos]))) {
            temp[resultListPos] = a[firstListPos];
            firstListPos++;
        } else {
            temp[resultListPos] = a[secondListPos];
            secondListPos++;
        }
        resultListPos++;
    } // while

    // Copy merged array back to original place.
    for (int i = first; i <= last; i++) {
        a[i] = temp[i];
    } // for
} // merge method

// Recursive method that uses itself and the merge method to sort.
public static void mergeSort(int[] a, int first, int last) {
    // If there is only one element (first == last),
    // then the list is already sorted.
    if (last > first) {
        int middle = (first + last) / 2;
        // Sort the first half.
        mergeSort(a, first, middle);
        // Sort the second half.
        mergeSort(a, middle + 1, last);
        // Merge the two halves.
        merge(a, first, middle, last);
    } // if
} // mergeSort method
} // MergeSort class
```
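To make the two-thread idea concrete, here is a minimal, self-contained sketch. It uses Arrays.sort as a stand-in for the recursive mergeSort above so the block stays short; the threading pattern (start both, join both, then merge) is the point. The class and method names are my own.

```java
import java.util.Arrays;

public class TwoThreadSortDemo {
    // Sort each half of the array on its own thread, then merge.
    public static void parallelSort(final int[] a) throws InterruptedException {
        final int mid = a.length / 2;
        Thread left = new Thread(new Runnable() {
            public void run() { Arrays.sort(a, 0, mid); }
        });
        Thread right = new Thread(new Runnable() {
            public void run() { Arrays.sort(a, mid, a.length); }
        });
        left.start();
        right.start();
        left.join();   // must wait for BOTH halves to finish
        right.join();  // before merging them
        merge(a, mid);
    }

    // Standard two-way merge of a[0..mid) and a[mid..length).
    private static void merge(int[] a, int mid) {
        int[] temp = new int[a.length];
        int i = 0, j = mid, k = 0;
        while (k < a.length) {
            if (i < mid && (j >= a.length || a[i] <= a[j])) {
                temp[k++] = a[i++];
            } else {
                temp[k++] = a[j++];
            }
        }
        System.arraycopy(temp, 0, a, 0, a.length);
    }
}
```

The essential part is the two join() calls: merging before both halves have finished would read partially sorted data.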

Side note: Can someone tell me when and where it makes sense to use multithreading? I'm interested in learning this because I have an i7 920 @ 4 GHz and VERY FEW apps are multithreaded enough to take advantage of that power.
• March 3rd, 2010, 08:54 PM
CT&SC
That's a good question.

Let's see... On a single processor, you would implement it like this...

MergeSort(array[i...j])
1. if i = j then return
2. else
3. m := (i + j) / 2
4. MergeSort(array[i, m])
5. MergeSort(array[m+1, j])
6. Merge(array[i...j])

Let n = j - i + 1. Then the recurrence relations for the running time are

T(1) = 1.
T(n) = 2T(n/2) + O(n).

Using the Master theorem with a = 2, b = 2, and f(n) = O(n), it follows that T(n) = O(n log n). So far so good.

Now, what you are proposing is essentially a method by which - optimally - we could change the recurrence relations to the following:

T(1) = 1.
T(n) = T(n/2) + O(n).

Notice that the 2 disappears since we can hopefully perform both merge sorts on halves in equal time. Now we apply the master theorem with a = 1, b = 2, and f(n) = O(n). The result now is (unless I am making some silly mistake... somebody please check me on this) that T(n) = O(n).
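For what it's worth, the claim checks out without the master theorem: unrolling T(n) = T(n/2) + n gives a geometric series.

```latex
\begin{aligned}
T(n) &= n + T(n/2) \\
     &= n + \tfrac{n}{2} + T(n/4) \\
     &= n\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \\
     &\le 2n = O(n).
\end{aligned}
```

Note this assumes as many processors as there are recursive calls at each level; with a fixed number of cores the bound is weaker.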

The conclusion, God willing, is that parallelization of MergeSort can give about a logarithmic speedup in the best case. So when sorting 256 items, it would take about 1/8 the time in the (fully) parallelized version; when sorting 2^n items, it would take 1/n the time.

NOW, that being said, there are penalties for starting new tasks that will probably outweigh the benefits you get from splitting the sorting of, say, arrays of size less than N. What that N is depends on a variety of factors specific to the system you are using. Generally speaking, you should expect a speedup if you do it properly.
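A sketch of that idea: a merge sort that only spawns a new thread for subarrays above some cutoff, and also stops forking once a depth budget (roughly, the number of available cores) is used up. The THRESHOLD and depth values below are placeholders, not measured optima; tune them for your machine.

```java
public class ThresholdMergeSort {
    // Below this size, spawning a thread likely costs more than it saves.
    // 8192 is an assumption; measure on your own hardware.
    static final int THRESHOLD = 8192;

    public static void sort(final int[] a, final int first, final int last,
                            int depth) throws InterruptedException {
        if (last <= first) return;
        final int middle = (first + last) / 2;
        if (last - first + 1 < THRESHOLD || depth <= 0) {
            // Small range (or out of fork budget): plain sequential recursion.
            sort(a, first, middle, 0);
            sort(a, middle + 1, last, 0);
        } else {
            // Large range: sort the left half on a new thread while this
            // thread sorts the right half, halving the depth budget.
            final int d = depth - 1;
            Thread left = new Thread(new Runnable() {
                public void run() {
                    try {
                        sort(a, first, middle, d);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            left.start();
            sort(a, middle + 1, last, d);
            left.join(); // both halves must be done before merging
        }
        merge(a, first, middle, last);
    }

    // Merge the sorted ranges a[first..middle] and a[middle+1..last].
    private static void merge(int[] a, int first, int middle, int last) {
        int[] temp = new int[last - first + 1];
        int i = first, j = middle + 1, k = 0;
        while (k < temp.length) {
            if (i <= middle && (j > last || a[i] <= a[j])) temp[k++] = a[i++];
            else temp[k++] = a[j++];
        }
        System.arraycopy(temp, 0, a, first, temp.length);
    }
}
```

With depth = 2 this uses at most four threads at the leaves, which matches a quad-core machine like the OP's i7.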

Hope this helped. At least we've put a theoretical upper bound on the improvement and noted that in practice you will likely need to restrict the parallelization to increase performance.
• March 3rd, 2010, 10:40 PM
helloworld922
Even once you have implemented a parallel merge sort, you have to keep in mind the number of cores/processors you're running with. The OS will probably give your program more processor time if you increase the thread count (assuming there are other programs that are significantly taxing processor time), but each thread will get a lot less processor time as the thread count increases, so the maximum benefit is dependent on the amount of processor cores you have.

The average computer these days has 2 cores (upper-end computers have 4-8 or even more), so you can sort half of the array on core 1 in O(m log m) time (where m = n/2) and the other half on core 2 in O(m log m) concurrently, then recombine the two halves in O(n) time.

In Big-O terms that still comes out to O(n log n) time, but it gives you an estimated speedup of ~1 - ((n/2) log(n/2) + n) / (n log n + n), which for lists of ~1000-20000 elements is a speedup of ~40-45%.

This is an over-simplified calculation, the actual benefit is likely to be less than this.

Based on the way these benefits grow with n, and the fact that merge sort does take O(n) memory, you are probably better off switching between different sorting algorithms depending on the size of the data you are sorting (ex. see Intro Sort, which is a combination of Heapsort and Quicksort).
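For illustration, here's a minimal sketch of the simplest form of that switching idea: a merge sort that hands small subarrays to insertion sort (the merge-sort analogue of what Introsort does for quicksort). The CUTOFF value of 32 is just a guess and should be tuned.

```java
public class HybridMergeSort {
    // Below this size, insertion sort's low overhead beats recursion.
    // 32 is a typical ballpark figure, not a measured optimum.
    static final int CUTOFF = 32;

    public static void sort(int[] a, int first, int last) {
        if (last - first + 1 <= CUTOFF) {
            insertionSort(a, first, last); // O(n^2), but fast for tiny ranges
            return;
        }
        int middle = (first + last) / 2;
        sort(a, first, middle);
        sort(a, middle + 1, last);
        merge(a, first, middle, last);
    }

    // Plain insertion sort on a[first..last], inclusive.
    private static void insertionSort(int[] a, int first, int last) {
        for (int i = first + 1; i <= last; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= first && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Merge the sorted ranges a[first..middle] and a[middle+1..last].
    private static void merge(int[] a, int first, int middle, int last) {
        int[] temp = new int[last - first + 1];
        int i = first, j = middle + 1, k = 0;
        while (k < temp.length) {
            if (i <= middle && (j > last || a[i] <= a[j])) temp[k++] = a[i++];
            else temp[k++] = a[j++];
        }
        System.arraycopy(temp, 0, a, first, temp.length);
    }
}
```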
• March 3rd, 2010, 11:32 PM
CT&SC
^ True, and a better treatment of the practicalities than in my post. I think that the two together pretty much exhaustively answer the OP's question... OP, what's your take on these comments?

You can combine different sorting algorithms in lots of ways to get hybrids that perform better than either algorithm on its own. For example, take an O(n log n) algorithm and an O(n^2) algorithm (helloworld922 mentions heapsort and quicksort, for instance, but merge sort and bubble sort can work as well), and use the O(n log n) one to break the problem up and the O(n^2) one to finish it off. The reason this sometimes works is that f(n) = O(n log n) and g(n) = O(n^2) don't necessarily allow one to conclude f(n) < g(n) for all n, particularly small n. An example:

f(n) = 1000n
g(n) = n^2

It's pretty easy to see that g(n) = O(n^2) and f(n) = O(n), even though f(n) > g(n) for all n < 1000. Something to keep in mind...
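That crossover point can be checked by brute force (the method name here is just illustrative):

```java
public class CrossoverDemo {
    // Smallest n at which g(n) = n^2 catches up with f(n) = 1000n.
    public static long crossover() {
        long n = 1;
        while (n * n < 1000L * n) {
            n++;
        }
        return n; // 1000: below this, the "worse" quadratic is actually faster
    }
}
```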
• March 4th, 2010, 04:48 PM