Welcome to the Java Programming Forums


The professional, friendly Java community. 21,500 members and growing!


The Java Programming Forums are a community of Java programmers from all around the World. Our members have a wide range of skills and they all have one thing in common: A passion to learn and code Java. We invite beginner Java programmers right through to Java professionals to post here and share your knowledge. Become a part of the community, help others, expand your knowledge of Java and enjoy talking with like minded people. Registration is quick and best of all free. We look forward to meeting you.


>> REGISTER NOW TO START POSTING


Members have full access to the forums. Advertisements are removed for registered users.

Results 1 to 5 of 5

Thread: Quick Question about Mergesort

  1. #1
    Junior Member
    Join Date
    Feb 2010
    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Default Quick Question about Mergesort

    Hello,

    I'v been reading and experimenting with recursion for the past few weeks and I was wondering about the Mergesort algorithm performance ( yes I know it's nlog(n)). My question is can the Mergesort method be run on two sub arrays of a main array at the same time with multi threading to increase performance for large data sets (say 1 million doubles)?

    Meaning, say you have a int[100] you can split it to two as in int [1 to 50] int [51 to 100]. Can you run a thread for sorting the first array and another thread for sorting the second array and then once each sub array is sorted combine them.

    Here's my recursive code:
    // The "MergeSort" class.
    // This class implements a recursive merge sort.
    import java.util.*;
    <sniped for clarity>
       public static void merge (int[] a, int first, int middle, int last)
          {
          // List going from first to middle is sorted.
          // List going from middle + 1 to last is sorted.
             int temp[] = new int[last + 1];
          // Index of the front of the first sorted list.
             int firstListPos = first;
          // Index of the front of the second sorted list.
             int secondListPos = middle + 1;
          // Index of the front of the result array.
             int resultListPos = first;
     
             while (resultListPos <= last)
             {
                if ((firstListPos < middle + 1) && ((secondListPos > last) || (a[firstListPos] < a[secondListPos])))
                {
                   temp [resultListPos] = a[firstListPos];
                   firstListPos++;
                }
                else
                {
                   temp[resultListPos] = a[secondListPos];
                   secondListPos++;
                }
                resultListPos++;
             } // while
     
          // Copy merged array back to original place.
             for (int i = first ; i <= last ; i++)
             {
                a[i] = temp[i];
             } // for
          } // merge method
     
     
        // Recursive method that uses itself and the merge method
        // to sort.
           public static void mergeSort (int[] a, int first, int last)
          {
          // If there is only one element (first == last),
          // then the list is already sorted.
             if (last > first)
             {
                int middle = (first + last) / 2;
             // Sort the first half.
                mergeSort (a, first, middle);
             // Sort the second half.
                mergeSort (a, middle + 1, last);
             // Merge the two halves.
                merge (a, first, middle, last);
             } // if
          } // mergeSort method
       } // MergeSort class

    Side note: Can some one tell me when and where it would make sense to run multi threading? I am interested in learning this as I have a i7 920 @4Ghz and VERY FEW apps are multithreaded to take advantage of this power.
    Last edited by Shadow703793; March 3rd, 2010 at 08:47 PM.


  2. #2
    Member
    Join Date
    Feb 2010
    Location
    Auburn, AL
    Posts
    31
    Thanks
    0
    Thanked 8 Times in 8 Posts

    Default Re: Quick Question about Mergesort

    That's a good question.

    Let's see... On a single processor, you would implement it like this...

    MergeSort(array[i...j])
    1. if i = j then return
    2. else
    3. m := (i + j) / 2
    4. MergeSort(array[i, m])
    5. MergeSort(array[m+1, j])
    6. Merge(array[i...j])

    Let n = j - i + 1. Then The recurrence relations for the running time are

    T(1) = 1.
    T(n) = 2T(n/2) + O(n).

    Using the Master theorem with a = 2, b = 2, and f(n) = O(n), it follows that T(n) = O(n log n). So far a so good.

    Now, what you are proposing is essentially a method by which - optimally - we could change the recurrence relations to the following:

    T(1) = 1.
    T(n) = T(n/2) + O(n).

    Notice that the 2 disappears since we can hopefully perform both merge sorts on halves in equal time. Now we apply the master theorem with a = 1, b = 2, and f(n) = O(n). The result now is (unless I am making some silly mistake... somebody please check me on this) that T(n) = O(n).

    The conclusion, God willing, is that parallelization of MergeSort can give about a logarithmic speedup in the best case. So when sorting 256 items, it would take about 1/8 the time in the (fully) parallelized version; when sorting 2^n items, it would take 1/n the time.

    NOW, that being said, there are penalties for starting new tasks that will probably outweigh the benefits you get from splitting the sorting of, say, arrays of size less than N. What that N is depends on a variety of factors specific to the system you are using. Generally speaking, you should expect a speedup if you do it properly.

    Hope this helped. At least we've put a theoretical upper bound on the improvement and noted that in practice you will likely need to restrict the parallelization to increase performance.
    Let me know what you think of my website:
    http://www.auburn.edu/~carpept
    Comments or suggestions are appreciated!

  3. The Following User Says Thank You to CT&SC For This Useful Post:

    Shadow703793 (March 4th, 2010)

  4. #3
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Quick Question about Mergesort

    Even once you have implemented a parallel merge sort, you have to keep in mind the number of cores/processors you're running with. The OS will probably give your program more processor time if you increase the thread count (assuming there are other programs that are significantly taxing processor time), but each thread will get a lot less processor time as the thread count increases, so the maximum benefit is dependent on the amount of processor cores you have.

    The average computer these days have 2 cores (upper end computers have 4-8 or even more cores), so you will be able to sort 1/2 of the array in core 1, taking O(m log m) (where m = n/2), and 1/2 the array in core 2 with O(m log m) concurrently, the recombine the two arrays in O(n) time.

    Using Big-O notation, that still comes out to O(nlogn) time, but it will give you a speed up of ~1 - (nlog(n/2)/2 + n) / (nlog(n) + n), which for lists of ~1000-20000 is a speedup of ~40-45%.

    This is an over-simplified calculation, the actual benefit is likely to be less than this.

    Based on the way these benefits grow with n, and the fact that merge sort does take O(n) memory, you are probably better off switching between different sorting algorithms depending on the size of the data you are sorting (ex. see Intro Sort, which is a combination of Heapsort and Quicksort).

  5. The Following User Says Thank You to helloworld922 For This Useful Post:

    Shadow703793 (March 4th, 2010)

  6. #4
    Member
    Join Date
    Feb 2010
    Location
    Auburn, AL
    Posts
    31
    Thanks
    0
    Thanked 8 Times in 8 Posts

    Default Re: Quick Question about Mergesort

    ^ True, and a better treatment of the practicalities than in my post. I think that the two together pretty much exhaustively answer the OP's question... OP, what's your take on these comments?

    You can combine different sorting algorithms in lots of ways to get algorithms which perform better than the individual sorting algorithms taken together... such as taking an O(n log n) algorithm and an O(n^2) algorithm (helloworld922 mentions heapsort and quicksort, for instance, but merge sort and bubble sort can work as well) and using the O(n log n) to break the problem up and the O(n^2) to finish it off. The reason this works sometimes is that f(n) = O(n log n) and g(n) = O(n^2) don't necessarily allow one to conclude f(n) < g(n) for all n, particularly small n. I guess an example of this would be:

    f(n) = 1000n
    g(n) = n^2

    It's pretty easy to see that g(n) = O(n^2) and f(n) = O(n), even though f(n) > g(n) for all n < 1000. Something to keep in mind...
    Let me know what you think of my website:
    http://www.auburn.edu/~carpept
    Comments or suggestions are appreciated!

  7. #5
    Junior Member
    Join Date
    Feb 2010
    Posts
    10
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Default Re: Quick Question about Mergesort

    ^ True, and a better treatment of the practicalities than in my post. I think that the two together pretty much exhaustively answer the OP's question... OP, what's your take on these comments?
    They greatly help.


    Based on the way these benefits grow with n, and the fact that merge sort does take O(n) memory, you are probably better off switching between different sorting algorithms depending on the size of the data you are sorting (ex. see Intro Sort, which is a combination of Heapsort and Quicksort).
    Indeed. I have been experimenting with QuickSort and Insertion sort, how ever I have not taken a look in to Heapsort. I will look in to that.

    The main reason I am looking in to multithreading Mergesort is that I do CFD calculations for a waterblocks and the arrays that need to be sorted are HUGE, in the order of at least, 5000,000 doubles and multithreading this and using Mergesort to sort it would help out a lot. Yes, I could use something like Excel but writing this program would give me a much better understanding and performance. I also plan to expand this program to a multi platform (hence why Java was a very logical option) in to a quite robust calculator for CFD/thermodynamic calculations once I learn more in Collage *Currently in Senior year at Highschool).

Similar Threads

  1. MergeSort
    By mgutierrez19 in forum Algorithms & Recursion
    Replies: 3
    Last Post: March 3rd, 2010, 08:46 PM
  2. Need some serious quick help. :/
    By aanders5 in forum What's Wrong With My Code?
    Replies: 0
    Last Post: February 23rd, 2010, 06:05 PM
  3. Just Need Some quick help with code
    By mulkman in forum What's Wrong With My Code?
    Replies: 1
    Last Post: February 14th, 2010, 11:11 AM
  4. Quick JComboBox Event Question
    By -tr in forum AWT / Java Swing
    Replies: 2
    Last Post: December 12th, 2009, 05:35 PM
  5. Quick binary tree question.
    By Sinensis in forum Java Theory & Questions
    Replies: 1
    Last Post: November 15th, 2009, 09:28 PM

Tags for this Thread