-------- THREAD POOL ASSIGNMENT PART II: --------

Consider the conventional quicksort algorithm (I'm a boring computer science professor, so what else do I know? :( ). I've attached an implementation of quicksort called quickdouble.java. Briefly, quicksort works as follows. Given an array such as

2 7 4 6 3 1 8

1. Find a "pivot element" to partition the list into two halves. In my implementation, the findpivot function returns the index of the first element out of order (the index of 4 in the above example). If no such pivot exists, then the array segment is already sorted.

2. Partition the array (via in-place swapping) into two halves: the left one <= pivot, and the right one > pivot. My partition function does this, and also returns the starting index of the second partition. With the above example, the two partitions will be

2 4 3 1   and   7 6 8

Both my findpivot and partition functions are parameterized by the start and end indices of the array segment they're responsible for.

3. "Recursively" quicksort both partitions.

But there is a problem if we implement quicksort (in a Java-type language) using direct recursion (having quicksort call itself): quicksort is only average-case O(n*log n); in the worst case (e.g., if given an array in reverse-sorted order to begin with), it's O(n*n). In the worst case, the depth of recursion also becomes O(n) instead of O(log n), which means that for large enough arrays (usually when the array contains more than about 8000 elements), direct recursion would overflow the runtime stack.

Thus, in my implementation of quicksort, I used a separate stack data structure. This stack replaces the runtime stack: instead of making recursive calls, I simply push a new pair of start,end indices onto it. It so happens that this way of implementing quicksort also suggests how it can be parallelized. Think of the stack as a sort of job queue for your thread pool.
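The stack-based scheme described above can be sketched roughly as follows. This is an illustration, not the attached quickdouble.java: the findpivot and partition bodies are my own reconstructions from the description, so the details (and class/parameter names) may differ from the real implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackQuicksort {
    // Index of the first out-of-order element, or -1 if the segment is sorted.
    static int findpivot(double[] A, int lo, int hi) {
        for (int i = lo; i < hi; i++)
            if (A[i] > A[i + 1]) return i + 1;
        return -1;
    }

    // In-place partition of A[lo..hi]: elements <= pivot go left, > pivot go right.
    // Returns the starting index of the second (right) partition.
    static int partition(double[] A, int lo, int hi, double pivot) {
        int i = lo;
        for (int j = lo; j <= hi; j++)
            if (A[j] <= pivot) {
                double t = A[i]; A[i] = A[j]; A[j] = t;
                i++;
            }
        return i;
    }

    static void sort(double[] A) {
        Deque<int[]> stack = new ArrayDeque<>();  // replaces the runtime stack
        stack.push(new int[]{0, A.length - 1});
        while (!stack.isEmpty()) {
            int[] seg = stack.pop();
            int lo = seg[0], hi = seg[1];
            int p = findpivot(A, lo, hi);
            if (p < 0) continue;                  // segment already sorted
            int mid = partition(A, lo, hi, A[p]);
            stack.push(new int[]{lo, mid - 1});   // instead of recursive calls,
            stack.push(new int[]{mid, hi});       // push start,end index pairs
        }
    }
}
```

On the example array 2 7 4 6 3 1 8, findpivot returns the index of 4, and partition produces 2 4 3 1 followed by 7 6 8, matching the description above.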
The findpivot and partition functions cannot be parallelized, but once we divide a partition into two smaller partitions, each can be sorted independently by a different thread. Importantly, the partitions will never overlap. It's also not important that we use a LIFO-style stack, so long as all partitions get sorted sooner or later. Thus, conceptually, instead of pushing a new partition onto the stack, we can just submit a new job to a thread pool, where each job carries the start and end indices of the partition it should sort.

Of course, the cost of creating the job objects and running the thread-pool management routines can be a significant overhead. It's not a good idea to submit a new job for a partition that contains only 2 or 3 elements. So keep the conventional implementation that I gave you: if a partition is too small, sort it in place using the conventional algorithm; otherwise, submit a new job. In my implementation, I submitted new jobs only when the partition size was greater than 50000.

Using my rather old 2.4 GHz 4-core Intel processor, I was able to observe a roughly 2X speedup when sorting an array of 50 million randomly generated doubles. First compile and run the quickdouble program as follows:

java -Xmx1024m quickdouble 50000000

The -Xmx1024m option increases the Java heap space to 1 gigabyte. If you don't have enough memory, use smaller arrays, or add delays to the quicksort routines to intentionally slow them down. On my quad core this took about 16 seconds. My parallelized version took about 8 seconds, but this could probably be improved with some fine-tuning. Uncomment the .print() line to print the sorted array (not recommended for very large arrays though!).

Implement the parallelized quicksort. First make sure it works on small arrays (print the sorted arrays). Then find a multi-core processor and run some experiments. Report your findings.
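The job-submission scheme above might be sketched like this. Note the assignment requires your own generic thread pool; purely to illustrate the decomposition, this sketch uses the standard java.util.concurrent ExecutorService in its place, plus a Phaser to wait for all outstanding jobs. The THRESHOLD value, helper names, and findpivot/partition bodies are my assumptions, not the attached code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

public class ParQuicksort {
    static final int THRESHOLD = 1000;  // partition size below which we sort sequentially
    static double[] A;                  // the array being sorted (static, as hinted)
    static ExecutorService pool;
    static Phaser pending;              // counts outstanding jobs

    // Index of the first out-of-order element, or -1 if the segment is sorted.
    static int findpivot(int lo, int hi) {
        for (int i = lo; i < hi; i++)
            if (A[i] > A[i + 1]) return i + 1;
        return -1;
    }

    // In-place partition: <= pivot left, > pivot right; returns start of right half.
    static int partition(int lo, int hi, double pivot) {
        int i = lo;
        for (int j = lo; j <= hi; j++)
            if (A[j] <= pivot) {
                double t = A[i]; A[i] = A[j]; A[j] = t;
                i++;
            }
        return i;
    }

    // Sequential stack-based quicksort, used for small partitions.
    static void seqSort(int lo, int hi) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{lo, hi});
        while (!stack.isEmpty()) {
            int[] s = stack.pop();
            int p = findpivot(s[0], s[1]);
            if (p < 0) continue;
            int mid = partition(s[0], s[1], A[p]);
            stack.push(new int[]{s[0], mid - 1});
            stack.push(new int[]{mid, s[1]});
        }
    }

    // One job: sort the partition A[lo..hi].
    static void sortJob(int lo, int hi) {
        if (hi - lo <= THRESHOLD) {
            seqSort(lo, hi);            // too small: sort in place, no new jobs
        } else {
            int p = findpivot(lo, hi);
            if (p >= 0) {
                int mid = partition(lo, hi, A[p]);
                submit(lo, mid - 1);    // the two halves never overlap,
                submit(mid, hi);        // so each becomes an independent job
            }
        }
        pending.arriveAndDeregister();  // this job is finished
    }

    static void submit(int lo, int hi) {
        pending.register();             // count the job before it can finish
        pool.execute(() -> sortJob(lo, hi));
    }

    public static void sort(double[] arr) {
        A = arr;
        pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        pending = new Phaser(1);        // party 1 is the calling thread
        submit(0, A.length - 1);
        pending.arriveAndAwaitAdvance(); // block until every job has finished
        pool.shutdown();
    }
}
```

Because the partitions never overlap, the worker threads write to disjoint index ranges of the shared array, so no locking of the array itself is needed.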
------- Additional notes and hints on the parallel quicksort problem: -------

The best experimental result I was able to obtain was a 2.7X speedup while sorting 120 million random doubles. I created new thread-pool jobs when the partition size was greater than 1000, and used the conventional algorithm otherwise (the 50000 threshold I mentioned earlier turned out to be too large). The theoretical maximum speedup on my system is 4X since I have 4 cores, but reaching that is unlikely.

How do you know if your program has really sorted such a large array? Call the following function to check it:

public static boolean checksorted(double[] A)  // check if array is sorted
{
    boolean ok = true;
    for(int i=0; ok && i<A.length-1; i++)
        if (A[i] > A[i+1]) ok = false;
    return ok;
}

Don't factor this check into the timing measurements, however. Making the array double[] A a static member of your class may also help a little.

***** Observe the principles of abstraction as much as possible. You can't just tailor the threadpool to work with this example. I.e., if your threadpool class makes a reference to elements of the quicksort code, you will not receive credit even if it "works". The threadpool should stay generic - something that can be used by any program. *****

When running the timing experiments, make sure that your system is not running a lot of things in the background. Close browsers and antivirus programs, and take the machine off the network. On Windows, run the Task Manager and make sure that the "System Idle Process" shows close to 100%. You can also use the Task Manager to monitor CPU activity. I would be interested to know who can achieve the best results.
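To make the "stay generic" requirement concrete, a thread pool of the required kind might look something like the sketch below: it knows nothing about quicksort, only Runnable jobs, and the pendingJobs counter lets a caller wait until every submitted job (including jobs submitted by other jobs, as the quicksort jobs do) has finished. All names here are illustrative assumptions, not a prescribed design.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SimpleThreadPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread[] workers;
    private final AtomicInteger pendingJobs = new AtomicInteger(0);

    public SimpleThreadPool(int nThreads) {
        workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = queue.take();     // block until a job arrives
                        try { job.run(); }
                        catch (RuntimeException e) { }   // a failed job must not kill the worker
                        finally {
                            if (pendingJobs.decrementAndGet() == 0)
                                synchronized (pendingJobs) { pendingJobs.notifyAll(); }
                        }
                    }
                } catch (InterruptedException e) { }     // shutdown() interrupts us
            });
            workers[i].setDaemon(true);
            workers[i].start();
        }
    }

    // Jobs may themselves submit further jobs.
    public void submit(Runnable job) {
        pendingJobs.incrementAndGet();   // count the job before queueing it
        queue.add(job);
    }

    // Block until every submitted job (including jobs submitted by jobs) is done.
    public void awaitAll() {
        synchronized (pendingJobs) {
            while (pendingJobs.get() > 0) {
                try { pendingJobs.wait(); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            }
        }
    }

    public void shutdown() {
        for (Thread t : workers) t.interrupt();
    }
}
```

The quicksort code would then interact with this class only through submit() and awaitAll(), passing plain Runnables; nothing in the pool references arrays, pivots, or partitions.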