CSCI 3366 (Parallel and Distributed Processing), Fall 2017:
Homework 4 Solution (Programming Problems)

Code

For this assignment I first wrote a parallel program using a simple approach to parallelization based on our Divide and Conquer pattern. Knowing that this approach might not give good load balance for this problem, I also wrote a second program that takes a different approach: it first sets up a pool of threads and then, on every split, creates tasks for one or both subproblems, depending on whether their size exceeds some threshold, and turns them over to the thread pool for execution. This is more complicated than the approach I intended you to use, but I was curious about how much it would improve performance.
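The following is not the program itself, only a minimal sketch of the thread-pool idea described above. It assumes a fixed-size pool, a middle-element pivot, and a counter of outstanding tasks to detect completion; the class name `PoolQuicksort` and all method and parameter names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; names and design details are assumptions.
class PoolQuicksort {
    private final int[] data;
    private final int threshold;            // subproblems larger than this get their own task
    private final ExecutorService pool;
    private final AtomicInteger pending = new AtomicInteger(0); // tasks submitted but not yet finished
    private final CountDownLatch done = new CountDownLatch(1);  // released when pending drops to 0

    private PoolQuicksort(int[] data, int threshold, ExecutorService pool) {
        this.data = data;
        this.threshold = threshold;
        this.pool = pool;
    }

    // Sorts data in place using numThreads pool threads.
    static void sort(int[] data, int numThreads, int threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            PoolQuicksort sorter = new PoolQuicksort(data, threshold, pool);
            sorter.submit(0, data.length - 1);
            sorter.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        } finally {
            pool.shutdown();
        }
    }

    // Hands the subproblem [lo, hi] to the pool. The counter is incremented
    // before submission, while the submitting task still holds its own count,
    // so it cannot reach zero until all work is finished.
    private void submit(int lo, int hi) {
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                quicksort(lo, hi);
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }

    private void quicksort(int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int p = partition(lo, hi);
        sortOrSubmit(lo, p - 1);
        sortOrSubmit(p + 1, hi);
    }

    // Large subproblems become new tasks; small ones are sorted by the current thread.
    private void sortOrSubmit(int lo, int hi) {
        if (hi - lo + 1 > threshold) {
            submit(lo, hi);
        } else {
            quicksort(lo, hi);
        }
    }

    // Lomuto partition using the middle element as pivot; returns the pivot's final index.
    private int partition(int lo, int hi) {
        swap(lo + (hi - lo) / 2, hi);
        int pivot = data[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (data[j] < pivot) {
                swap(i, j);
                i++;
            }
        }
        swap(i, hi);
        return i;
    }

    private void swap(int a, int b) {
        int tmp = data[a];
        data[a] = data[b];
        data[b] = tmp;
    }
}
```

In this sketch no task ever waits for another task, so a small pool cannot deadlock; the calling thread simply blocks until the count of outstanding tasks reaches zero. A call such as `PoolQuicksort.sort(values, 4, 1000)` would sort `values` with 4 threads and a threshold of 1000 elements.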

Results (data and plots)

(There are a lot of files here, so rather than make individual links I've made links to directories.)

Sequential times (data).
These files contain results of timing the sequential program on various machines, using different combinations of number of elements and seed. I didn't ask you to do this, but I was curious about how much difference the choice of seed made, knowing that quicksort gives $O(N \log N)$ performance only in the average case (its worst case is $O(N^2)$). But execution times didn't vary that much. Before doing timings I recompiled for the particular machine, in case that made a difference.
Each line reports, for a combination of number of elements and seed, average time over 5 trials.
Filenames are meant to be descriptive.
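A minimal sketch of how an average over several trials might be collected for one combination of number of elements and seed. This is not the harness used to produce the data: the names are hypothetical, `Arrays.sort` stands in for the sequential quicksort, and regenerating the input from the seed before each trial is an assumption.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; names and details are assumptions.
class TimingHarness {
    // Generates the same pseudo-random input every time for a given seed.
    static int[] generate(int numElements, long seed) {
        return new Random(seed).ints(numElements).toArray();
    }

    // Returns the average wall-clock time in seconds over the given number of trials.
    static double averageSeconds(int numElements, long seed, int trials) {
        double total = 0.0;
        for (int t = 0; t < trials; t++) {
            int[] data = generate(numElements, seed); // fresh unsorted copy, not timed
            long start = System.nanoTime();
            Arrays.sort(data);                        // stand-in for the program being timed
            total += (System.nanoTime() - start) / 1e9;
        }
        return total / trials;
    }
}
```

Something like `TimingHarness.averageSeconds(1000000, 42L, 5)` would then give one line's worth of data.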
Parallel times for first program (data).
These files contain results of timing the first (simple) program on various machines, varying parameters as for the sequential program. I collected a lot of data, more than I asked you to do, because I was curious about performance on different platforms. Before doing timings I recompiled for the particular machine, in case that made a difference.
Each line reports, for a combination of number of elements, seed, and number of UEs (units of execution, here threads), average time over 5 trials. In most cases, all 5 trials took roughly the same amount of time, but on Dione, for a few combinations of parameters, times varied quite a bit. I lack the time to investigate, but it's interesting.
Filenames are meant to be descriptive.
Parallel times for second program (data).
These files contain results of timing the second (thread-pool-based) program on various machines, varying parameters as for the sequential program and also varying the threshold (number of elements at which new tasks are created). I collected a lot of data because I was curious about performance on different platforms. Before doing timings I recompiled for the particular machine, in case that made a difference.
Each line reports, for a combination of number of elements, seed, number of UEs, and threshold, average time over 5 trials. In most cases, all 5 trials took roughly the same amount of time, but on Dione, for a few combinations of parameters, times varied quite a bit. I lack the time to investigate, but it's interesting.
Filenames are meant to be descriptive.
Sequential times (plots).
This file contains a plot of execution times for the sequential program on various machines, using different combinations of number of elements and seed. I didn't ask you to do this, but I was curious.
Title and legend should identify what's what.
Parallel times for first program (plots).
These files contain plots of execution times for the first parallel program on various machines, plotting different combinations of number of elements and seed.
This is a lot of plots, more than I asked you to do, but I was curious about how results compared. As with Homeworks 2 and 3, each plot shows parallel times plus sequential execution time and ``perfect speedup'' (sequential time / number of UEs) for comparison purposes. Results were mostly about what I thought they would be -- nowhere near ``perfect'' speedup, but definitely some improvement with multiple threads, up to the point where the number of threads exceeds what the platform can actually run in parallel. I was a little surprised that on Dione times didn't improve for 64 threads, but I lack the time to investigate. I was more surprised by how much worse times were for 64 or more threads on Dione but not on the other two machines I tried (a DIAS and a Pandora), but the Dione hardware is markedly different (Opteron processors if I remember right), and it's a different release of Java. I also found it interesting, though not a total surprise, that improvements varied noticeably depending on seed.
Filenames are meant to be descriptive, and titles and legends should identify what's what.
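For reference, the ``perfect speedup'' curve in these plots is just sequential time divided by number of UEs. The sketch below computes it, along with speedup and efficiency in their usual textbook definitions (those two are not defined above and are included only for comparison). The class and method names are hypothetical.

```java
// Illustrative sketch only; names are assumptions.
class SpeedupMath {
    // Time the parallel program would take with perfect speedup.
    static double perfectTime(double sequentialSeconds, int numUEs) {
        return sequentialSeconds / numUEs;
    }

    // Observed speedup: how many times faster than the sequential program.
    static double speedup(double sequentialSeconds, double parallelSeconds) {
        return sequentialSeconds / parallelSeconds;
    }

    // Efficiency: observed speedup as a fraction of the number of UEs (1.0 is perfect).
    static double efficiency(double sequentialSeconds, double parallelSeconds, int numUEs) {
        return speedup(sequentialSeconds, parallelSeconds) / numUEs;
    }
}
```

With made-up numbers (not taken from the data here), a sequential time of 8 seconds and 4 UEs gives a perfect-speedup time of 2 seconds; if the parallel program actually took 4 seconds, its speedup would be 2 and its efficiency 0.5.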
Parallel times for first and second programs (plots).
These files contain plots of execution times for both parallel programs on various machines, plotting different combinations of number of elements and seed.
This is a lot of plots, but I was curious about how results compared. For each combination of parameters and machine, I plotted results for the first program and then for the second program using two different values for threshold (number of elements for which a new task is created). As with Homeworks 2 and 3, each plot shows parallel times for the parallel programs plus sequential execution time and ``perfect speedup'' (sequential time / number of UEs) for comparison purposes. Results were mostly about what I thought they would be -- in most cases, at least until the point where the number of threads exceeds what the platform can actually run in parallel, the thread-pool version was faster, and times decreased with more threads, though again not close to ``perfect'' speedup. Again some of the results on Dione were somewhat surprising, but I lack the time to investigate.
Filenames are meant to be descriptive, and titles and legends should identify what's what.

Berna Massingill
2017-12-11
