(There are kind of a lot of files here, so rather than make individual links I've made links to directories. Most programs need the timer.h file from the starter code to compile.)
These files contain results of measuring accuracy for different seeds and versions.
Each line reports, for a given seed and number of samples, the difference between the computed value of pi and the best available value (M_PI).
I collected data for two versions of the C sequential code, one using double-precision arithmetic and one using single-precision, since I thought that would make comparisons between sequential and OpenCL versions more meaningful. I also collected data for the starter-code versions using library functions/classes (srand()/rand() for C, Random for Java) -- I didn't ask you to do this but thought it might be interesting. I did collect data for the Java version using the custom RNG, but results were essentially identical to those of the C version (as they should have been!), so I didn't plot those.
File names are meant to identify version (``dp'' for C using double precision, ``sp'' for C using single precision).
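In case it helps to see the accuracy measurement concretely, here is a minimal sketch of the kind of calculation these files summarize. It is not the assignment code: it uses srand()/rand() rather than the custom RNG, and the number of samples shown is just an illustrative value.

    /* Minimal sketch of the accuracy measurement: estimate pi by the
       Monte Carlo method and report the difference from M_PI.
       Uses srand()/rand() for brevity; the assignment versions use a
       custom RNG (and there is also a single-precision variant). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(void) {
        long num_samples = 1000000;   /* illustrative sample count */
        unsigned int seed = 4321;     /* one of the seeds tried */
        long count = 0;
        srand(seed);
        for (long i = 0; i < num_samples; ++i) {
            double x = (double) rand() / RAND_MAX;
            double y = (double) rand() / RAND_MAX;
            if (x*x + y*y <= 1.0) ++count;
        }
        double pi_est = 4.0 * (double) count / (double) num_samples;
        printf("%ld samples: pi = %f, difference = %g\n",
               num_samples, pi_est, fabs(pi_est - M_PI));
        return 0;
    }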
These files contain results of timing the sequential programs on various machines. Before doing timings I recompiled for the particular machine, in case that made a difference.
Each line reports, for a number of samples, the average time over 5 trials (results from trial to trial were very similar).
I collected data for two versions of the C sequential code, one using double-precision arithmetic and one using single-precision, since I thought that would make comparisons between sequential and OpenCL versions more meaningful. (I found it interesting that the times differ quite a bit -- ``Hm!''?)
File names are meant to identify version (``dp'' for C using double precision, ``sp'' for C using single precision, ``java'' for Java) and machine(s).
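For what it's worth, here is a rough sketch of how each per-line average could be produced. It does not use the starter code's timer.h (I'm not reproducing its interface here); it calls gettimeofday() directly, and compute_pi() is just a stand-in for the real computation.

    /* Rough sketch of timing over repeated trials and averaging.
       gettimeofday() stands in for whatever timer.h provides. */
    #include <stdio.h>
    #include <sys/time.h>

    #define NUM_TRIALS 5

    /* stand-in for the real computation (just burns some time) */
    static void compute_pi(long num_samples) {
        volatile double sum = 0.0;
        for (long i = 0; i < num_samples; ++i)
            sum += 1.0 / (double) (i + 1);
    }

    static double wall_time(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1.0e6;
    }

    int main(void) {
        long num_samples = 1000000;   /* illustrative sample count */
        double total = 0.0;
        for (int t = 0; t < NUM_TRIALS; ++t) {
            double start = wall_time();
            compute_pi(num_samples);
            total += wall_time() - start;
        }
        printf("%ld samples: average time %g seconds over %d trials\n",
               num_samples, total / NUM_TRIALS, NUM_TRIALS);
        return 0;
    }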
These files contain results of measuring accuracy for different numbers of samples and UEs, using the seed that gave the best average results (4321).
Each line reports, for a number of samples and number of UEs, the difference between the computed value of pi and the best available value (M_PI).
I collected data for all four versions of the code, but results for the OpenMP, MPI, and Java versions were identical (as they should have been!). Results for the OpenCL version were somewhat different since it uses single precision.
File names are meant to identify version (``dp'' for double precision, ``sp'' for single precision).
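One way to get results that agree exactly across OpenMP, MPI, and Java (for a given number of UEs) while still varying with the number of UEs is to give each UE its own RNG state derived from the overall seed and its ID, with a deterministic distribution of samples among UEs. The sketch below shows that idea in OpenMP; it is an illustration of the technique only, with a simple LCG standing in for the custom RNG, not the actual assignment code.

    /* Sketch of per-UE random streams in OpenMP.  Each UE's stream depends
       only on the overall seed and its ID, so results are reproducible for
       a given number of UEs (but differ for different numbers of UEs). */
    #include <stdio.h>
    #include <math.h>
    #include <omp.h>

    /* simple LCG returning a value in [0,1); NOT the assignment's RNG */
    static double next_random(unsigned long long *state) {
        *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
        return (double) (*state >> 11) / (double) (1ULL << 53);
    }

    int main(void) {
        long num_samples = 1000000;      /* illustrative sample count */
        unsigned long long seed = 4321;
        long count = 0;

        #pragma omp parallel reduction(+:count)
        {
            int id = omp_get_thread_num();
            int num_ues = omp_get_num_threads();
            /* per-UE state derived from the overall seed and the UE's ID */
            unsigned long long state = seed + (unsigned long long) id;
            for (long i = id; i < num_samples; i += num_ues) {
                double x = next_random(&state);
                double y = next_random(&state);
                if (x*x + y*y <= 1.0) ++count;
            }
        }
        double pi_est = 4.0 * (double) count / (double) num_samples;
        printf("pi = %f, difference = %g\n", pi_est, fabs(pi_est - M_PI));
        return 0;
    }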
These files contain results of timing all the parallel programs on various machines. Before doing timings I recompiled for the particular machine, in case that made a difference.
Each line reports, for a number of samples and number of UEs, the average time over 5 trials (results from trial to trial were very similar).
I collected a lot of data, more than I asked you to do, because I was curious about performance on different platforms. My idea was to use Dione as the main platform for the two shared-memory versions (OpenMP and Java), the Pandora cluster as the main platform for the MPI version, and both Deimos and Atlas00 as main platforms for the OpenCL version, but I ran other experiments as well to allow for what I thought might be interesting plots. In particular I tried the OpenCL version both using the GPU and using the CPU, and for both of those I tried both the ``preferred'' workgroup size and the maximum workgroup size. I also collected data for two ``problem sizes'' (numbers of samples).
Filenames are meant to be descriptive.
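For reference, the workgroup sizes in question can be queried through the standard OpenCL API. The sketch below (not the assignment code) prints the device maximum and the per-kernel values for a trivial kernel, with ``preferred size'' taken to mean CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE; error checking is omitted for brevity, and CL_DEVICE_TYPE_DEFAULT would need to be changed to select GPU versus CPU explicitly.

    /* Sketch: querying maximum and preferred workgroup sizes. */
    #define CL_TARGET_OPENCL_VERSION 120
    #include <stdio.h>
    #include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */

    /* trivial kernel, just so there is something to query */
    static const char *src =
        "__kernel void dummy(__global float *a) { a[get_global_id(0)] = 0; }";

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel kernel = clCreateKernel(prog, "dummy", NULL);

        size_t dev_max, kern_max, preferred;
        clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                        sizeof(dev_max), &dev_max, NULL);
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(kern_max), &kern_max, NULL);
        clGetKernelWorkGroupInfo(kernel, device,
                                 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                                 sizeof(preferred), &preferred, NULL);

        printf("device max workgroup size:         %zu\n", dev_max);
        printf("kernel max workgroup size:         %zu\n", kern_max);
        printf("preferred workgroup size multiple: %zu\n", preferred);
        return 0;
    }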
These files contain plots of accuracy for different seeds and versions. This is more than I asked you to do, but I thought it would be interesting; I also made plots for the starter versions using library functions/classes.
I also plotted a comparison of ``best'' values for the various versions, where for each version I picked the seed that gave the best average over the numbers of samples I tried.
I couldn't decide whether plotting with lines or with bars made more sense, so I did both.
File names are meant to be descriptive, and titles and keys should tell you what's what.
These files contain plots of accuracy for varying numbers of samples and UEs.
I couldn't decide whether plotting with lines or with bars made more sense, so I did both.
File names are meant to be descriptive, and titles and keys should tell you what's what.
These files contain plots of execution times for all the parallel programs on various machines. I also tried two different ``problem sizes'' (numbers of samples).
This is a lot of plots, more than I asked you to do, but I was curious about how results compared. I combined some results where I thought it would be interesting to do so -- e.g., both OpenMP and MPI in a single plot, for platforms where I tried both. Each plot shows one or more parallel versions, plus sequential execution time and ``perfect speedup'' (sequential time / number of UEs) for comparison purposes. (Sequential times for the OpenCL version are those of the single-precision sequential code.) For the OpenCL version I plotted times for execution on the GPU and also on the CPU (both via OpenCL), and for both the ``preferred'' and the maximum workgroup size.
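(To make the ``perfect speedup'' curve concrete: it is just the sequential time divided by the number of UEs, as in this tiny sketch with a made-up sequential time.)

    /* Tiny sketch of the "perfect speedup" reference curve:
       ideal time with p UEs = (sequential time) / p. */
    #include <stdio.h>

    int main(void) {
        double seq_time = 16.0;          /* hypothetical sequential time, seconds */
        int ues[] = { 1, 2, 4, 8, 16 };
        for (int i = 0; i < 5; ++i)
            printf("%2d UEs: ideal time %.2f seconds\n", ues[i], seq_time / ues[i]);
        return 0;
    }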
Filenames are meant to be descriptive, and titles and legends should identify what's what.