Short modules for introducing parallel concepts
Presented by David Bunde.
Tutorial at CCSC-MW 2013.
- My slides (6 to a page version).
- Module 1: Mandelbrot set with OpenMP
- Module 2: Short exercises with CUDA
- Module 3: Chapel in Algorithms
- Other mentioned resources
Abstract:
One of the current big challenges in CS education is how to
incorporate parallel programming to prepare our graduates for a world
in which essentially all computers have multiple cores.
The recent ACM/IEEE model curriculum, Computer Science Curricula 2013
(CS2013), is spurring many into action, but there are still many open
questions about what to teach and how to fit parallelism into
already-full courses.
This tutorial presents three modules the presenter and others have
used in actual classes as brief introductions to parallel concepts as
part of an "early and often" approach to incorporating parallelism
throughout the curriculum.
Each module is built around a different parallel language that
highlights a different aspect of parallel programming.
Despite this variety, each requires only a couple of days of class time and
fits within a standard course.
All of them include associated assignments and/or laboratory exercises
for students.
The modules:
- The first module uses OpenMP to quickly prototype different approaches
to parallelizing a simple fractal generation program.
It demonstrates the ideas of speedup, race conditions, load balancing,
and parallel overhead.
The presenter has used it in an operating systems course, connected
to the presentation of threads and concurrency normally included in
that course.
- The second module uses CUDA to demonstrate SIMD (Single Instruction,
Multiple Data) computing and heterogeneous parallelism.
CUDA is NVIDIA's architecture and programming language for
general-purpose computing on graphics processing units (GP-GPU
programming).
CUDA is an appealing option for parallel computing because
graphics processors have large numbers of cores (several hundred even on
mid-range cards) and the language is essentially C with a few added
constructs.
The presenter has used this module in the context of a computer
organization course to demonstrate the role of architectural features in
program performance.
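For flavor, the standard first CUDA example, vector addition, shows both ideas at once (this is a generic illustration, not necessarily one of the module's exercises):

```cuda
/* Each GPU thread executes the same instruction stream on one data
   element -- the SIMD idea the module highlights. */
__global__ void add(int n, const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              /* guard: thread count is rounded up */
        c[i] = a[i] + b[i];
}

/* Host side (sketch): the CPU copies data to the GPU, launches enough
   256-thread blocks to cover n, and copies the result back -- the
   heterogeneous part of the model. */
// cudaMemcpy(d_a, a, n * sizeof(float), cudaMemcpyHostToDevice);
// add<<<(n + 255) / 256, 256>>>(n, d_a, d_b, d_c);
// cudaMemcpy(c, d_c, n * sizeof(float), cudaMemcpyDeviceToHost);
```

The explicit host/device split and the thread-indexing arithmetic are what tie the module to the architectural themes of a computer organization course.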
- The final module uses Chapel, a parallel programming language designed
for high-performance computing (HPC).
One of its original design goals was ease of use, which we exploit to
quickly introduce the language in the context of an algorithms course.
The module covers only a subset of the language, but it includes the
keywords for launching parallel tasks, which are used to demonstrate
parallelism in divide-and-conquer algorithms.
Also included is the idea of reductions, a fundamental parallel
algorithmic technique.
Standard reductions (like summing values) are built into Chapel, but
the programmer can also define customized reductions using
a framework that fits into the discussion of dynamic programming.
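Both ideas can be sketched in a few lines of Chapel (an illustrative example, not the module's own code; names like `sum` and `A` are invented here):

```chapel
// Parallel divide and conquer: cobegin runs the two recursive calls
// as concurrent tasks; the ref intents let the tasks write the results.
proc sum(A: [] int, lo: int, hi: int): int {
  if lo == hi then return A[lo];
  const mid = (lo + hi) / 2;
  var left, right: int;
  cobegin with (ref left, ref right) {
    left = sum(A, lo, mid);
    right = sum(A, mid + 1, hi);
  }
  return left + right;
}

// The same computation as a built-in reduction:
var A: [1..10] int = [i in 1..10] i;
writeln(+ reduce A);   // 55
```

The one-line `+ reduce` form is what makes standard reductions so quick to introduce; user-defined reductions plug a custom combining operation into the same framework.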