This tutorial is designed to quickly get you started programming in Habanero Java (HJ). It is written to be accessible to a broad audience so we do not assume background other than familiarity with Java.
This tutorial is a combination of contributions from
David Bunde, Johnathan Ebbers, Maxwell Galloway-Carson, Michael Graf,
Sung Joo Lee, Andrei Papancea, and Casey Samoore.
It was created as part of the "Teaching parallel computing with higher-level languages and activity-based laboratories" project, with support
from National Science Foundation's
"Transforming Undergraduate Education in Science, Technology, Engineering and
Mathematics (TUES)" program under Award No.
DUE-1044299
and the Andrew W. Mellon Foundation.
It is currently being maintained by David Bunde. Please let us know
(dbunde@knox.edu) what you think of the tutorial so that we can continue to improve it.
Habanero Java (HJ) is a part of the Habanero Multicore Software Research Project at Rice University. HJ is an extension of java designed to teach high level parallel programming to undergraduate and graduate level students. This tutorial is designed to walk someone familiar with Java through the additional features afforded by Habanero.
Before Habanero Java, the standard way to write concurrent programs in Java was using the Thread class. Threads create child tasks that run on different processors than the parent tasks. These threads are given pieces of code to run, and upon completion they are then joined with the parent. Habanero Java takes these concepts and simplifies them by using keywords and functions.
For directions to install the new library-based implementation of HJ, go here. The older version which we describe here used the instructions here, but that version is out of date.
One of the simplest forms of parallelization in HJ is done via a forall loop. A forall loop works similarly to a for loop and must be written in terms of points and regions.
The idea of a forall loop is to run the iterations of the for loop in parallel. As the parent task enters the forall loop it creates multiple threads. These threads then continue to iterate through the loop and join up when they exit. Let's take a look at a quick example, a program that does simple summation of the numbers in an integer array, like this:
Let us now try to parallelize something! Below is the code that will serially solve a matrix multiplication and print the runtime in miliseconds. The variable n is used to keep track of the size of the matrices. We have initialized arrays A, B, and C which are all two-dimensional arrays. So let's try to convert this program into a program that runs in parallel using the forall construct. After you have parallelized it with the forall construct, run it a couple of times and compare the runtimes of the serial and parallel versions. What do you notice? Also, think about why it’s behaving in a certain way and try different values for n.
Note that this is not the best way to parallelize this program. Also note that in order to check that the program is actually doing matrix multiplication reduce the size of n to 10 and uncomment the m1.display(); line in the main method.
(Create a separate file for both serial and parallel programs since you would want to compare their runtimes)
public class MatrixMultiply{ int n = 500; int[][] A = new int[n][n]; int[][] B = new int[n][n]; int[][] C = new int[n][n]; public void display() { System.out.println("Matrix 1: "); printMatrix(A); System.out.println("Matrix 2: "); printMatrix(B); System.out.println("Result of Multiplication of both matrix: "); printMatrix(C); } public void printMatrix(int[][] mat){ for (int i = 0 ; i < n ; i ++){ for (int j = 0 ; j < n ; j ++) System.out.print(" " + mat[i][j]); System.out.println(); } } public void multiply() { for (int i = 0 ; i < n ; i ++) for (int j = 0 ; j < n ; j ++) A[i][j] = i+1; for (int i = 0 ; i < n ; i ++) for (int j = 0 ; j < n ; j ++) B[i][j] = i+1; long timeKeep = System.currentTimeMillis(); for (int i = 0 ; i < n ; i ++) for (int j = 0 ; j < n ; j ++) C[i][j] = 0; for (int i = 0 ; i < n ; i++) for (int j = 0 ; j < n ; j++) for (int k = 0 ; k < n ; k++) C[i][j] += A[i][k] * B[k][j]; System.out.println(System.currentTimeMillis()-timeKeep); } public static void main(String[] args) { MatrixMultiply m1 = new MatrixMultiply(); m1.multiply(); //m1.display(); } }
public class MatrixMultiplyP{ public static int n = 500; public static int[][] A = new int[n][n]; public static int[][] B = new int[n][n]; public static int[][] C = new int[n][n]; public void display() { System.out.println("Matrix 1: "); printMatrix(A); System.out.println("Matrix 2: "); printMatrix(B); System.out.println("Result of Multiplication of both matrix: "); printMatrix(C); } public void printMatrix(int[][] mat){ for (int i = 0 ; i < n ; i ++){ for (int j = 0 ; j < n ; j ++) System.out.print(" " + mat[i][j]); System.out.println(); } } public void multiply() { for (int i = 0 ; i < n ; i ++) for (int j = 0 ; j < n ; j ++) A[i][j] = i+1; for (int i = 0 ; i < n ; i ++) for (int j = 0 ; j < n ; j ++) B[i][j] = i+1; long timeKeep = System.currentTimeMillis(); for (int i = 0 ; i < n ; i ++) for (int j = 0 ; j < n ; j ++) C[i][j] = 0; forall (point[i] : [0:n-1]) for (int j = 0 ; j < n ; j ++) for (int k = 0; k < n; k++) C[i][j] += A[i][k] * B[k][j]; System.out.println(System.currentTimeMillis()-timeKeep); } public static void main(String[] args) { MatrixMultiplyP m1 = new MatrixMultiplyP(); m1.multiply(); //m1.display(); } }
A typical problem in concurrent languages is when a section of code is accessed as a shared resource when it should not be concurrently accessed by more than one thread of execution. The isolated construct fixes this problem, by ensuring that the task will execute in mutual exclusion with all other possibly parallel interfering instances. An example use of the isolated construct applied to our previous example is:
The following code will increment n 20,000 times. Starting from 0, the correct value for n at the end should be 20,000. Run the program a few times and see the solutions. You will notice that the program does not consistently give 20,000. It gives a variety of solutions. Add an isolated statement to the code such that the solution is consistently correct.
public class isoex{ static int n=0; public static void main(String[] args){ forall(point[i]:[0:19999]){ n++; } System.out.println(n); } }
public class isoex{ static int n=0; public static void main(String[] args){ forall(point[i]:[0:19999]){ isolated{ n++; } } System.out.println(n); } }
On the inside a forall loop is just a way to create a collection of tasks, one for each iteration of a for loop. HJ allows the programmer to make tasks individually as well. This is done with the async keyword. Simply stated, async causes the following statement to execute as its own task. For example:
This creates two new tasks; the first calls function1 and the second calls both function2 and function3.
The start code below increments n up to 444,444,444 and m up to 555,555,555 and then sums them up and then prints the results and the runtime. Run the given code and check the runtime and the output. Now, wrap each loop with async. What do you notice?
public class Asyncex { public static int n = 0; public static int m = 0; public static long timeKeep; public static void main(String[] args){ timeKeep = System.currentTimeMillis(); for(int i=0; i<444444444; i++){ n++; } for(int i=0; i<555555555; i++){ m++; } System.out.println("Time "+(System.currentTimeMillis() - timeKeep)); System.out.println("Sum "+(n + m)); } }
public class AsyncexP { public static int n = 0; public static int m = 0; public static long timeKeep; public static void main(String[] args){ timeKeep = System.currentTimeMillis(); async { for(int i=0; i<444444444; i++){ n++; } } async { for(int i=0; i<555555555; i++){ m++; } } System.out.println("Time "+(System.currentTimeMillis()-timeKeep)); System.out.println("Sum "+(n + m)); /** Notice how the runtime has considerably gone down, but the result is wrong? * The advantage of running code asynchroniously is the fact that it creates a * a thread for each chunk of code wrapped around with async. The disadvantage * is that sometimes code that is dependent on a piece of code running asynchroniously * might run before the asynchronous part. This can obviously cause errors, just * like you see above. * * This issue can be fixed using the finish statement, which we'll introduce next. */ } }
Just as in low level task parallelization, it is sometimes important for all the parallelized tasks to join together when finished to prevent one thread from continuing on without the other threads finishing. In HJ this is accomplished via finish.
Let's take another look at our async example. After the two async statements run, we print the sum of the incremented values of n and m. The main problem here is that the print method executes before both loops finish. To fix this, we need to wrap the async blocks with a finish statement.
Use the finish statement on the given code to ensure the loops finish before printing.
public class Finishex { public static int n = 0; public static int m = 0; public static long timeKeep; public static void main(String[] args){ timeKeep = System.currentTimeMillis(); async { for(int i=0; i<444444444; i++){ n++; } } async { for(int i=0; i<555555555; i++){ m++; } } System.out.println("Time "+(System.currentTimeMillis()-timeKeep)); System.out.println("Sum "+(n + m)); } }
public class FinishexP { public static int n = 0; public static int m = 0; public static long timeKeep; public static void main(String[] args){ timeKeep = System.currentTimeMillis(); finish { async { for(int i=0; i<444444444; i++){ n++; } } async { for(int i=0; i<555555555; i++){ m++; } } } System.out.println("Time "+(System.currentTimeMillis()-timeKeep)); System.out.println("Sum "+(n + m)); } }
This is a construct that allows the creation of a container of type <T> that is used to store data in the parallelized task. This construct avoids the issue of having to initialize variables as a static field, instance field, or an array element. The syntax of future looks like this:
It should now be clear how to synchronize programs with features like finish and isolated. However there are still more ways to synchronize them. Consider a program that iterates through guests at a party and gives them cake and ice cream.
Consider the following code that prints out "Hello" and "Goodbye" a various number of times. Use barrier synchronization (next statement) to make sure that all "Hellos" occur before the "Goodbyes".
public class nextex{ public static void main(String[] args){ forall(point[i]:[1:5]){ System.out.println("Hello" +i); System.out.println("Goodbye" +i); } } }
public class nextex{ public static void main(String[] args){ forall(point[i]:[1:5]){ System.out.println("Hello"+i); next; System.out.println("Goodbye"+i); } } }