Cooperative Computing Lab

Makeflow Tutorial

  1. Getting Started
    1. Login to a CRC Head Node
    2. Download, Build, and Install CCTools
    3. Set Environment Variables
  2. Makeflow Example
    1. Setup
    2. Running with Local (Multiprocess) Execution
    3. Running with CRC's SGE

This tutorial will have you install CCTools into your CRC home directory and will take you through some distributed computation examples using Makeflow.

Getting Started

Login to a CRC Head Node

For this tutorial, we assume you have an open SSH connection to the CRC login nodes. If you do not have an account with CRC, then you may register here.

NOTE: If you do not have a CRC account, you can still do most of the tutorial on a local Linux machine: everything except the section where tasks are submitted to SGE.

In this tutorial, we will use the newcell login node:

> ssh newcell.crc.nd.edu

You can also use opteron.crc.nd.edu or stats.crc.nd.edu.

Download, Build, and Install CCTools

Navigate to the download page in your browser to review the most recent versions: http://www.nd.edu/~ccl/software/download.shtml

Set up a Sandbox for this Tutorial and Download a copy of CCTools 3.6.1

> mkdir ~/cctools-tutorial
> wget http://nd.edu/~ccl/software/files/cctools-3.6.1-source.tar.gz
...
> tar xzf cctools-3.6.1-source.tar.gz

Build and Install CCTools

> cd ~/cctools-3.6.1-source
> ./configure
...
> make install
...

Set Environment Variables

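You will need to add your CCTools directory to your $PATH:

> setenv PATH ~/cctools/bin:${PATH}

The setenv syntax is for csh and tcsh. If your login shell is bash, the equivalent is:

> export PATH=~/cctools/bin:${PATH}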

Makeflow Example

Write a Makeflow to check if Shakespearean language is still used

In this example we will set up and run a Makeflow script to analyze five Shakespearean plays and determine which words are still in use in modern English.

At the conclusion of this example, participants will be able to:

  • Identify components of a workflow
  • Execute Makeflow scripts on multiple systems

Setup

> mkdir ~/cctools-tutorial/makeflow
> cd ~/cctools-tutorial/makeflow

Download the following:

  • word-compare.py: our application executable for this exercise. This checks the system dictionary for every word in the input file, and prints each word it finds.
  • Makeflow: the script that defines the workflow.
  • shakespeare-text.tgz: an archive containing the words in Hamlet, Macbeth, Othello, Julius Caesar, and King Lear.
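
As a rough illustration of the dictionary check described in the first item, here is a minimal Python sketch. It is not the downloaded word-compare.py: the function names, the default dictionary path /usr/share/dict/words, and the choice to sort and de-duplicate the words are all assumptions made for this example, and the host and timestamp lines that the real script prints are left out.

```python
import re


def words_in_dictionary(text, dictionary):
    """Return the sorted, de-duplicated words of `text` that appear in `dictionary`."""
    # Lowercase the text and split it into alphabetic words (apostrophes allowed).
    candidates = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(word for word in candidates if word in dictionary)


def load_dictionary(path="/usr/share/dict/words"):
    """Load a word list, one word per line. The default path is an assumption."""
    with open(path) as handle:
        return {line.strip().lower() for line in handle if line.strip()}


if __name__ == "__main__":
    # A tiny in-memory dictionary keeps the demo independent of the system word list.
    demo_dictionary = {"a", "able", "abode", "the"}
    sample = "A man able to find the abode, forsooth"
    for word in words_in_dictionary(sample, demo_dictionary):
        print(word)
```

Words such as "forsooth" that are missing from the dictionary are simply dropped, which is the behavior the exercise relies on.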

> wget http://nd.edu/~ccl/software/tutorials/ndtut12/makeflow/word-compare.py
...
> wget http://nd.edu/~ccl/software/tutorials/ndtut12/makeflow/Makeflow
...
> wget http://nd.edu/~ccl/software/tutorials/ndtut12/makeflow/shakespeare-text.tgz
...

Next, unpack shakespeare-text.tgz:

> tar xzf shakespeare-text.tgz

The Makeflow script should look like:

> cat Makeflow
hamlet.checks: word-compare.py hamlet.txt
	python word-compare.py hamlet.txt > hamlet.checks

macbeth.checks: word-compare.py macbeth.txt
	python word-compare.py macbeth.txt > macbeth.checks

othello.checks: word-compare.py othello.txt
	python word-compare.py othello.txt > othello.checks

julius-caesar.checks: word-compare.py julius-caesar.txt
	python word-compare.py julius-caesar.txt > julius-caesar.checks

king-lear.checks: word-compare.py king-lear.txt
	python word-compare.py king-lear.txt > king-lear.checks

This Makeflow contains five rules. Each rule checks the system dictionary for every word in one of Shakespeare's most famous plays and saves the result into a file. The dependencies for each rule include both the word comparison script and the input data to be analyzed.
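
Because the five rules differ only in the name of the play, they could also be generated rather than typed. The following Python sketch is one way to do that. The helper name makeflow_rule and the PLAYS list are hypothetical, and the sketch assumes the Make-style layout shown above: a "target: sources" line followed by a tab-indented command.

```python
def makeflow_rule(play, script="word-compare.py"):
    """Build one rule: a 'target: sources' line, then a tab-indented command."""
    source = f"{play}.txt"
    target = f"{play}.checks"
    return f"{target}: {script} {source}\n\tpython {script} {source} > {target}\n"


# The five plays from this tutorial, named after their input files.
PLAYS = ["hamlet", "macbeth", "othello", "julius-caesar", "king-lear"]

if __name__ == "__main__":
    # Print the rules separated by blank lines; redirect stdout to save them.
    print("\n".join(makeflow_rule(play) for play in PLAYS), end="")
```

Each rule lists both the script and the input text as sources, matching the dependencies described above.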

Running with Local (Multiprocess) Execution

Here we're going to tell Makeflow to dispatch the jobs using regular local processes (no distributed computing!). This is basically the same as regular Unix Make using the -j flag.

> makeflow -T local

If everything worked out correctly, you should see:

> makeflow -T local
python word-compare.py king-lear.txt > king-lear.checks
python word-compare.py julius-caesar.txt > julius-caesar.checks
python word-compare.py othello.txt > othello.checks
python word-compare.py macbeth.txt > macbeth.checks
python word-compare.py hamlet.txt > hamlet.checks
nothing left to do.
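
To make the idea of local multiprocess execution concrete, here is a small Python sketch that runs a list of independent shell commands concurrently and echoes each one as it starts. It illustrates the concept only and is not how Makeflow is implemented. The function name run_all and the max_parallel parameter are invented for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_parallel=4):
    """Run independent shell commands concurrently; return exit codes in input order."""

    def run(command):
        # Echo the command, then execute it through the shell and report its status.
        print(command)
        return subprocess.run(command, shell=True).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves the order of `commands` in its results.
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Stand-in commands; a real run would use the five word-compare.py commands.
    demo = [f'"{sys.executable}" -c "print({n} * {n})"' for n in range(3)]
    print(run_all(demo))
```

This only works because the five rules do not depend on one another, so they can start in any order.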

Running with CRC's SGE

NOTE: To do this section, you MUST be logged into a CRC machine (newcell.crc.nd.edu, opteron.crc.nd.edu, etc.).

The following command tells Makeflow to dispatch jobs using the SGE batch submission system (qsub, qdel, qstat, etc.).

> makeflow -T sge

You will get as output:

> makeflow -T sge
nothing left to do.

Well... that's not right. Nothing was run! Makeflow found the results of the earlier local run and concluded that every rule was already complete. We need to clean out the generated output files and logs so Makeflow starts from a clean slate again:

> makeflow -c

We see it deleted the files we generated in the last run:

> makeflow -c
deleted file king-lear.checks
deleted file julius-caesar.checks
deleted file othello.checks
deleted file macbeth.checks
deleted file hamlet.checks
deleted file ./Makeflow.makeflowlog

Now let's try again:

> makeflow -T sge

We get the output we expect:

> makeflow -T sge
python word-compare.py king-lear.txt > king-lear.checks
python word-compare.py julius-caesar.txt > julius-caesar.checks
python word-compare.py othello.txt > othello.checks
python word-compare.py macbeth.txt > macbeth.checks
python word-compare.py hamlet.txt > hamlet.checks
nothing left to do.

Notice that the output is no different from using local execution. Makeflow is built to be execution engine agnostic: the same Makeflow script runs unchanged whether the tasks execute locally or remotely.
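
The "nothing left to do" result can be pictured with a simplified, Make-style check: a rule only needs to run if its target file is missing. The Python sketch below models just that idea. It is an assumption-laden illustration, not Makeflow's actual logic, which also keeps a log (the Makeflow.makeflowlog file deleted above). The function name rules_to_run and its exists parameter are hypothetical.

```python
import os


def rules_to_run(rules, exists=os.path.exists):
    """Return the targets whose output file is missing (simplified Make-style check)."""
    return [target for target, _sources in rules if not exists(target)]


if __name__ == "__main__":
    rules = [
        ("hamlet.checks", ["word-compare.py", "hamlet.txt"]),
        ("macbeth.checks", ["word-compare.py", "macbeth.txt"]),
    ]

    # After a complete run every target exists, so nothing is left to do.
    finished = {"hamlet.checks", "macbeth.checks"}
    print(rules_to_run(rules, exists=finished.__contains__))

    # After cleaning, no target exists, so every rule must run again.
    print(rules_to_run(rules, exists=lambda path: False))
```

Passing exists as a parameter lets the demo run without touching the file system; by default the function checks for real files.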

In this case, we can confirm that the job was run on another host by looking at the output produced by word-compare.py:

> head king-lear.checks
Running on host dqcneh081.crc.nd.edu
Starting 2012 23 Oct 14:40:09
a
abated
abatement
abhorred
abjure
able
abode

Here we see that the task ran on node dqcneh081.crc.nd.edu.
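
If you want to check the execution host for all five outputs at once, a short script can pull it out of each file. The sketch below assumes the layout shown above: a "Running on host" line, a "Starting" timestamp line, then one word per line. The function name parse_checks_header is invented for this example.

```python
def parse_checks_header(lines):
    """Split .checks output into (host, words), assuming the header layout shown above."""
    host = None
    words = []
    prefix = "Running on host "
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            host = line[len(prefix):]
        elif line.startswith("Starting ") or not line:
            continue  # Skip the timestamp line and any blank lines.
        else:
            words.append(line)
    return host, words


if __name__ == "__main__":
    sample = [
        "Running on host dqcneh081.crc.nd.edu",
        "Starting 2012 23 Oct 14:40:09",
        "a",
        "abated",
    ]
    host, words = parse_checks_header(sample)
    print(host, len(words))
```

To use it on a real file, pass an open file object, for example parse_checks_header(open("king-lear.checks")).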