Condor and OOMMF

This is how I use Condor and OOMMF together to distribute a large number of simulations (hundreds). I am assuming that you are running from your AFS space and have one of the alpha4 versions of OOMMF (e-mail me if you need this). If anyone has a better way of doing any of this, please let me know and I will update this page (and my methods).

1) Go to the Cooperative Computing Lab's Condor page and follow the instructions to get Condor set up in your path. Read through the page to become familiar with submit files, etc.

2) For Condor you will need to use the command-line 3D simulator (boxsi). (If you are already using boxsi, you can probably skip the next two steps.) This means that your MIF files will have to be in MIF2.0 format. To convert from MIF1.0/1.1 to MIF2.0, run:
oommf.tcl mifconvert infile.mif outfile.mif2
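
With hundreds of MIF files, typing the conversion command by hand gets tedious. Here is a minimal Python sketch that builds the same mifconvert command for each file and runs it. The function names are made up for illustration, and it assumes that oommf.tcl can be invoked directly (as in the command above) and that every .mif file in the directory should be converted.

```python
import subprocess
from pathlib import Path


def mifconvert_command(mif_path, oommf_tcl="oommf.tcl"):
    """Build the mifconvert command for one file (run1.mif -> run1.mif2)."""
    mif = Path(mif_path)
    return [oommf_tcl, "mifconvert", str(mif), str(mif.with_suffix(".mif2"))]


def convert_all(mif_dir, oommf_tcl="oommf.tcl"):
    """Convert every .mif file in mif_dir, stopping at the first failure."""
    for mif in sorted(Path(mif_dir).glob("*.mif")):
        # check=True raises CalledProcessError if mifconvert exits non-zero
        subprocess.run(mifconvert_command(mif, oommf_tcl), check=True)
```

Calling convert_all on your simulation directory writes each .mif2 file next to its .mif source.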

3) Since you will be running boxsi from the command line, you will need to specify in the MIF2 file the outputs that you want the simulation to produce. To do this, append the following lines to the end of the .mif2 file:
Destination archive mmArchive
Schedule Oxs_TimeDriver::Magnetization archive Step 100
Schedule Oxs_TimeDriver::Magnetization archive Stage 1
Schedule "Oxs_RungeKuttaEvolve:evolver:Total field" archive Step 100
Schedule "Oxs_RungeKuttaEvolve:evolver:Total field" archive Stage 1

This will set up an mmArchive thread to write the data and tell it to output the magnetization data (.omf files) and field data (.ohf files) every 100 time steps and at the end of every stage. These specifications are similar to those that you set up in the graphical simulator and can be extended to add more outputs as needed (see the manual for this).
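
If you have many .mif2 files, the append can be scripted. The following is a rough Python sketch, not part of OOMMF: the function names are hypothetical, and it assumes that a file already containing the "Destination archive mmArchive" line has been handled and should be left alone.

```python
from pathlib import Path

# The five lines from step 3, exactly as they should appear in the .mif2 file.
OUTPUT_LINES = """\
Destination archive mmArchive
Schedule Oxs_TimeDriver::Magnetization archive Step 100
Schedule Oxs_TimeDriver::Magnetization archive Stage 1
Schedule "Oxs_RungeKuttaEvolve:evolver:Total field" archive Step 100
Schedule "Oxs_RungeKuttaEvolve:evolver:Total field" archive Stage 1
"""


def with_output_schedule(mif2_text):
    """Return the MIF2 text with the output lines appended (at most once)."""
    if "Destination archive mmArchive" in mif2_text:
        return mif2_text  # assumed to be already set up; do not append twice
    if not mif2_text.endswith("\n"):
        mif2_text += "\n"
    return mif2_text + OUTPUT_LINES


def add_schedule_to_files(mif2_dir):
    """Rewrite every .mif2 file in mif2_dir in place."""
    for path in sorted(Path(mif2_dir).glob("*.mif2")):
        path.write_text(with_output_schedule(path.read_text()))
```

Because the append is skipped when the Destination line is already present, running the script twice on the same directory should not duplicate the output lines.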

4) Make sure that you always use absolute paths in your configuration files; otherwise Condor will not work.

5) The path you run OOMMF from must be world readable/writable in AFS. Generally this is a bad thing, so make sure you don't put anything important in the OOMMF directory, and move your simulation results out when you are done. (As long as you don't broadcast that you have this directory writable it shouldn't be a problem, and it hasn't been for me.) To make the directory and all subdirectories world writable, run:
find ./ -type d -exec fs setacl {} system:anyuser rlidwk \; -print
If this succeeds, running "fs listacl" in the OOMMF top-level directory should show "system:anyuser rlidwk" (among other things). Note that this should NOT be "rlidwka" (there should be no 'a' at the end), which would mean that anyone could modify the ACL for that directory (undesirable, since they could take over the directory).

6) Make sure you have sufficient AFS quota to store all the simulation output. Each .omf/.ohf file is large (roughly 500 KB to 1 MB) and there can be many of them, depending on how long the simulation runs. If you find yourself running out of space, you can use Chirp to offload the results. (Dr. Thain suggested that you can actually run from inside Chirp and save yourself the trouble of copying things back and forth. I will meet with him at some point and update this page with information on how to do this.)

7) All images specified in the MIFs should be in .ppm format. You can use BMPs, but they sometimes crash the simulations (something about not being able to call the conversion tool). To convert from .bmp to .ppm, run:
oommf.tcl any2ppm file.bmp

8) That should be it. Here is a sample submit file that shows the parameters I use. I required Condor to use 64-bit Linux machines, but you should change this depending on where you have OOMMF installed. The .output file is always empty for me and the .error file contains OOMMF errors or the summary printed after the simulation completes. The .logfile may be useful to find Condor errors.
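
When there are hundreds of simulations, it can help to generate one submit file per .mif2 file. The Python sketch below is only an illustration and is not the sample file referred to above. It assumes the vanilla universe, that oommf.tcl is directly executable, and that boxsi takes the .mif2 path as its only argument. The function name and the template layout are my own. Paths are made absolute, as step 4 requires.

```python
import os

# Submit description template. The requirements line asks for 64-bit Linux
# machines; change it to match where you have OOMMF installed.
SUBMIT_TEMPLATE = """\
universe     = vanilla
executable   = {executable}
arguments    = {arguments}
output       = {base}.output
error        = {base}.error
log          = {base}.logfile
requirements = (Arch == "X86_64") && (OpSys == "LINUX")
queue
"""


def make_submit_text(mif2_path, oommf_tcl):
    """Return the text of a Condor submit file for one boxsi run."""
    mif2_path = os.path.abspath(mif2_path)   # Condor needs absolute paths
    oommf_tcl = os.path.abspath(oommf_tcl)
    base = os.path.splitext(mif2_path)[0]    # e.g. /path/run1 for run1.mif2
    return SUBMIT_TEMPLATE.format(
        executable=oommf_tcl,
        arguments="boxsi " + mif2_path,
        base=base,
    )
```

Write the returned text to a file (for example run1.submit) and pass that file to condor_submit.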

9) Just an FYI, when I'm running a lot of simulations I see a few of them fail (10% or less). I suggest that you check each .error file to make sure that all your simulations complete and then either re-submit those that fail or run them locally.
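
Checking hundreds of .error files by hand is slow, so here is a small Python sketch that lists the jobs whose .error file lacks a completion marker. The marker string is an assumption: look at the .error file of a run that finished correctly, pick some text from the summary that only appears on success, and pass that in. The function names are hypothetical.

```python
from pathlib import Path


def run_completed(error_text, success_marker):
    """True if the .error text contains the marker seen in successful runs."""
    return success_marker in error_text


def failed_jobs(error_dir, success_marker):
    """Return the base names of jobs whose .error file lacks the marker."""
    failed = []
    for err in sorted(Path(error_dir).glob("*.error")):
        # errors="replace" avoids a crash on stray non-UTF-8 bytes
        text = err.read_text(errors="replace")
        if not run_completed(text, success_marker):
            failed.append(err.stem)
    return failed
```

The returned names can then be used to re-submit the failed jobs or to run them locally.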