DISC - Data Intensive Science Cluster

The DISC is a shared computing facility managed by the Cooperative Computing Lab and the Center for Research Computing at the University of Notre Dame. The facility provides unique capabilities for rapidly exploring, processing, and visualizing multi-terabyte datasets, in support of research groups in biology and bioinformatics, biometrics and computer vision, molecular dynamics, systems biology, and computer systems research.

User Interfaces

The following interfaces are currently available for using the DISC; brief usage examples follow the list:
  • The BXGrid web portal provides access to biometrics research data stored on the DISC.
  • The Condor batch system provides access to the computing cycles available on the cluster.
  • The Hadoop data processing system provides the ability to run Map-Reduce jobs on the cluster.
  • The Chirp distributed filesystem presents the cluster as one big 180TB storage device visible at disc01.crc.nd.edu:9090.
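
For example, a job can be submitted to the cluster through an ordinary Condor submit file. The sketch below is a minimal vanilla-universe job; the executable and file names are placeholders, not anything installed on the DISC.

    # example.submit -- hypothetical job description; substitute your own program and files
    universe   = vanilla
    executable = analyze.sh
    arguments  = input.dat
    output     = analyze.out
    error      = analyze.err
    log        = analyze.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue

    # submit it to the pool and check its progress
    condor_submit example.submit
    condor_q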
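
A Map-Reduce computation is typically launched with the hadoop command. The sketch below uses Hadoop streaming with placeholder script and HDFS path names, and assumes the streaming jar sits in the usual contrib directory of the cluster's Hadoop installation.

    # stage input into HDFS (paths and user name are placeholders)
    hadoop fs -put ./mydata /user/jdoe/mydata

    # run a streaming Map-Reduce job with a Python mapper and reducer
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input   /user/jdoe/mydata \
        -output  /user/jdoe/results \
        -mapper  mymapper.py \
        -reducer myreducer.py \
        -file    mymapper.py \
        -file    myreducer.py

    # retrieve the results
    hadoop fs -get /user/jdoe/results ./results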
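
Files in Chirp can be reached with the cctools command-line utilities, or browsed as an ordinary filesystem through Parrot. The directory and file names below are placeholders; the server address is the one listed above.

    # list, fetch, and store files on the Chirp server
    chirp disc01.crc.nd.edu:9090 ls /
    chirp_get disc01.crc.nd.edu:9090 /biocompute/somefile.dat ./somefile.dat
    chirp_put ./results.dat disc01.crc.nd.edu:9090 /biocompute/results.dat

    # or access the same data through Parrot, with no special privileges required
    parrot_run ls /chirp/disc01.crc.nd.edu:9090/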

Policy

The DISC cluster was acquired via a Notre Dame Equipment Replacement and Renewal grant in early 2011. The five parties to the grant will have first priority to the resources available on the cluster, in approximately equal proportion:
  • Computer Vision Research Lab (Patrick Flynn (CSE) and Kevin Bowyer (CSE))
  • Bioinformatics and Biology (Scott Emrich (CSE), Jeanne Romero-Severson (BIOS), Frank Collins (BIOS), Nora Besansky (BIOS), Patricia Clark (Chem/Biochem), Michael Pfrender (BIOS))
  • Laboratory for Computational and Life Sciences (Jesus Izaguirre (CSE) and Chris Sweet (CRC))
  • Cyberinfrastructure Lab (Greg Madey (CSE))
  • The Cooperative Computing Lab (Douglas Thain (CSE))
Other parties on campus are welcome to make use of the DISC by submitting Condor jobs or by accessing data in Hadoop. However, such use has lower priority and may be interrupted if needed to serve the primary parties.

Note that the cluster is intended primarily for the analysis and processing of large data sets. While data in active use may stay resident on the cluster for some time, the cluster is not a backup system, nor is it guaranteed to be highly reliable: valuable data should be backed up, and cold data should be stored elsewhere.

Hardware

The DISC contains 26 nodes, each consisting of:
  • 32GB RAM
  • 12 x 2TB SATA disks
  • 2 x Intel Xeon E5620 CPUs @ 2.40GHz (4 cores / 8 threads each)
  • Gigabit Ethernet
The disks on each node are operated individually, and are currently configured as follows:

Disk      Purpose               Mount Point
Disk 1    Operating System      /
Disk 2    Condor                /var/condor
Disk 3    Chirp - General       /data/chirp
Disk 4    Chirp - Biocompute    /data/chirp/biocompute
Disk 5    Chirp - Biometrics    /data/chirp/bxgrid
Disk 6    Hadoop                /data/hadoop/volume1
Disk 7    Hadoop                /data/hadoop/volume2
Disk 8    Hadoop                /data/hadoop/volume3
Disk 9    Hadoop                /data/hadoop/volume4
Disk 10   Unassigned            /data/scratch1
Disk 11   Unassigned            /data/scratch2
Disk 12   Unassigned            /data/scratch3
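
For illustration only: in a Hadoop 0.20/1.x deployment, the four Hadoop disks would normally appear as a comma-separated list of datanode storage directories in hdfs-site.xml. The actual DISC configuration may differ, but a sketch consistent with the table above would look like:

    <!-- hdfs-site.xml (illustrative): spread datanode blocks across the four Hadoop disks -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/hadoop/volume1,/data/hadoop/volume2,/data/hadoop/volume3,/data/hadoop/volume4</value>
    </property>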

Both AFS and CRC /pscratch are mounted on all nodes of the cluster to facilitate data transfer between systems.
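
For example, data can be staged between these filesystems with ordinary commands from any node; the account and directory names below are placeholders, and the AFS path assumes the nd.edu cell layout.

    # copy a dataset from CRC /pscratch into HDFS for processing
    hadoop fs -put /pscratch/jdoe/sequences /user/jdoe/sequences

    # archive finished results from a Chirp data disk back to AFS
    cp /data/chirp/biocompute/results.tar.gz /afs/nd.edu/user/jdoe/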