Distributed Computing
For my Ph.D., I am working with Dr. Douglas Thain as a member of the
Cooperative Computing Lab. Currently, my main research focus is
investigating methods of combining various distributed system tools and
services into a single framework or programming model. The goal of such a
software framework is to simplify the use of distributed abstractions and
services so that end users may construct scalable distributed workflows.
Storage
As part of my research, I develop and maintain a few filesystem adapters for
the Parrot middleware application along with various other storage service
software.
- bxgrid-query: A script that provides command-line access to the BXGrid
web services.
- parrot-bxgrid: This module provides transparent read-only access to any
ROARS filesystem. Currently it is used extensively by our internal BXGrid
biometrics website to provide fault-tolerant and robust access to biometric
data.
- parrot-hdfs: This module provides read-write access to the Hadoop
Distributed Filesystem and has been tested using a variety of execution
engines such as Condor and WorkQueue.
Computation
In addition to my work on storage services, I also experiment with various
computational abstractions and frameworks.
- Weaver: A high-level distributed computing workflow framework in Python
designed to simplify and facilitate the use of various computational and
storage abstractions provided by the CCL.
- python-workqueue: Python bindings for WorkQueue that enable the development
of master/worker applications.
- Starch: A tool for creating standalone executable application archives.
Operations
I also contribute to the maintenance and development of various CCL and
BXGrid operations.
Publications
Journals
- Irena Lanc, Peter Bui, Douglas Thain, and Scott Emrich. Adapting
Bioinformatics Applications for Heterogeneous Systems: A Case Study.
Submitted to Concurrency and Computation: Practice and Experience.
October, 2011.
- Peter Bui, Li Yu, Andrew Thrasher, Rory Carmichael, Irena Lanc, Patrick
Donnelly, and Douglas Thain. Scripting distributed scientific workflows
using Weaver. Concurrency and Computation: Practice and
Experience. November, 2011.
Workshops
- Peter Bui, Dinesh Rajan, Badi Abdul-Wahid, Jesus Izaguirre, and Douglas
Thain. Work Queue + Python: A Framework For Scalable Scientific Ensemble
Applications. Workshop on Python for High Performance and
Scientific Computing at SC 2011. November, 2011.
- Irena Lanc, Peter Bui, Douglas Thain, and Scott Emrich. Adapting
Bioinformatics Applications for Heterogeneous Systems: A Case Study. The
Second International Workshop on Emerging Computational Methods for the Life
Sciences, pages 7-14. June, 2011.
- Andrew Thrasher, Rory Carmichael, Peter Bui, Li Yu, Douglas Thain, and
Scott Emrich. Taming Complex Bioinformatics Workflows with Weaver,
Makeflow, and Starch. Workshop on Workflows in Support of Large Scale
Science, pages 1-6. November, 2010.
- Hoang Bui, Peter Bui, Patrick Flynn, and Douglas Thain. ROARS: A Scalable
Repository for Data Intensive Scientific Computing. The Third International
Workshop on Data Intensive Distributed Computing at ACM HPDC 2010. June,
2010.
- Peter Bui, Li Yu, and Douglas Thain. Weaver: Integrating Distributed
Computing Abstractions into Scientific Workflows using Python. Challenges
of Large Applications in Distributed Environments at ACM HPDC 2010. June,
2010.
Book Chapters
- Douglas Thain, Michael Albrecht, Hoang Bui, Peter Bui, Rory Carmichael,
Scott Emrich, and Patrick Flynn. Data Intensive Computing with Clustered
Chirp Servers. Data Intensive Distributed Computing: Challenges and
Solutions for Large-scale Information Management, IGI Global, Chapter 7.