Project Ideas

The following are rough project ideas for your consideration. Each project is accompanied by a suggested topic area for your annotated bibliography. A significant portion of your job will be to crystallize the purpose, methods, and scope of your specific project. Students are welcome to undertake projects not listed here, but should consult with the instructor before submitting a proposal.
  • Freeze Frame File System. As we have studied in class, distributed file systems must strike a balance between consistency and availability. This is because a distributed file system always assumes that the user wants the most recent view of the data. But what if a user were willing to "freeze" a view of the file system, e.g., "Show me the state of the filesystem at 1pm yesterday"? How would that change the tradeoff between consistency, availability, and performance? Build a distributed filesystem with freeze-frame semantics. Be careful to define the exact semantics and structure of the system before beginning. Compare it to a filesystem that must maintain "most recent" semantics. Hard question: How do you handle newly written files?
    Bibliography: Consistency Management in File Systems
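
As a starting point for defining the semantics, here is a minimal single-process sketch of the difference between "most recent" and freeze-frame reads. The class name `VersionedStore` and its methods are hypothetical, and the sketch assumes that writes to a given path arrive with non-decreasing timestamps; a distributed design would have to decide how timestamps are assigned and agreed upon.

```python
import bisect


class VersionedStore:
    """Keeps every version of every file, ordered by write timestamp."""

    def __init__(self):
        self._times = {}  # path -> sorted list of write timestamps
        self._data = {}   # path -> list of contents, parallel to _times[path]

    def write(self, path, timestamp, content):
        """Record a new version; timestamps per path must be non-decreasing."""
        times = self._times.setdefault(path, [])
        if times and timestamp < times[-1]:
            raise ValueError("out-of-order write")
        times.append(timestamp)
        self._data.setdefault(path, []).append(content)

    def read_latest(self, path):
        """'Most recent' semantics: return the newest version."""
        return self._data[path][-1]

    def read_as_of(self, path, frozen_at):
        """Freeze-frame semantics: newest version written at or before frozen_at."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, frozen_at)
        if i == 0:
            # The file did not exist yet in this frame.
            raise FileNotFoundError(path)
        return self._data[path][i - 1]
```

Because a frozen view never changes, the result of `read_as_of` can be cached or replicated without invalidation, which is one place the tradeoff may shift. Note that a file written after the frozen time is simply absent from the frame in this sketch; whether that is the right answer is part of the hard question.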

  • ND File System Study. Many filesystems have been designed using workload assumptions drawn from relatively few previous workload studies. Perform a workload study of a distributed file system at Notre Dame. You might be able to trace an NFS file system in use by a research cluster, at the HPCC, or perhaps even a portion of the AFS traffic. Carry out measurements across several file systems over a period of weeks or months. How does the traffic at ND compare to previous studies? Should ND systems be changed in any way on account of the traffic patterns? Careful: This project will require less programming than other projects, but it will require a lot more careful thought, data exploration, and polished writing.
    Bibliography: File System Workload Studies

  • Distributed Access Control Lists. Filesystems such as AFS allow users to create groups and access control lists, detailing who is allowed to access which files. But suppose that the authority for various group lists is distributed. For example, you may wish to allow a particular file to be read by all graduate students and all members of the FBI authorized by Agent Riley. The list of graduate students is maintained by a server in the departmental office, and the list of FBI agents is maintained by a server in Riley's office. Both lists may change at any time, with various consequences regarding security. Build a distributed access control system, assuming no shared file system between the participants. (It need not actually be connected to a real file system.) Be sure to carefully consider the design possibilities and discuss the tradeoffs between consistency, availability, and performance. What happens when a given user's access must be revoked?
    Bibliography: Filesystems and Access Control
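
To make the revocation question concrete, here is a small sketch of one possible design point: membership answers from each authority are cached for a fixed time. All names (`GroupAuthority`, `CachingAclChecker`, `ttl`) are hypothetical, and the "servers" are ordinary in-process objects standing in for remote ones.

```python
import time


class GroupAuthority:
    """Stands in for a remote server that owns one group list."""

    def __init__(self, members):
        self.members = set(members)

    def is_member(self, user):
        return user in self.members


class CachingAclChecker:
    """Grants access if the user is in any group named by the ACL.

    Membership answers are cached for ttl seconds, so a revocation may take
    up to ttl seconds to become visible: speed and availability are bought
    with weaker consistency.
    """

    def __init__(self, authorities, ttl, clock=time.monotonic):
        self.authorities = authorities  # group name -> GroupAuthority
        self.ttl = ttl
        self.clock = clock              # injectable so tests can control time
        self._cache = {}                # (group, user) -> (answer, expiry time)

    def _member(self, group, user):
        now = self.clock()
        hit = self._cache.get((group, user))
        if hit is not None and now < hit[1]:
            return hit[0]
        answer = self.authorities[group].is_member(user)
        self._cache[(group, user)] = (answer, now + self.ttl)
        return answer

    def may_read(self, acl_groups, user):
        return any(self._member(group, user) for group in acl_groups)
```

With `ttl` set to zero every check contacts the authority, so revocation is immediate but each access depends on the authority being reachable. Larger values trade that away for fewer round trips. Other designs, such as pushing revocations to the checkers, sit elsewhere in the same space.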

  • Distributed (or Peer-to-Peer?) Backup. Everyone knows that important data ought to be backed up. But many organizations do not perform backups because nobody is willing to be responsible for establishing a backup server, shuffling tapes, and so forth. Build a distributed (peer-to-peer?) backup system for ordinary workstations, assuming that no one machine is willing to accept all backups. Each night, each workstation should search for available storage on other workstations, negotiate for permission, and transmit a backup image. That's the easy part. Hard part one: ensure that restoring from a backup image can be done reliably. Hard part two: ensure that everyone's disks are not filled with old backups after a few days.
    Bibliography: Backup Systems
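
The two hard parts can each be illustrated with a few lines. The sketch below is only one possible approach, and every name in it is hypothetical: the owner records a SHA-256 digest before sending an image so that a restored image can be checked, and each host keeps only the newest few backups per owner.

```python
import hashlib


def digest(image: bytes) -> str:
    """SHA-256 of a backup image, recorded by the owner before sending."""
    return hashlib.sha256(image).hexdigest()


def verify_restore(image: bytes, expected_digest: str) -> bool:
    """True only if the image returned by the peer matches what was sent."""
    return digest(image) == expected_digest


def prune(backups, keep_latest):
    """Split backups into (kept, expired), keeping the newest few per owner.

    backups is a list of (owner, night, digest) tuples, where night is an
    integer that grows over time. At most keep_latest entries per owner
    are kept.
    """
    kept, expired = [], []
    seen = {}  # owner -> number of backups encountered so far, newest first
    for entry in sorted(backups, key=lambda b: b[1], reverse=True):
        owner = entry[0]
        seen[owner] = seen.get(owner, 0) + 1
        (kept if seen[owner] <= keep_latest else expired).append(entry)
    return kept, expired
```

A digest detects a corrupted or substituted image but does not help recover from one; storing images on more than one peer, or using an erasure code, would be needed for that.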

  • User-Level Distributed Shared Memory. Create a distributed shared memory system entirely at user level. Page faults can be created and caught at user level by using mprotect to set permission bits, and then catching the resulting SIGSEGVs that occur when such memory is touched. Be careful to define your memory semantics and the consistency protocol necessary to ensure those semantics. Build and test several simple applications that make use of the DSM.
    Bibliography: Consistency in Distributed Shared Memory

  • Linda Optimized. Create a distributed computing system based on Linda. Begin by building a simple centralized server and test it with several Linda applications. Propose a significant structural optimization, build it, and measure it. Be careful to understand which applications will benefit from this optimization and which will become worse. (Note that you do not have to implement a compiler and interpreter for the entire Linda language; you could just build a library with similar functionality.)
    Bibliography: Linda Implementations
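
As an illustration of the "library with similar functionality" option, here is a minimal in-process tuple space. It is a sketch, not a reference implementation: the class name `TupleSpace` is hypothetical, `None` is used as the wildcard in patterns, and the network layer that a centralized server would need is left out.

```python
import threading


class TupleSpace:
    """In-process tuple space with Linda-style out, rd, and in operations."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern, tup):
        # None in a pattern is a wildcard; every other field must be equal.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def out(self, tup):
        """Add a tuple and wake any blocked readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, pattern, timeout=None):
        """Return a matching tuple without removing it, blocking until one exists."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                raise TimeoutError(pattern)
            return self._find(pattern)

    def in_(self, pattern, timeout=None):
        """Remove and return a matching tuple ('in' is a Python keyword)."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                raise TimeoutError(pattern)
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup
```

The linear scan in `_find` is the obvious bottleneck here; indexing or partitioning the tuples is the kind of structural change whose benefits and costs would need to be measured per application.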

  • Fault Tolerant Linda. Create a version of Linda that can tolerate failures. Begin by building a simple centralized server and test it with several Linda applications. Propose a modification that will allow recovery from different kinds of faults. Be careful to state what sorts of faults it will or will not tolerate. Compare the FT and non-FT versions under different loads and failure probabilities. (Note that you do not have to implement a compiler and interpreter for the entire Linda language; you could just build a library with similar functionality.)
    Bibliography: Linda Implementations
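
One possible modification, shown here only as a sketch, is to log every operation before applying it and to rebuild the tuple space from the log after a restart. The function names and the JSON line format are assumptions made for illustration. This approach would tolerate a server crash followed by a restart, provided the log survives; it would not tolerate loss of the log itself or a network partition.

```python
import json


def log_record(op, tup):
    """Serialize one tuple-space operation ('out' or 'in') as a log line."""
    return json.dumps([op, list(tup)])


def replay(log_lines):
    """Rebuild the tuple list by re-applying logged operations in order."""
    tuples = []
    for line in log_lines:
        op, fields = json.loads(line)
        tup = tuple(fields)
        if op == "out":
            tuples.append(tup)
        elif op == "in":
            # A logged 'in' names the exact tuple that was removed.
            tuples.remove(tup)
        else:
            raise ValueError(f"unknown op {op!r}")
    return tuples
```

Writing each record to stable storage before acknowledging the client adds latency to every operation, which is the kind of cost the FT versus non-FT comparison should quantify.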

  • Adaptive Load Control. Distributed languages allow a user to trivially harness many independent machines; they also allow a user to accidentally create more load than a system can handle. Consider a language such as the fault tolerant shell. With a simple script, one may retrieve a file from one hundred machines in parallel (with a timeout and retry for good measure):

         forall h in 1 .to. 100
              try for 10 minutes
                   scp node$h:bigfile bigfile.$h
              end
         end
    

    However, one hundred simultaneous copies may be more than the system can handle. Perhaps the network switch cannot keep up with all one hundred machines blasting at once. Perhaps the collecting machine has a limit on the number of open sockets or file handles. Such limits are likely to differ from place to place. Modify the fault tolerant shell to adapt at run-time to the parallelism available in the current system. Caution: Make sure that your solution can adapt to a wide variety of jobs and conditions.
    Bibliography: Load Control in Distributed Systems
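
One familiar way to adapt is additive increase, multiplicative decrease (AIMD), the same idea used in TCP congestion control. The sketch below is written in Python rather than in the fault tolerant shell, and every name in it is hypothetical. It runs each round sequentially to stay deterministic; a real implementation would launch each round's jobs in parallel.

```python
class AimdLimit:
    """Additive-increase, multiplicative-decrease limit on parallel jobs."""

    def __init__(self, start=1, minimum=1, maximum=100):
        self.limit = start
        self.minimum = minimum
        self.maximum = maximum

    def on_success(self):
        """Every job in the round succeeded: probe for one more slot."""
        self.limit = min(self.maximum, self.limit + 1)

    def on_failure(self):
        """Some job failed or timed out: assume overload and halve the limit."""
        self.limit = max(self.minimum, self.limit // 2)


def run_adaptively(jobs, attempt, limiter, max_rounds=1000):
    """Run jobs in rounds of at most limiter.limit, retrying failures.

    attempt(job, parallelism) returns True on success; parallelism is the
    number of jobs in the current round. Returns the list of completed jobs.
    """
    pending = list(jobs)
    done = []
    for _ in range(max_rounds):
        if not pending:
            break
        batch, pending = pending[:limiter.limit], pending[limiter.limit:]
        failed = [job for job in batch if not attempt(job, len(batch))]
        done.extend(job for job in batch if job not in failed)
        if failed:
            limiter.on_failure()
        else:
            limiter.on_success()
        pending = failed + pending  # failed jobs are retried first
    return done
```

This sketch treats any failure as a sign of overload, which is not always true: a single dead host would also shrink the limit. Distinguishing the two cases is part of making the solution robust across jobs and conditions.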

  • Distributed System Debugger/Tracer. Build a tool that allows you to trace and report the activity within a distributed system. Begin by tracing the activity of each component with an existing tool such as tcpdump or strace. Then, build a system to bring all of the results back to the person debugging and represent them in a coherent way. Careful: How will you ensure that events are collated in the right order?
    Bibliography: Time and Order in Distributed Debugging
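
Wall-clock timestamps from different machines cannot be compared reliably, so one common answer to the ordering question is a logical clock. The following is a minimal sketch of Lamport clocks plus a merge step. The names are hypothetical, and it assumes each traced component can attach its clock value to the messages it sends, which tools like tcpdump and strace do not do on their own.

```python
class LamportClock:
    """Logical clock: counts events and jumps ahead on message receipt."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event or a message send; return the new time."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Advance past both our own clock and the sender's timestamp."""
        self.time = max(self.time, sender_time) + 1
        return self.time


def collate(traces):
    """Merge per-node traces into one list ordered by (timestamp, node).

    traces maps node name -> list of (timestamp, description). Ties are
    broken by node name so the order is total and repeatable.
    """
    merged = [
        (ts, node, what)
        for node, events in traces.items()
        for ts, what in events
    ]
    return sorted(merged, key=lambda event: (event[0], event[1]))
```

The resulting order always places a send before its matching receive, but it also imposes an arbitrary order on events that were truly concurrent. Vector clocks would let the tool report concurrency explicitly, at the cost of larger timestamps.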

    Caution: Security-related projects must be done only after obtaining the explicit permission of the person responsible for the machines and/or network that you wish to study. I will require a signed statement to this effect before allowing you to proceed with a security-related project.

  • Security Auditing Tools. Perform an audit of a real computer system or network, perhaps a research cluster in the CSE department or a public computer cluster in the College of Engineering. Create tools that allow you to scan multiple machines and/or monitor a network over a period of time. Consider employing lists of known vulnerabilities such as those published by CERT. Take note that your goal is to create and report on tools and techniques for performing security audits. Simply finding a security violation by hand doesn't count for much.
    Bibliography: Tools for Monitoring and Auditing Security