Freeze Frame File System. As we have studied in class,
distributed file systems must strike a balance between consistency
and availability. This is because a distributed file system always
assumes that the user wants the most recent view of the data.
But what if a user were willing to "freeze" a view of the file system,
e.g. "Show me the state of the filesystem at 1pm yesterday"?
How would that change the tradeoff between consistency, availability,
and performance?
Build a distributed filesystem with freeze-frame semantics.
Be careful to define exactly the semantics and structure of the system
before beginning. Compare to a filesystem that must maintain "most recent"
semantics. Hard question: How do you handle newly-written files?
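One way to start pinning down the semantics is to treat every write as a new version and answer reads "as of" a chosen time. The sketch below is a minimal single-process illustration, not a distributed design: the class name VersionedStore, its methods, and the use of plain numeric timestamps from a single clock are all assumptions made for the example.

```python
import bisect


class VersionedStore:
    """Toy in-memory store that keeps every version of every file.

    Hypothetical illustration: timestamps are plain numbers that are
    assumed to come from one clock shared by all writers.
    """

    def __init__(self):
        self._times = {}     # path -> sorted list of write timestamps
        self._contents = {}  # path -> list of contents, parallel to _times

    def write(self, path, timestamp, data):
        # Insert the new version so that the timestamp list stays sorted.
        times = self._times.setdefault(path, [])
        contents = self._contents.setdefault(path, [])
        i = bisect.bisect_right(times, timestamp)
        times.insert(i, timestamp)
        contents.insert(i, data)

    def read_latest(self, path):
        # "Most recent" semantics: always the newest version, if any.
        contents = self._contents.get(path)
        return contents[-1] if contents else None

    def read_as_of(self, path, freeze_time):
        # Freeze-frame semantics: newest version written at or before
        # freeze_time, or None if the file did not exist yet.
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, freeze_time)
        if i == 0:
            return None
        return self._contents[path][i - 1]
```

Under these assumptions, a frozen read never changes once freeze_time has passed, so it could be served from any replica that holds the relevant versions. A file first written after freeze_time simply does not exist in the frozen view, which is one possible answer to the hard question above, though not the only one.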
Bibliography: Consistency Management in File Systems
ND File System Study. Many filesystems have been designed
using workload assumptions from relatively few previous workload studies.
Perform a workload study of a distributed file system at Notre Dame.
You might be able to trace an NFS file system in use by a research cluster,
at the HPCC, or perhaps even a portion of the AFS traffic.
Carry out measurements across several file systems over a period of weeks
or months. How does the traffic at ND compare to previous studies?
Should ND systems be changed in any way on account of the traffic patterns?
Careful: This project will require less programming than other projects,
but it will require a lot more careful thought, data exploration, and
polished writing.
Bibliography: File System Workload Studies
Distributed Access Control Lists. Filesystems such as AFS
allow users to create groups and access control lists, detailing who
is allowed to access what files. But, suppose that the authority for
various group lists is distributed. For example, you may wish to allow
a particular file to be read by all graduate students and all members
of the FBI authorized by Agent Riley. The list of graduate students
is maintained by a server in the departmental office, and the list of
FBI agents is maintained by a server in Riley's office. Both lists
may change at any time, with various consequences regarding security.
Build a distributed access control system, assuming no shared file system
between the participants.
(It need not actually be connected to a real file system.)
Be sure to carefully consider the design possibilities and discuss
the tradeoffs between consistency, availability, and performance.
What happens when access to a given user must be revoked?
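To make the revocation question concrete, consider the simplest design: each access check consults a locally cached copy of every remote group list, refreshed after a fixed time-to-live. The sketch below assumes a hypothetical fetch function standing in for a call to the group server; the names CachedGroup and may_read are invented for this illustration.

```python
import time


class CachedGroup:
    """Caches the membership of one remotely maintained group.

    `fetch` is a hypothetical callable that contacts the group's
    server and returns the current members.
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._members = frozenset()
        self._expires = None  # None means nothing has been fetched yet

    def contains(self, user):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # Cache is empty or stale: ask the authoritative server.
            self._members = frozenset(self._fetch())
            self._expires = now + self._ttl
        return user in self._members


def may_read(user, groups):
    # Access is granted if the user belongs to any listed group.
    return any(group.contains(user) for group in groups)
```

With this design a revoked user may keep access for up to ttl_seconds, so the time-to-live directly trades consistency against performance. The sketch also leaves open what to do when fetch fails because a server is unreachable: denying access favors security, while serving the stale list favors availability.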
Bibliography: Filesystems and Access Control
Distributed (or Peer-to-Peer?) Backup. Everyone knows that
important data ought to be backed up. But, many organizations do not
perform backups because nobody is willing to be responsible for establishing
a backup server, shuffling tapes, and so forth.
Build a distributed (peer-to-peer?) backup system for ordinary workstations,
assuming that no one machine is willing to accept all backups.
Each night, each workstation should search for available
storage on other workstations, negotiate for permission, and
transmit a backup image. That's the easy part.
Hard part one: ensure that restoring from a backup image can be done reliably.
Hard part two: ensure that everyone's disks are not
filled with old backups after a few days.
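One ingredient of a reliable restore is detecting chunks that a peer has lost or corrupted. The sketch below assumes the backup image is split into fixed-size chunks and that the owner keeps a small manifest of SHA-256 digests; the function names and the chunking scheme are assumptions for illustration, not a prescribed design.

```python
import hashlib


def make_manifest(image: bytes, chunk_size: int):
    """Split an image into chunks and record one digest per chunk."""
    chunks = [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]
    digests = [hashlib.sha256(chunk).hexdigest() for chunk in chunks]
    return chunks, digests


def restore(chunks, digests):
    """Reassemble the image, refusing any chunk that fails its digest."""
    if len(chunks) != len(digests):
        raise ValueError("chunk count does not match manifest")
    for index, (chunk, digest) in enumerate(zip(chunks, digests)):
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {index} is corrupt")
    return b"".join(chunks)
```

Detection alone does not give recovery: a real system would also need redundant copies on several peers so that a bad chunk can be fetched from elsewhere.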
Bibliography: Backup Systems
User-Level Distributed Shared Memory - Create a distributed
shared memory system entirely at user level. Page faults can be created
and caught at user level by using mprotect to set permission
bits, and then catching the resulting SIGSEGVs that occur when
such memory is touched. Be careful to define your memory semantics and
the consistency protocol necessary to ensure those semantics. Build
and test several simple applications that make use of the DSM.
Bibliography: Consistency in Distributed Shared Memory
Linda Optimized. Create a distributed computing system based on Linda.
Begin by building a simple centralized server and test it with several
Linda applications. Propose a significant structural optimization,
build it, and measure it. Be careful to understand what applications will
benefit from this optimization and which will become worse.
(Note that you do not have to implement a compiler and interpreter for the
entire Linda language, but you could just build a library with similar
functionality.)
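As a rough idea of what such a library might look like, the sketch below implements an in-process tuple space with the three basic Linda operations. It is only a starting point under stated assumptions: None is used as a wildcard in patterns (real Linda uses typed formal parameters), the method is named in_ because "in" is a Python keyword, and nothing here is networked yet.

```python
import threading


class TupleSpace:
    """Minimal in-process tuple space (hypothetical library interface)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        # Add a tuple and wake any readers waiting for a match.
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern matches any value in that position.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def rd(self, pattern, timeout=None):
        # Block until a match exists, then return it without removing it.
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout)
            return self._find(pattern) if found else None

    def in_(self, pattern, timeout=None):
        # Block until a match exists, then remove and return it.
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout)
            if not found:
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup
```

Every operation scans one shared list under one lock, which is exactly the kind of structure a centralized server would have and that a structural optimization (for example, partitioning tuples by their first field) might improve.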
Bibliography: Linda Implementations
Fault Tolerant Linda. Create a version of Linda that can tolerate
failures. Begin by building a simple centralized server and
test it with several Linda applications. Propose a modification that will
allow recovery from different kinds of faults. Be careful to state what
sort of faults it will or will not tolerate. Compare the FT and non-FT
versions under different loads and failure probabilities.
(Note that you do not have to implement a compiler and interpreter for the
entire Linda language, but you could just build a library with similar
functionality.)
Bibliography: Linda Implementations
Adaptive Load Control - Distributed languages allow a user
to trivially harness many independent machines; they also allow a user
to accidentally create more load than a system can handle. Consider
a language such as the fault tolerant shell. With a simple script, one may
retrieve a file from one hundred machines in parallel (with a timeout and
retry for good measure):
    forall h in 1 .to. 100
        try for 10 minutes
            scp node$h:bigfile bigfile.$h
        end
    end
However, one hundred simultaneous copies may be more than the system
can handle. Perhaps the network switch cannot keep up with all one hundred
machines blasting at once. Perhaps the collecting machine has a limit
on the number of open sockets or file handles. Such limits are likely to
differ from place to place. Modify the fault tolerant shell
to adapt at run-time to the parallelism available in the current system.
Caution: Make sure that your solution can adapt to a wide variety
of jobs and conditions.
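One well-known starting point for this kind of adaptation is additive increase, multiplicative decrease (AIMD), the rule used by TCP congestion control. The sketch below is written in Python rather than as a change to the shell itself, and the class name AimdLimit and its parameters are assumptions for illustration: it tracks how many jobs may run at once, growing the limit slowly on success and halving it on failure.

```python
class AimdLimit:
    """Tracks a concurrency limit using additive increase,
    multiplicative decrease (hypothetical helper, not part of ftsh)."""

    def __init__(self, initial=1, minimum=1, maximum=100):
        self.minimum = minimum
        self.maximum = maximum
        # Clamp the starting point into the allowed range.
        self.limit = max(minimum, min(initial, maximum))

    def record(self, succeeded):
        if succeeded:
            # Additive increase: allow one more job at a time.
            self.limit = min(self.maximum, self.limit + 1)
        else:
            # Multiplicative decrease: halve the limit, but never
            # drop below the minimum.
            self.limit = max(self.minimum, self.limit // 2)
        return self.limit
```

A scheduler would start at most limit copies at once and call record after each attempt finishes or times out. Whether a timeout really signals overload, rather than one broken node, is part of what the project needs to work out.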
Bibliography: Load Control in Distributed Systems
Distributed System Debugger/Tracer. Build a tool that
allows you to trace and report the activity within
a distributed system. Begin by tracing the activity of each component
with an existing tool such as tcpdump or strace. Then,
build a system to bring all of the results back to the person debugging
and represent them in a coherent way. Careful: How will you ensure that
events are collated in the right order?
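Wall-clock timestamps from different machines cannot be trusted to agree, so one classic answer to the ordering question is Lamport's logical clocks. The sketch below assumes each traced component can stamp its events with such a clock and piggyback the value on every message; the names LamportClock and merge_logs, and the log layout, are invented for this illustration.

```python
import heapq


class LamportClock:
    """Logical clock: a counter advanced on every local event."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value travels with the message.
        return self.tick()

    def receive(self, message_time):
        # Jump past the sender's timestamp so the receive is ordered after
        # the send.
        self.time = max(self.time, message_time) + 1
        return self.time


def merge_logs(logs):
    """Merge per-node logs into one list ordered by logical time.

    `logs` maps a node name to a list of (lamport_time, text) pairs in
    the order the node recorded them. Ties are broken by node name.
    """
    streams = [[(t, node, text) for t, text in events]
               for node, events in logs.items()]
    return list(heapq.merge(*streams))
```

The merged order respects causality (a send always appears before its receive), but events that are not causally related may appear in either order, which a debugging tool should probably make visible to the user.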
Bibliography: Time and Order in Distributed Debugging
Caution: Security-related projects must be done only
after obtaining the explicit permission of the person responsible for the
machines and/or network that you wish to study.
I will require a signed statement to this effect
before allowing you to proceed with a security-related project.
Security Auditing Tools. Perform an audit of a real
computer system or network, perhaps a research cluster in the CSE
department or a public computer cluster in the College of Engineering.
Create tools that allow you to scan multiple machines and/or monitor
a network over a period of time. Consider employing lists of
known vulnerabilities such as those published by CERT. Take note
that your goal is to create and report on tools and techniques
for performing security audits. Simply finding a security violation
by hand doesn't count for much.
Bibliography: Tools for Monitoring and Auditing Security