CMSC 621
Project


The Mechanics

You will do this project in groups. Each group shall have no fewer than three, and no more than four, members. Groups with fewer than three members will not be allowed unless extreme circumstances demand it; in any case, such groups will still be judged against the same criteria as regular groups. You are allowed to discuss the project across groups. Clearly, you are not allowed to share solutions. You may read papers and textbooks in this area as well -- some pointers are provided in this document. However, you should cite the sources you have consulted. It is intended that you will do this project in Java on the CS/UCS Unix systems. However, you are free to choose another imperative language should you so desire, as long as the project runs on the CS/UCS machines.
 

The Bare-Bones Task (65 points)

In this project, you will design a simple distributed file system with replication and disconnection-management features. The system will consist of several file servers and the clients that interact with them. Normal file-system operations, such as creating, deleting, opening, closing, reading, writing, and seeking, should be allowed. These requests should be processed by the fileserver that the client is connected to. When a client creates a file, the file resides on the fileserver the client is connected to and is owned by that client; in other words, the primary copy of the file is kept on this server. When a fileserver receives a request for operations on a file it does not own, it should contact the server that owns the file and create a replica locally. All operations should then proceed on the replica.

For the basic task, you will implement a pessimistic replication scheme. In other words, any operation (such as a write) that could throw the replicas out of sync will be permitted on one replica only. However, any number of fileservers can freely create replicas and perform operations that do not alter the file. Your system should allow a fileserver wanting to create a replica to tell the owner fileserver the mode in which it wants the file. If the mode is one that will change the file, then the requester should also specify a lease period. The lessee fileserver promises to return any changes to the file to the lessor (owner) within the lease period, or to seek an extension, which may or may not be granted. If changes are returned in time, then the owner fileserver should "commit" them and update all other replicas appropriately. If the lessee does not return the changes before the lease expires, then the lessor should invalidate any changes the lessee may have made (and send it a message to this effect). Any request for a lease on a file that has already been leased out should block; the owner should grant leases in FIFO order and unblock the corresponding process. A sketch of one possible owner-side interface appears below.
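
To make the protocol concrete, here is a minimal sketch of the remote interface a lessee fileserver might use to talk to an owner. It assumes Java RMI, which is not required; all names and the byte-array file representation are illustrative, not prescribed.

    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Hypothetical lessee-to-owner interface; one way to realize the
    // lease protocol described above, not the required design.
    public interface OwnerServer extends Remote {

        enum Mode { READ, WRITE }

        // Request a replica of 'path' in the given mode. For WRITE,
        // 'leaseMillis' is the lease period the lessee promises to
        // honor; for READ it is ignored. If the file is already leased
        // out for writing, the call blocks until the owner grants the
        // lease in FIFO order.
        byte[] acquireReplica(String path, Mode mode, long leaseMillis)
                throws RemoteException;

        // Ask for more time on an existing lease; the owner may refuse.
        boolean extendLease(String path, long extraMillis)
                throws RemoteException;

        // Return the changed file before the lease expires; the owner
        // commits the change and propagates it to all other replicas.
        // Changes arriving after expiry have already been invalidated.
        void returnChanges(String path, byte[] contents)
                throws RemoteException;
    }

On the owner side, a per-file FIFO queue of waiting lease requests, plus a timer per outstanding lease, is enough to implement the blocking and invalidation behavior.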

Optimistic Replication (25 points)

The idea here is to extend the bare-bones system to permit multiple servers to operate simultaneously on replicas in ways that change them (e.g., concurrent writes on replicas). There are several published approaches in the literature that achieve this, and you can choose to implement (some subset of) any one of them. Good starting points include the CODA and SPRITE papers on the course web site, besides the URLs provided below. When the changes are returned, they should be merged as needed and the file updated. If the changes are inconsistent, your system should simply flag an error and choose one of the changes to commit. Your design documents should specify exactly what you would implement.
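
As one concrete (and entirely optional) mechanism, several of the systems cited below detect conflicting updates with version vectors. A minimal sketch, with all names invented for illustration:

    import java.util.HashMap;
    import java.util.Map;

    // Minimal version-vector sketch for detecting conflicting replica
    // updates; one possible mechanism, not the required one.
    public class VersionVector {
        private final Map<String, Long> clock = new HashMap<>();

        // Bump this server's entry after a local modification.
        public void increment(String serverId) {
            clock.merge(serverId, 1L, Long::sum);
        }

        // true if every entry in this vector is >= the other's entry.
        public boolean dominates(VersionVector other) {
            for (Map.Entry<String, Long> e : other.clock.entrySet()) {
                if (clock.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                    return false;
                }
            }
            return true;
        }

        // Updates conflict when neither vector dominates the other;
        // this is the case where the system must flag an error and
        // pick one of the changes to commit.
        public boolean conflictsWith(VersionVector other) {
            return !this.dominates(other) && !other.dominates(this);
        }
    }

Replicas whose vectors are ordered can be merged trivially (the dominating copy wins); incomparable vectors are exactly the inconsistent case described above.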

Fileserver Discovery (10 points)

One problem with the scheme we have outlined is that a client needs to know up front about the appropriate fileserver. This creates a problem for mobile systems, where the client moves and can reconnect on any arbitrary network. (Imagine unplugging a machine from your office, plugging it back in here on campus, and still having it work.) What is needed is a mechanism that allows a client to find its nearest fileserver automatically. Your system should permit this. (Hint: think Jini.)
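
Jini's lookup service is one answer. Purely for flavor, here is a hypothetical multicast probe a client could use instead; the group address, port, and message format are made up for illustration and are not part of any standard.

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    // Hypothetical "who is my nearest fileserver?" multicast probe,
    // sketched as a lightweight alternative to a full Jini lookup.
    public class ServerLocator {
        private static final String GROUP = "230.0.0.1"; // assumed group
        private static final int PORT = 4446;            // assumed port

        // Broadcast a probe and wait for the first fileserver to answer;
        // the first responder is taken to be the "nearest" one.
        public static String discover(int timeoutMillis) throws Exception {
            try (DatagramSocket socket = new DatagramSocket()) {
                byte[] probe = "WHO_SERVES_FILES".getBytes();
                socket.send(new DatagramPacket(probe, probe.length,
                        InetAddress.getByName(GROUP), PORT));
                socket.setSoTimeout(timeoutMillis);
                byte[] buf = new byte[256];
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                socket.receive(reply); // SocketTimeoutException if no server
                return reply.getAddress().getHostAddress();
            }
        }
    }

Fileservers would join the group with a MulticastSocket and answer probes; taking the first responder as "nearest" is crude but serviceable.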

Experiments

A part of your project is to design and carry out an experimental validation to convince us that your system works. You should also analyze your system for scalability -- in both the number of clients and the number of servers. You should experiment with different workload mixtures. Comment on the results you get -- try to identify which of your assumptions or which implementation mechanisms of your system cause particular scalability patterns. Try to identify both the strengths and weaknesses of your system. If you use your experimental results to refine your design, make sure you bring this out in your report.
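
For instance, a tiny driver along these lines (the Client interface is a placeholder for whatever client API you build) makes it easy to sweep the read/write mixture and replay the same workload across runs:

    import java.util.Random;

    // Toy workload driver: issues reads and writes in a configurable
    // mixture. 'Client' is a stand-in for your own client API.
    interface Client {
        byte[] read(String path);
        void write(String path, byte[] data);
    }

    public class WorkloadDriver {
        public static void run(Client c, double writeFraction, int ops) {
            Random rng = new Random(42); // fixed seed => repeatable runs
            for (int i = 0; i < ops; i++) {
                String file = "f" + rng.nextInt(100); // 100-file working set
                if (rng.nextDouble() < writeFraction) {
                    c.write(file, "payload".getBytes());
                } else {
                    c.read(file);
                }
            }
        }
    }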

Some general suggestions

As should be evident to most of you, for a project of this complexity involving teams, it is imperative that you design your system before you code! In your design, you will need to make assumptions as you flesh out the details of the system; please make sure that you state them in your design document. Make a timeline for your work, and try to stick to it. Where you divide tasks, make sure you clearly define the points of articulation and interfaces between modules. As you form groups, please make sure that you can find a common time to meet. This is especially true for those of you who are part-time students and hold jobs that restrict your schedule. Please comment your code well -- it will help both you and us: you in figuring out code your partners have written, us in grading it. Also, use some form of revision control on your source tree; the CS/UCS machines have systems such as CVS and RCS available for your use. This will help if lightning strikes, the UPS fails, and machines/disks crash, making your recent changes disappear! Please do create makefiles as well.

References

  1. The Coda System (http://www.cs.cmu.edu/afs/cs/project/coda/Web/coda.html)
  2. The FICUS System (http://ficus-www.cs.ucla.edu/ficus/)
  3. The Sprite System (http://www.cs.berkeley.edu/projects/sprite/sprite.html)
  4. The NOW/xFS System (http://now.cs.berkeley.edu/Xfs/xfs.html)