CMSC 621 - Fall 2002

Project

Overview


In this project, you will design and implement a simple distributed file system with file discovery and sharing. The system should allow on-demand sharing of files between users. It will consist of a number of nodes (hosts) that are unreliable, inasmuch as there is no guarantee that a particular node will be operational or accessible at any given time. Normal file system operations, such as creating, deleting, opening, closing, reading, writing, and seeking, should be supported. Files should have access rights associated with them, similar to Unix files, and you may assume that user IDs are unique across the system.
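As an illustration of Unix-like access rights, the following minimal sketch checks a request against a file's mode bits. The function name, its parameters, and the use of groups are assumptions made for this example; the project description only requires that rights be "similar to Unix" and does not say whether groups exist.

```python
# Permission bits, as in Unix: read = 4, write = 2, execute = 1.
READ, WRITE, EXECUTE = 4, 2, 1


def can_access(mode, file_owner, file_group, user, user_groups, want):
    """Return True if `user` may perform `want` on a file with Unix-style `mode`.

    mode        -- integer such as 0o640 (owner/group/other triplets)
    user_groups -- set of group names the user belongs to (assumed to exist)
    want        -- bitwise OR of READ, WRITE, EXECUTE
    """
    if user == file_owner:
        bits = (mode >> 6) & 7      # owner triplet
    elif file_group in user_groups:
        bits = (mode >> 3) & 7      # group triplet
    else:
        bits = mode & 7             # "other" triplet
    return bits & want == want


# Example: a file with mode 0o640 owned by "alice", group "staff".
print(can_access(0o640, "alice", "staff", "alice", set(), WRITE))
print(can_access(0o640, "alice", "staff", "bob", {"staff"}, WRITE))
```

As in Unix, only the first matching triplet is consulted, so an owner with fewer rights than "other" is still limited to the owner bits.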


Documents  (10 points)


You are allowed to discuss the project across groups. Clearly, you are not allowed to share solutions. You may read papers and textbooks in this area as well -- some pointers are provided in this document. However, you should cite the sources you have consulted. It is intended that you will do this project on the CS/UCS Unix systems. However, you are free to develop the code on your home/work machine, should you so desire, as long as the project runs on CS/UCS machines.
 

Design and Implementation: (15 + 35 points)

 

Details:

Each user of the system owns a set of nodes. Think of them as different machines (PDA, laptop, cell phone, etc.) that have a fixed storage capacity. Each user also owns a set of files. The aim is for users to be able to access their files from any of their devices at all times, no matter which nodes in the system are functional at that time. The location of a file should be transparent to the user. You will need some form of replication for this; however, you must ensure consistency in as strong a manner as possible. You can assume that you know the IDs (e.g. IP addresses) of all nodes, that at least one node owned by each user is up at any given time, and that no messages are lost in transit.
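One possible starting point for replication, sketched below, is to tag each replica with a version number and serve reads from the newest copy on a reachable node. The `Replica` type, the version counter, and `newest_reachable` are hypothetical names chosen for this example, not part of the project specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    node_id: str   # ID of the node holding this copy (e.g. an IP address)
    version: int   # assumed per-file counter, incremented on every write


def newest_reachable(replicas, up_nodes):
    """Return the highest-version replica on a node that is currently up.

    Returns None when no replica is reachable, which is the case that
    would incur the "file not found" cost.
    """
    candidates = [r for r in replicas if r.node_id in up_nodes]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.version)


replicas = [Replica("pda", 3), Replica("laptop", 5), Replica("phone", 4)]
print(newest_reachable(replicas, {"pda", "phone"}))
```

Note that this sketch alone does not give strong consistency: in the example, the laptop holds version 5 but is down, so the read returns version 4. Your design needs to decide how to handle such stale reads.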


Cost Model:

If a file that a particular user owns cannot be made available there is a cost associated with it as specified below. There is a cost associated with network traffic equal to the size of the messages exchanged.  

 

The cost of not finding one's own file is 3 * the size of the file.

The cost of not finding another user's file is 2 * the size of the file.
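The cost model above can be sketched as a small calculation. The function names and the tuple layout are assumptions made for this example, and sizes are assumed to be in a single consistent unit (for instance, bytes), which the description does not fix.

```python
OWN_FILE_MISS_FACTOR = 3     # not finding one's own file
OTHER_FILE_MISS_FACTOR = 2   # not finding another user's file


def miss_cost(file_size, requester, owner):
    """Penalty for a request that could not be satisfied."""
    if requester == owner:
        return OWN_FILE_MISS_FACTOR * file_size
    return OTHER_FILE_MISS_FACTOR * file_size


def total_cost(message_sizes, misses):
    """Network traffic cost plus penalties for unsatisfied requests.

    message_sizes -- sizes of all messages exchanged
    misses        -- (file_size, requester, owner) for each failed request
    """
    traffic = sum(message_sizes)
    penalty = sum(miss_cost(size, req, own) for size, req, own in misses)
    return traffic + penalty


# Two messages (100 and 50), one miss on alice's own 1000-unit file,
# and one miss on bob's 200-unit file requested by alice:
# 150 + 3 * 1000 + 2 * 200 = 3550
print(total_cost([100, 50], [(1000, "alice", "alice"), (200, "alice", "bob")]))
```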

 

Testing and Validation: (20 Points)


When the system starts, its initial state is specified as detailed below. In addition, a request set can be specified; if one is, your system should simulate its execution.

  1. A description of which files are on which nodes will be provided for each user, with the size and access rights of each file (i.e., each user gets a file listing the distribution of ONLY their files): inital.dtd, init1.xml
  2. A file with the requests for each node will be provided, giving the type of each request and the time at which to make it. Times are relative to the first request for the node. Each entry has the form (file name, file size, type of access), e.g. read or write (n bytes): req.dtd, req.xml
  3. The name and size of the files will be provided. The content is arbitrary; you can generate random content of the appropriate size and store it: FileNames.txt
  4. Readme File.
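For item 3, generating arbitrary content of a given size could look like the sketch below. The function name, the chunk size, and the example file name are assumptions for illustration; the format of FileNames.txt is not covered here.

```python
import os


def write_random_file(path, size, chunk=64 * 1024):
    """Write exactly `size` random bytes to `path`, one chunk at a time."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))   # random bytes from the operating system
            remaining -= n


if __name__ == "__main__":
    # Hypothetical file name, used only for this demonstration.
    write_random_file("example.dat", 150_000)
    print(os.path.getsize("example.dat"))
```

Writing in chunks keeps memory use bounded even for large files.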

The format of all test files mentioned above will be released by October 10. When finished, your system should provide the cost incurred by each node for the given request set, and which requests could not be satisfied.


You should carry out an experimental validation to convince us that your system works. Graphs are very important in this respect, as they give us a clear understanding of how your system behaves under varying types of requests, sizes, loads, etc. Comment on the results of your experiments and tests. Try to identify both the strengths and weaknesses of your system. If you use your experimental results to refine your design, make sure you bring this out in your report. It's a good idea to log the working of your system. This helps convince us if, for some reason, your system doesn't perform as expected during the demo.

 


 

Assumptions: (20 points)

The description of the system above, like in real life, is underspecified. You will need to make additional assumptions to come up with a concrete design and implementation. Keep in mind that the aim of the system is to minimize the cost incurred by each node, so valid assumptions that help achieve this goal will be given credit. Look at papers on similar systems for ideas. The more general your assumptions, the more credit you get.

 

Demonstration:

You will submit your code and project report on the due date (5th December 2002). No changes will be permitted after this. After you submit your code, you will be provided the test files mentioned above. You must derive the cost for the input set provided and give a brief step-by-step explanation of how you arrived at that number. Bring this to the demo. We will cross-check the results you provide against the actual results from the running system. We may also test your system with other files at the demo.

Some general suggestions

As should be evident to most of you, it is imperative for a project of this complexity, and one involving teams, that you design your system before you code! In your design, you will need to make assumptions as you flesh out the details of the system. Please make sure that you state them in your design document. Make a timeline for your work, and try to stick to it. Where you divide tasks, make sure you clearly define points of articulation and interfaces between modules. As you form groups, please make sure that you can find a common time to meet. This is especially true for part-time students who hold jobs that restrict their schedules. Please comment your code well: it helps you figure out code your partners have written, and it helps us grade it. Also, use some form of revision control on your source tree. CS/UCS machines have systems such as CVS and RCS available for your use. This will help if lightning strikes, the UPS fails, and machines/disks crash, making your recent changes disappear! Please do create makefiles as well, or better still, help your instructor learn about ANTS.

References

1. CODA paper from the course web page

2. RUMOR paper from the course web page

3. Services such as Napster, Gnutella, Morpheus, etc.

4. PFS http://www.spa.is.uec.ac.jp/~tate/pfs/

5. Soft Updates ideas in the FreeBSD OS

6. FLAPPS (http://flapps.cs.ucla.edu/)



Anupam Joshi

Last modified: Tue Oct 02 15:16:18 EDT 2001