Assignment 04: Distributed File Systems

University of Puerto Rico at Rio Piedras
Department of Computer Science
CCOM4017: Operating Systems

Introduction

In this project the student will implement the main components of a file system by building a simple, yet functional, distributed file system (DFS). The project will expand the student's knowledge of the components of file systems (inodes and data blocks), develop the student's skills in interprocess communication, and increase their system security awareness.

The components to implement are:

Objectives

Prerequisites

The metadata server's database manipulation functions.

No expertise in database management is required to accomplish this project. However, sqlite3 is used to store the file inodes in the metadata server. You do not need to understand the internals of these functions, but you do need to read the documentation of the functions that interact with the database. The metadata server database functions are defined in the file mds_db.py.
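
As a rough illustration only, a function that reads an inode row from a sqlite3 database might look like the sketch below; the table name, columns, and function name here are assumptions, and the actual schema and functions are the ones defined in mds_db.py.

```python
import sqlite3

# Hypothetical sketch of an mds_db.py-style lookup; the real schema and
# function names are defined in mds_db.py, not here.
def get_file_inode(dbname, path):
    """Return (file_id, size) for a stored path, or None if absent."""
    conn = sqlite3.connect(dbname)
    try:
        # Parameter substitution (?) avoids SQL injection from file paths.
        cur = conn.execute(
            "SELECT id, size FROM inode WHERE path = ?", (path,))
        return cur.fetchone()
    finally:
        conn.close()
```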

Inode

For this implementation an inode consists of:

Block List

The block list consists of a list of:

Functions:

The packet manipulation functions:

The packet library is designed to serialize the communication data using the json library. No expertise with json is required to accomplish this assignment. These functions were developed to ease the packet generation process of the project. The packet library is defined in file Packet.py.
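
The core idea can be sketched as a class that wraps a dictionary and uses json to encode and decode it; the class and method names below are illustrative stand-ins, and the documented API is the one in Packet.py.

```python
import json

# Illustrative sketch of a json-serialized packet, in the spirit of
# Packet.py; the real library defines more packet types and helpers.
class Packet:
    def __init__(self):
        self.packet = {}

    def BuildListPacket(self):
        # Mark this packet as a file-listing request.
        self.packet = {'command': 'list'}

    def getEncodedPacket(self):
        # Serialize the packet dictionary into a json string for the socket.
        return json.dumps(self.packet)

    def DecodePacket(self, message):
        # Rebuild the packet dictionary from a received json string.
        self.packet = json.loads(message)
```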

In this project all packet objects have a packet type among the following options:

Functions:

General Functions
Packet Registration Functions
Packet List Functions
Get Packet Functions
Put Packet Functions (Put Blocks)
Get Data block Functions (Get Blocks)

Instructions

Write and complete code for an unreliable and insecure distributed file server following the specifications below.

Design specifications.

For this project you will design and complete a distributed file system. You will write a DFS with tools to list files and to copy files to and from the DFS.

Your DFS will consist of:

The metadata server

The metadata server contains the metadata (inode) information of the files in your file system. It also keeps a registry of the data nodes that are connected to the DFS.

Your metadata server must provide the following services:

  1. Listen to the data nodes that are part of the DFS. Every time a new data node registers with the DFS, the metadata server must keep that data node's contact information, that is, its (IP address, listening port) pair.

    • To ease the implementation of the DFS, the directory of the file system must contain three things:
      • the path of the file in the file system
      • the nodes that contain the data blocks of the files
      • the file size
  2. Every time a client (the list or copy commands) contacts the metadata server for:

    • requesting to read a file: the metadata server must check whether the file is in the DFS database and, if it is, return the nodes and the block ids that contain the file.
    • requesting to write a file: the metadata server must:
      • insert into the database the path and name of the new file, and its size.
      • return a list of available data nodes where to write the chunks of the file
      • then store the block list, that is, which data node holds each block id of the file.
    • requesting to list files:
      • the metadata server must return a list with the files in the DFS and their size.
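
The dispatch logic described above could be sketched as follows; the command names, reply statuses, and the in-memory `db` dictionary are placeholders, since the real server works through Packet.py and the database functions in mds_db.py.

```python
import json

# Hedged sketch of the metadata server's request dispatch; everything
# here (commands, statuses, the 'db' dict) is an illustrative stand-in.
def handle_request(msg, db):
    req = json.loads(msg)
    cmd = req.get('command')
    if cmd == 'reg':
        # A data node registering its (IP address, listening port).
        db['nodes'].append((req['address'], req['port']))
        return json.dumps({'status': 'OK'})
    if cmd == 'list':
        # A client asking for the files in the DFS and their sizes.
        return json.dumps({'files': db['files']})
    if cmd == 'get':
        # A client asking which nodes hold the blocks of a file.
        blocks = db['blocks'].get(req['path'])
        if blocks is None:
            return json.dumps({'status': 'NFOUND'})
        return json.dumps({'blocks': blocks})
    return json.dumps({'status': 'ERROR'})
```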

The metadata server must be run:

python meta-data.py <port, default=8000>

If no port is specified the port 8000 will be used by default.
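
One possible way to honor the optional port argument is sketched below; how meta-data.py actually reads its arguments is up to your implementation, and only the default of 8000 comes from the specification.

```python
# Assumed argv layout: ['meta-data.py', '<port>']; 8000 is the stated
# default when no port argument is supplied.
def parse_port(argv):
    return int(argv[1]) if len(argv) > 1 else 8000
```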

The data node server

The data node is the process that receives and saves the data blocks of the files. It must register with the metadata server as soon as it starts execution. The data node receives data from a client when the client wants to write a file, and returns data when the client wants to read a file.

Your data node must provide the following services:

  1. Listen to writes (puts):
    • The data node will receive blocks of data, store each one under a unique id, and return that unique id.
    • Each node must have its own block storage path. You may run more than one data node per system.
  2. Listen to reads (gets)
    • The data node will receive requests for data blocks; it must read each requested block and return its contents.
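
The two services can be sketched as below; the disk layout (one file per block, named by a uuid) is an assumption about one reasonable design, not a required implementation, and the real data-node.py moves these bytes over a socket.

```python
import os
import uuid

# Hedged sketch of the data node's block storage: one file per block,
# named by a freshly generated uuid that serves as the unique block id.
def put_block(data_path, block):
    """Store a block of bytes under a unique id and return that id."""
    block_id = str(uuid.uuid4())
    with open(os.path.join(data_path, block_id), 'wb') as f:
        f.write(block)
    return block_id

def get_block(data_path, block_id):
    """Read a previously stored block back by its id."""
    with open(os.path.join(data_path, block_id), 'rb') as f:
        return f.read()
```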

The data nodes must be run:

python data-node.py <server address> <port> <data path> <metadata port,default=8000>

Server address is the metadata server address, port is the data node's listening port, data path is a directory where the data blocks are stored, and metadata port is optional: supply it if the metadata server was run on a port other than the default.

Note: Since you most probably do not have many different computers at your disposal, you may run more than one data node on the same computer, but each must use a different listening port and a different data block directory.

The list client

The list client simply sends a list request to the metadata server and then waits for a list of file names with their sizes.

The output must look like:

/home/cheo/asig.cpp 30 bytes
/home/hola.txt 200 bytes
/home/saludos.dat 2000 bytes

The list client must be run:

python ls.py <server>:<port, default=8000>

Where server is the metadata server IP address and port is the metadata server port. If no port is indicated, the default port 8000 is used and the ':' character is not necessary.
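
Parsing the <server>:<port> argument with that default might look like this sketch (the function name is illustrative):

```python
# Hedged sketch of parsing the ls.py address argument; the port defaults
# to 8000 when the ':' part is omitted, as the assignment specifies.
def parse_address(arg):
    """Split '<server>:<port>' into (host, port), defaulting to 8000."""
    if ':' in arg:
        host, port = arg.split(':', 1)
        return host, int(port)
    return arg, 8000
```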

The copy client

The copy client is more complicated than the list client. It is in charge of copying files to and from the DFS.

The copy client must:

  1. Write files in the DFS
    • The client must send to the metadata server the file name and size of the file to write.
    • Wait for the metadata server response with the list of available data nodes.
    • Send the data blocks to each data node.
      • You may decide to divide the file over the number of data servers.
      • You may divide the file into X size blocks and send it to the data servers in round robin.
  2. Read files from the DFS
    • Contact the metadata server with the file name to read.
    • Wait for the block list with the block ids and data server information.
    • Retrieve the file blocks from the data servers.
      • This part will depend on the division algorithm used in step 1.
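
The round-robin division suggested in step 1 can be sketched as follows; the node list here is just an illustrative placeholder for the list returned by the metadata server.

```python
# Hedged sketch of the round-robin division: cut the file into fixed-size
# chunks and deal them out to the data nodes in turn.
def split_round_robin(data, nodes, block_size):
    """Return a list of (node, chunk) pairs in write order."""
    assignments = []
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        # Cycle through the data nodes, one chunk per node per round.
        node = nodes[(i // block_size) % len(nodes)]
        assignments.append((node, chunk))
    return assignments
```

Reading the file back (step 2) reverses this: fetch the blocks in the same order and concatenate them.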

The copy client must be run:

Copy from DFS:

python copy.py <server>:<port>:<dfs file path> <destination file>

To DFS:

python copy.py <source file> <server>:<port>:<dfs file path>

Where server is the metadata server IP address, and port is the metadata server port.

Creating an empty database

The script createdb.py generates an empty database dfs.db for the project.

    python createdb.py

Deliverables

Rubric