Objectives
1. To evaluate file system performance in the face of sequential I/O requests.
2. To evaluate the impact of multiple competing threads attempting to read/write simultaneously.
Guidelines
The goal of this assignment is to gain experience with simple evaluation of file system performance. Specifically, you will be testing the performance of the system under varying conditions. I/O performance can be affected by more than the volume of data being moved. For example, it can be affected by the size of the individual requests, by whether the requests read or write data, and by the degree of contention for access to the disk. It can also be affected dramatically by the pattern of data access (e.g., whether it is sequential or random), but we will only be looking at sequential access in this assignment.
As a first step, start by creating a collection of test files of varying sizes. Create a set of files of length 100000, 1000000, 10000000, 100000000, and 10000000000 bytes. These files can be filled with any random data you desire (or all zeroes if you prefer; the content of the files does not matter, only their size). You can create these files at the command line using the "cat" and "head" commands (where might you find a file of endless zeroes or random numbers to use as a source?).
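If you prefer to script the file creation rather than use the shell, a minimal sketch is below (the helper name and file naming are illustrative, not required by the assignment):

```python
def make_test_file(path, size):
    """Create a file of exactly `size` zero bytes. Only the size matters
    for these experiments, not the content."""
    with open(path, "wb") as f:
        chunk = b"\0" * 65536                  # write in 64 KiB chunks
        while size > 0:
            n = min(size, len(chunk))
            f.write(chunk[:n])
            size -= n

# The assignment's sizes would each be passed here in turn, e.g.:
make_test_file("test_100000", 100000)
```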
Now you should create four test programs, and time the running of each one.
1. Your first program should read a file from beginning to end. It should accept the filename as a parameter. Each read call you make will specify a buffer and a read request size. This program should use a buffer of 10000 bytes, which you will use for each read. How long does this program take to read through each test file? (You may use the "time" command at the shell command line to time the program.)
2. Your second program should accept a numerical parameter on the command line and repeat the behavior of your first program, but now using a buffer of size N (where N is the value given on the command line) for each read request. This lets you re-run the previous timing test with differently sized I/O operations. Time your second program for read sizes of 100, 1000, 10000, and 100000 bytes.
3. Your third program should extend your second program by adding one more step. For each read operation, there should now be a corresponding write operation where the data just read is written out to a newly created file. In other words, you should now be timing a program that copies each file that it is run against. You are now testing the speed of sequential reads+writes for files of varying size, and using I/O operations of varying size.
4. Your fourth and final program will create multiple copies of each file. You will create a number of threads, each of which will read and copy a file (as was done in program 3), but now each thread copies the file to a new file named based on the thread number. Run your timing experiments for all files, using all read/write sizes you used in parts 2 and 3, and for the following number of threads: 2, 8, 32, and 64.
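Programs 1 and 2 share the same sequential read loop; a minimal sketch is below, assuming Python (any language with low-level file I/O works equally well). Program 1 fixes the buffer at 10000 bytes, while program 2 takes the size from the command line:

```python
def read_through(path, bufsize):
    """Read `path` sequentially, one request of up to `bufsize` bytes at a
    time. Returns the total number of bytes read."""
    total = 0
    # buffering=0 so each read() issues one request to the OS,
    # instead of being absorbed by Python's readahead buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(bufsize)
            if not data:                       # empty read means end of file
                break
            total += len(data)
    return total

# Small demonstration file; the real programs would take the filename
# (and, for program 2, the buffer size) from sys.argv.
with open("demo.bin", "wb") as f:
    f.write(b"x" * 25000)
demo_total = read_through("demo.bin", 10000)   # 3 requests: 10000+10000+5000
```

Timing is then done from the shell, e.g. `time python read_prog.py test_100000`.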
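Programs 3 and 4 extend the same loop with a write for every read. A sketch follows; the copy-file naming scheme (`<src>.copyN`) is one possible choice, not mandated by the assignment:

```python
import threading

def copy_file(src, dst, bufsize):
    """Sequentially copy src to dst, one read+write of `bufsize` bytes at a time."""
    with open(src, "rb", buffering=0) as fin, \
         open(dst, "wb", buffering=0) as fout:
        while True:
            data = fin.read(bufsize)
            if not data:
                break
            fout.write(data)

def threaded_copies(src, nthreads, bufsize):
    """Program 4: each thread copies `src` to a file named by its thread number."""
    threads = [threading.Thread(target=copy_file,
                                args=(src, "%s.copy%d" % (src, i), bufsize))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Small demonstration; the real runs use the large test files,
# all read/write sizes from parts 2 and 3, and 2/8/32/64 threads.
with open("tsrc.bin", "wb") as f:
    f.write(b"abc" * 5000)                     # 15000 bytes
copy_file("tsrc.bin", "tsrc.copy", 10000)      # program 3
threaded_copies("tsrc.bin", 2, 10000)          # program 4 with 2 threads
```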
Guidelines
The goal of this assignment is to gain experience with a simple parity-based resilience scheme. The basic idea is the same one behind RAID storage arrays. If you evaluate the XOR of a sequence of binary values and make a note of that result, calling it the "parity" of the original sequence, then you should be able to reconstruct any individual value from the original sequence by taking the XOR of the remaining values alongside the parity value.
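The reconstruction property is easy to check on toy values before working with whole files; a quick sketch:

```python
# Three data values and their parity.
a, b, c = 0b1011, 0b0110, 0b1100
parity = a ^ b ^ c            # stored alongside the data, as in RAID-4/5

# If b is lost, the XOR of the survivors with the parity reconstructs it,
# because x ^ x == 0 cancels a and c out of the expression.
recovered = a ^ c ^ parity    # equals b
```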
This simple idea is the basis of RAID-4 and RAID-5 disk arrays. In this assignment you will build a simple tool to demonstrate this idea on a set of random test files. As a first step, start by creating a collection of test files of equal size.
Create a set of four files of length 100000000 bytes, filled with random data. As with last week's assignment, you can create these files at the command line using the "cat" and "head" commands (where might you find a file of endless random numbers to use as a source?).
Now you should create two test programs.
1. Your first program should read two files from beginning to end. It should accept the filenames as command line parameters. Each read call you make will specify a buffer and a read request size. This program should use two buffers, each of size 10000 bytes, which you will use for each read from each of the two files respectively.
You are to XOR the contents of the two buffers, and write the output to a third buffer (or overwrite one of the two buffers if you wish, and are careful). You are to print the contents of the output buffer to stdout. This means you should be able to run your program (called "raid-program" in this example) to produce an output file that represents the parity of the two source files.
$ raid-program <filename1> <filename2> > <outputfilename>
2. Run your program using the parity output file as one of the two input files (with, e.g., "filename1" as the other). Compare the output of this run with the contents of the remaining original file (e.g., "filename2").
3. Your second program should be able to take up to 10 input filenames on the command line, but once again should produce a single stream of parity values as the output. This means you are reading from up to 10 files into up to 10 buffers, and calculating an XOR across all these buffers.
4. Describe how you would use your second program to recover a missing input file.
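A sketch of the parity computation that serves both programs is below; the names (`xor_buffers`, `parity_stream`, `BUF`) are illustrative, and the input files are assumed to have equal length, as in this assignment. Program 1 passes two file names, program 2 up to ten:

```python
import io

BUF = 10000   # read request size, matching the assignment's 10000-byte buffers

def xor_buffers(bufs):
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(bufs[0]))
    for buf in bufs:
        for i, byte in enumerate(buf):
            out[i] ^= byte
    return bytes(out)

def parity_stream(paths, out):
    """Read all input files in lockstep and write their XOR ("parity") to out.
    Works for 2 up to 10 (or more) files of equal length."""
    files = [open(p, "rb") for p in paths]
    try:
        while True:
            bufs = [f.read(BUF) for f in files]
            if not bufs[0]:            # equal lengths: all hit EOF together
                break
            out.write(xor_buffers(bufs))
    finally:
        for f in files:
            f.close()

# Demonstration with two small files, including the check from step 2:
with open("d1.bin", "wb") as f:
    f.write(bytes(range(256)) * 4)
with open("d2.bin", "wb") as f:
    f.write(bytes(reversed(range(256))) * 4)

with open("parity.bin", "wb") as out:
    parity_stream(["d1.bin", "d2.bin"], out)

# XOR of the parity with d1 reproduces d2 exactly (the step-2 comparison).
check = io.BytesIO()
parity_stream(["parity.bin", "d1.bin"], check)
```

In the real programs the output would go to stdout so it can be redirected as in the usage line above.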
Attachment: Project.zip