Computer Architecture
Part 1
You will need a Linux or Mac OS X development environment for this assignment. The Windows Subsystem for Linux (WSL) should also work. If you do not have one of these, I recommend installing VirtualBox or VMware Player and using a recent version of Ubuntu Linux. The following instructions walk you through the VirtualBox setup.
Ubuntu Virtual Machine Development Environment
1. Download and install Oracle VM VirtualBox for your host system.
2. Download Ubuntu (18.04 or newer) from
3. Open VirtualBox
a. Select Machine -> New
b. Enter a Name
c. Select Type -> Linux
d. Select Version -> Ubuntu (64-bit)
4. Add more RAM if your computer has it. I would allocate at least 2048 MiB, and preferably 4096 MiB, with at least 2 CPU cores.
5. Create a virtual hard disk now
a. VDI is fine since we don't need compatibility with other VMMs.
b. Dynamically allocated is also fine.
c. For the Desktop version of Ubuntu, use at least 32 GiB.
6. Start your new machine.
a. A "Select start-up disk" prompt should appear; click the folder icon and select the .iso file you downloaded from ubuntu.com.
7. Install Ubuntu
a. Minimal installation is fine.
b. You can choose to Download Updates while Installing.
c. Choose "Erase disk and install Ubuntu"; the virtual disk is new and empty.
d. Click Continue on the pop-up.
e. Denver is fine for the time zone.
f. Enter a username and password.
i. You don't need to fill in "Your name" unless you want to.
Coffee break.
8. Restart when prompted.
9. Login.
You can open a terminal in the GUI with Ctrl+Alt+T.
Your first benchmark
Write a C program that implements the Sieve of Eratosthenes and outputs a single integer at the end: the number of primes less than or equal to n. Compile your program as a static binary. For n = 100,000,000 the output should be 5761455.
Put all your code in a single file named sieve.c and provide a Makefile to compile it.
Part 2: Cycle-Accurate Simulation
Using gem5
Here, you will run your sieve application in gem5, change the CPU model, CPU frequency, cache, and memory configuration, and describe the resulting changes in performance, based on the sim_seconds statistic (the simulated time elapsed while running the workload).
1. Run your sieve program in gem5 instead of the 'hello' example. Choose an input size large enough that the application is interesting, but not so large that a gem5 simulation takes more than 10 minutes. On my machine, an input of 1,000,000 takes about 5 minutes. Note: MinorCPU (next step) takes about 10x longer than TimingSimpleCPU.
2. Repeat 1 but change the CPU model from TimingSimpleCPU to MinorCPU.
a. Hint: you may want to add a command-line parameter to control the CPU model. Also, MinorCPU must be used with the --caches option. Include the default cache parameters in your report. You also need to include MinorCPU in CPU_MODELS when you build gem5.opt; this has worked in the past:
scons CPU_MODELS="TimingSimpleCPU,MinorCPU" \
    build/X86/gem5.opt -j5
3. Repeat steps 1 and 2 with a 32 KiB direct-mapped L1 instruction cache, a 64 KiB direct-mapped L1 data cache, and a 4 MiB 8-way unified L2 cache.
a. Hint: you may want to add command-line parameters to control the presence, size, associativity, and replacement policy of the caches. You might be able to start from se.py, specifying a direct-mapped cache with assoc=1.
4. Vary the CPU clock from 1 GHz to 3 GHz in steps of 500 MHz for both CPU models, using the caches described in step 3.
a. Hint: you may want to add a command-line parameter for the frequency.
5. Repeat step 4 with 2-way set-associative L1 caches instead of direct-mapped ones, keeping the same L2 cache.
6. Repeat step 4 with twice as much L1 cache (64 KiB L1 i-cache, 128 KiB L1 d-cache), keeping the same L2 cache.
7. Repeat 4 through 8 but change the memory configuration from DDR3_1600_8x8 to DDR3_2133_8x8 (DDR3 with a faster clock) and LPDDR2_S4_1066_1x32 (low-power DRAM often found in mobile devices).
You should keep the config.ini and stats.txt from each gem5 run. Rename and save them in a way that lets you easily tell which configuration produced each output file. You should end up with 119 runs, but you will only analyze 115 of them. Submit a compressed archive of the directory containing your raw results (config.ini and stats.txt) for all the runs. Hint: write a script to automate running the simulations and collecting their results.
Part 3: Report
Write and submit a PDF file named report.pdf containing a short report with your observations and conclusions from the experiments in Part 2. The report should answer the following questions:
• Which CPU model is more sensitive to changing the CPU frequency? Why do you think this is?
• Which CPU model is more sensitive to the memory technology? Why?
• Is the sieve application more sensitive to the CPU frequency or the memory technology? Why?
• Why does the smaller direct-mapped L1 cache perform better than, worse than, or similarly to the larger direct-mapped L1 cache? Why does it perform better than, worse than, or similarly to the same-sized 2-way L1 cache? Why does the larger direct-mapped L1 cache perform better than, worse than, or similarly to the smaller 2-way L1 cache?
• Suppose that, starting from the cache described in step 3 and DDR3_1600_8x8 memory, each of the following changes costs the same: increasing the associativity of the L1 cache from 1 to 2, doubling the size of the L1 cache, or switching to DDR3_2133_8x8 memory. Which of these changes, if any, should you choose? Why?
• If you were to use a different application, do you think your conclusions would change? Why?
You may include statistical or graphical evidence to support your arguments, along with anything else you would like to add.