Reference no: EM132663168
Question 1
In this question, we will compare the performance of two processors, pA and pB. The processors are identical except:
- The ALU in pB has specialized multiplier logic and supports the mult instruction
- The ALU in pA does not have multiplier logic and does not support the mult instruction
- pB's cycle time is 20% longer than pA's cycle time due to the added complexity
Because pA does not support a mult instruction, multiplication must be done by a software algorithm that uses other available instructions to calculate the result. (The assembler for pA provides a mult pseudo-instruction, which is expanded to several instructions during assembly.)
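The question does not say which software algorithm pA's assembler expands mult into. As a minimal sketch only, assuming the common shift-and-add method for non-negative operands, the following shows how a product can be built from shifts, adds, and bit tests alone. The function name `software_mult` is hypothetical.

```python
def software_mult(a: int, b: int) -> int:
    """Multiply two non-negative integers using only shifts, adds, and tests.

    Assumption: shift-and-add is used; the source does not name the algorithm.
    """
    product = 0
    while b != 0:
        if b & 1:        # low bit of the multiplier is set
            product += a  # add the (shifted) multiplicand
        a <<= 1          # multiplicand moves one bit position left
        b >>= 1          # consume one bit of the multiplier
    return product
```

Each loop iteration corresponds to a handful of R-type and branch instructions, which is consistent with pA's mix being dominated by R-TYPE and BEQ while pB's is not.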
Assume the following real-world instruction distribution for each processor running the same workload.
| Instruction | pA | pB | CPI |
|-------------|-----|-----|-----|
| MULT | 0% | 5% | 32 |
| LW | 5% | 25% | 3 |
| SW | 5% | 25% | 2 |
| R-TYPE | 70% | 15% | 1 |
| BEQ | 20% | 30% | 2 |
Question 1.1 What is the average CPI of pA for this workload?
Question 1.2 What is the average CPI of pB for this workload?
Question 1.3 What is the ratio of the total number of instructions executed by pA to the total number executed by pB? (Hint: the number of LW and SW instructions executed is the same on both processors, because loads and stores are not used for multiplication.)
Question 1.4 pA runs the workload in 150 seconds. How many seconds does pB take to run the same workload?
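One way to set up Question 1 is with the usual performance equation: execution time = instruction count x average CPI x cycle time, where average CPI is the mix-weighted sum of the per-class CPIs. The sketch below encodes the table and those relations. The function and variable names are illustrative, and the instruction-count ratio relies on the hint that the absolute number of LW instructions is the same on both processors.

```python
# Instruction mix (percent of executed instructions) and per-class CPI,
# copied from the table in the question.
CPI = {"MULT": 32, "LW": 3, "SW": 2, "R-TYPE": 1, "BEQ": 2}
MIX_PA = {"MULT": 0, "LW": 5, "SW": 5, "R-TYPE": 70, "BEQ": 20}
MIX_PB = {"MULT": 5, "LW": 25, "SW": 25, "R-TYPE": 15, "BEQ": 30}


def average_cpi(mix_percent, cpi):
    """Mix-weighted average CPI; mix values are percentages."""
    return sum(mix_percent[k] * cpi[k] for k in mix_percent) / 100


def instruction_count_ratio(mix_a, mix_b, anchor="LW"):
    """N_A / N_B, assuming both processors execute the same number of
    `anchor` instructions (the hint in Question 1.3)."""
    return mix_b[anchor] / mix_a[anchor]


def time_on_b(time_a, cpi_a, cpi_b, count_ratio_a_over_b, cycle_ratio_b_over_a):
    """Scale pA's run time by the ratios of instruction count, CPI, and cycle time."""
    return time_a * (cpi_b / cpi_a) / count_ratio_a_over_b * cycle_ratio_b_over_a


if __name__ == "__main__":
    cpi_a = average_cpi(MIX_PA, CPI)
    cpi_b = average_cpi(MIX_PB, CPI)
    ratio = instruction_count_ratio(MIX_PA, MIX_PB)
    # pB's cycle time is 20% longer than pA's, so the cycle ratio is 1.2.
    print(cpi_a, cpi_b, ratio, time_on_b(150, cpi_a, cpi_b, ratio, 1.2))
```

Using SW as the anchor instead of LW gives the same ratio, since both rows have identical percentages in each column.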
Question 2
For the problems in this exercise, assume that there are no stalls in the single-cycle processor. Before the following code is executed, the registers hold the initial values (in decimal) listed below.
| R1 | R2 | R3 | R4 | R5 | R6 |
|----|----|----|----|----|----|
| 5 | 3 | 0 | 1 | 6 | 2 |

| R7 | R8 | R9 | R10 | R11 | R12 |
|----|----|----|-----|-----|-----|
| 4 | 12 | 9 | 2 | 10 | 0 |
loop1:
addi r11, r10, 4
sw r2, 16(r7)
nor r3, r2, r6
loop2:
lw r4, 0(r2)
lw r8, 0(r3)
add r5, r4, r8
nor r6, r3, r5
sw r6, 0(r2)
addi r11, r11, -1
bne r11, r12, loop2
addi r9, r9, -1
bne r9, r2, loop1
Question 2.1 What is the final value of r9?
Question 2.2 How many times is each instruction executed?
add
addi
bne
lw
sw
nor
Question 2.3 In how many cycles is the instruction memory used?
Question 2.4 In how many cycles is the data memory used?
Question 2.5 In how many cycles is the input of the sign-extend circuit needed?
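The counts in Question 2 can be cross-checked by stepping through the code with a tiny interpreter. The sketch below is illustrative only and rests on several assumptions not stated in the question: registers are 32-bit two's complement, data memory is a dictionary in which unwritten addresses read as 0, and alignment is ignored. The branches depend only on r2, r9, r11, and r12, none of which is loaded from memory, so the memory assumptions do not affect the instruction counts. The names `run` and `usage_summary` are hypothetical.

```python
from collections import Counter

# Initial register values from the question (decimal).
INITIAL_REGS = {1: 5, 2: 3, 3: 0, 4: 1, 5: 6, 6: 2,
                7: 4, 8: 12, 9: 9, 10: 2, 11: 10, 12: 0}

# Operand order: addi (rt, rs, imm); add/nor (rd, rs, rt);
# lw/sw (rt, offset, base); bne (rs, rt, label).
PROGRAM = [
    ("addi", 11, 10, 4),       # loop1
    ("sw", 2, 16, 7),
    ("nor", 3, 2, 6),
    ("lw", 4, 0, 2),           # loop2
    ("lw", 8, 0, 3),
    ("add", 5, 4, 8),
    ("nor", 6, 3, 5),
    ("sw", 6, 0, 2),
    ("addi", 11, 11, -1),
    ("bne", 11, 12, "loop2"),
    ("addi", 9, 9, -1),
    ("bne", 9, 2, "loop1"),
]
LABELS = {"loop1": 0, "loop2": 3}


def to_signed32(x):
    """Wrap a Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def run(initial_regs):
    """Execute PROGRAM; return (per-opcode execution counts, final registers)."""
    regs = dict(initial_regs)
    mem = {}                   # assumption: unwritten addresses read as 0
    counts = Counter()
    pc = 0
    while pc < len(PROGRAM):
        op, a, b, c = PROGRAM[pc]
        counts[op] += 1
        pc += 1
        if op == "addi":
            regs[a] = to_signed32(regs[b] + c)
        elif op == "add":
            regs[a] = to_signed32(regs[b] + regs[c])
        elif op == "nor":
            regs[a] = to_signed32(~(regs[b] | regs[c]))
        elif op == "lw":
            regs[a] = mem.get(regs[c] + b, 0)
        elif op == "sw":
            mem[regs[c] + b] = regs[a]
        elif op == "bne" and regs[a] != regs[b]:
            pc = LABELS[c]     # branch taken
    return counts, regs


def usage_summary(counts):
    """Cycles in which each unit is used, assuming the standard textbook
    single-cycle datapath (one instruction per cycle)."""
    return {
        "instruction_memory": sum(counts.values()),   # every instruction is fetched
        "data_memory": counts["lw"] + counts["sw"],
        # The sign-extended immediate is needed by addi, lw, sw, and bne.
        "sign_extend": counts["addi"] + counts["lw"] + counts["sw"] + counts["bne"],
    }
```

Calling `run(INITIAL_REGS)` and passing the resulting counts to `usage_summary` gives values to compare against hand-derived answers for Questions 2.1 through 2.5.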