Reference no: EM132885171
Question 1:
a. Identify the differences between microprocessors and systems-on-chip (SoCs) used in recent architectures.
b. Describe Flynn's taxonomy of parallel processing.
c. Suppose we have two processor implementations of the same instruction set architecture. Computer A has a clock cycle time of 300 ps and a CPI of 2.5 for program G, and computer B has a clock cycle time of 500 ps and a CPI of 1.5 for the same program G. Compare the performance of the two computers in terms of speed.
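Because both computers implement the same ISA, program G has the same instruction count on each, so CPU time = instruction count × CPI × clock cycle time reduces to comparing CPI × cycle time. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
# CPU time = instruction count x CPI x clock cycle time. The instruction count
# of program G is identical on both machines (same ISA), so it cancels out.
def time_per_instruction_ps(cycle_time_ps: float, cpi: float) -> float:
    return cycle_time_ps * cpi

time_a = time_per_instruction_ps(300, 2.5)  # computer A
time_b = time_per_instruction_ps(500, 1.5)  # computer B
speedup_a_over_b = time_b / time_a          # > 1 would mean A is faster

print(f"A: {time_a} ps/instr, B: {time_b} ps/instr, speedup of A over B: {speedup_a_over_b:.2f}")
```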
d. Some microprocessors today are designed to have adjustable voltage. A 20% reduction in voltage may result in a 10% reduction in frequency. What would be the impact on dynamic energy and on dynamic power?
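Assuming the capacitive load is unchanged, dynamic energy scales with V² and dynamic power with V² × f. The sketch below applies the stated 20% voltage and 10% frequency reductions as ratios (0.8 and 0.9); the function name is illustrative.

```python
# Dynamic energy ~ 1/2 * C * V^2; dynamic power ~ 1/2 * C * V^2 * f.
# C is assumed unchanged, so only the voltage and frequency ratios matter.
def scaled_dynamic(voltage_ratio: float, freq_ratio: float) -> tuple[float, float]:
    energy_ratio = voltage_ratio ** 2
    power_ratio = energy_ratio * freq_ratio
    return energy_ratio, power_ratio

energy, power = scaled_dynamic(voltage_ratio=0.8, freq_ratio=0.9)
print(f"new dynamic energy = {energy:.2f} x old, new dynamic power = {power:.3f} x old")
```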
e. Microprocessor manufacturer A designs a microprocessor using a 15 cm diameter wafer that costs 12 CAD, contains 84 dies, and has 0.020 defects/cm². Manufacturer B uses a 20 cm diameter wafer that costs 15 CAD, contains 100 dies, and has 0.031 defects/cm².
a. Find the yield for wafers from manufacturers A and B.
b. Find the cost per die for wafers from manufacturers A and B.
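The question does not name a yield model; the sketch below assumes the simple model used in Patterson and Hennessy's Computer Organization and Design, Yield = 1 / (1 + defects per area × die area / 2)², with die area approximated as wafer area divided by dies per wafer (edge loss ignored), and cost per die = wafer cost / (dies per wafer × yield).

```python
import math

# Assumed yield model: 1 / (1 + defects_per_area * die_area / 2) ** 2,
# with die area approximated as wafer area / dies per wafer.
def die_yield(diameter_cm: float, dies: int, defects_per_cm2: float) -> float:
    wafer_area = math.pi * (diameter_cm / 2) ** 2
    die_area = wafer_area / dies
    return 1.0 / (1.0 + defects_per_cm2 * die_area / 2) ** 2

def cost_per_die(wafer_cost: float, dies: int, yield_: float) -> float:
    # Only good dies share the wafer cost.
    return wafer_cost / (dies * yield_)

yield_a = die_yield(15, 84, 0.020)
yield_b = die_yield(20, 100, 0.031)
cost_a = cost_per_die(12, 84, yield_a)
cost_b = cost_per_die(15, 100, yield_b)
print(f"A: yield {yield_a:.4f}, {cost_a:.4f} CAD per die")
print(f"B: yield {yield_b:.4f}, {cost_b:.4f} CAD per die")
```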
Question 2 Computer performance
Voltage and clock frequency are two factors that determine the power and energy consumed in a processor.
a. Briefly explain how these parameters influence static as well as dynamic power and energy. [100 words]
b. Briefly explain one way of minimizing static power and one way of reducing dynamic power consumption. [100 words]
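A first-order model helps reason about parts a and b: dynamic power ≈ ½ × C × V² × f and static power ≈ I_leak × V. The Python sketch below uses hypothetical values for capacitance, voltage, frequency and leakage current, and models clock gating as halving the switched capacitance and power gating as additionally removing the idle half's leakage. These are illustrative assumptions, not measured figures.

```python
# Simplified first-order power model (illustrative only):
#   dynamic power = 1/2 * C_switched * V^2 * f
#   static power  = I_leak * V
def dynamic_power(c_switched: float, v: float, f: float) -> float:
    return 0.5 * c_switched * v ** 2 * f

def static_power(i_leak: float, v: float) -> float:
    return i_leak * v

# Hypothetical values: farads, volts, hertz, amperes.
C, V, F, I_LEAK = 2e-9, 1.0, 2e9, 0.5

baseline = dynamic_power(C, V, F) + static_power(I_LEAK, V)
# Clock gating: the idle half of the logic stops switching, leakage unchanged.
clock_gated = dynamic_power(C / 2, V, F) + static_power(I_LEAK, V)
# Power gating: the idle half is also disconnected from the supply, so its leakage goes away.
power_gated = dynamic_power(C / 2, V, F) + static_power(I_LEAK / 2, V)

print(f"{baseline:.2f} W -> {clock_gated:.2f} W (clock gating) -> {power_gated:.2f} W (power gating)")
```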
Question 3 RISC-V and Pipelining
Consider the following RISC-V code:
addi x6, x0, 0
addi x29, x0, 100
loop: ld x7, 0(x10)
add x5, x5, x7
addi x10, x10, 8
addi x6, x6, 1
blt x6, x29, loop
a. Translate the above RISC-V program segment into C. Assume that the C-level (long long) integer i is held in register x5, x6 holds the C-level integer called result, and x10 holds the base address of the integer block.
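Before writing the C version, it can help to trace what each instruction does. The Python sketch below mirrors the loop register by register under the register mapping given in part a; modelling memory as a dict from byte addresses to doublewords, starting x5 at 0 by default, and ignoring 64-bit overflow are assumptions of the sketch.

```python
def run_segment(mem: dict[int, int], base: int, x5: int = 0) -> tuple[int, int]:
    """Mirror of the loop; mem maps byte addresses to 64-bit doublewords."""
    x6 = 0                  # addi x6, x0, 0     (result = 0)
    x29 = 100               # addi x29, x0, 100  (loop bound)
    x10 = base              # x10 holds the base address of the integer block
    while True:
        x7 = mem[x10]       # ld   x7, 0(x10)
        x5 = x5 + x7        # add  x5, x5, x7    (i accumulates the loaded values)
        x10 = x10 + 8       # addi x10, x10, 8   (advance one doubleword)
        x6 = x6 + 1         # addi x6, x6, 1
        if not x6 < x29:    # blt  x6, x29, loop (fall through once x6 reaches 100)
            break
    return x5, x6
```

Note that the segment never initialises x5, so i keeps whatever value it held before the loop.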
b. Calculate how many clock cycles it will take to execute this segment on the regular (non-pipelined) architecture. Justify your answer.
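The cycle count starts from the dynamic instruction count: 2 set-up instructions plus 5 loop-body instructions for each of the 100 iterations (x6 runs from 0 to 99). How many cycles each instruction takes depends on what "regular (non-pipelined)" means in the course; the sketch assumes a fixed number of cycles per instruction, e.g. 1 for a single-cycle datapath or 5 for an implementation that completes all five stages before starting the next instruction.

```python
SETUP_INSTRUCTIONS = 2   # addi x6 and addi x29
BODY_INSTRUCTIONS = 5    # ld, add, addi, addi, blt
ITERATIONS = 100         # x6 counts 0..99

def instruction_count() -> int:
    return SETUP_INSTRUCTIONS + BODY_INSTRUCTIONS * ITERATIONS

def non_pipelined_cycles(cycles_per_instruction: int) -> int:
    # Assumes every instruction takes the same number of cycles.
    return instruction_count() * cycles_per_instruction

print(instruction_count(), non_pipelined_cycles(1), non_pipelined_cycles(5))
```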
c. Calculate how many clock cycles it will take to execute this segment on the simple pipeline without forwarding or bypassing, when the result of the branch instruction (the new PC content) is available after the WB stage. Justify your answer.
d. Calculate how many clock cycles it will take to execute this segment on the pipeline with forwarding and bypassing, when the result of the branch instruction (the new PC content) is available after completion of the ID stage. Justify your answer.
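Parts c and d can be estimated with the first-order formula cycles = (stages - 1) + instructions + data-hazard stalls + branch stalls. The sketch assumes a 5-stage pipeline, a register file that writes in the first half of a cycle and reads in the second, fetch stalled (no prediction) until each branch resolves, and no penalty counted after the final, not-taken branch. Under those assumptions, part c has 2 stalls for ld→add, 2 for addi→blt and a 4-cycle branch penalty; part d has 1 load-use stall, 1 stall to forward x6 to a branch resolved in ID, and a 1-cycle branch penalty.

```python
STAGES = 5
ITERATIONS = 100
INSTRUCTIONS = 2 + 5 * ITERATIONS   # dynamic instruction count

def pipelined_cycles(load_use_stalls: int, alu_branch_stalls: int, branch_penalty: int) -> int:
    # Data stalls occur in every iteration (ld -> add, addi x6 -> blt).
    data_stalls = ITERATIONS * (load_use_stalls + alu_branch_stalls)
    # Branch stalls delay the next fetch; the last branch has no successor in the segment.
    branch_stalls = (ITERATIONS - 1) * branch_penalty
    # (STAGES - 1) cycles fill the pipeline before the first instruction completes.
    return (STAGES - 1) + INSTRUCTIONS + data_stalls + branch_stalls

print("c:", pipelined_cycles(2, 2, 4))   # no forwarding, new PC after WB
print("d:", pipelined_cycles(1, 1, 1))   # forwarding, new PC after ID
```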
Question 4 Memory Hierarchy
1. A short program loop goes through a 32 KiB array one 64-bit word at a time, performs a simple filtering operation (as described below), and stores the result in another array that is located immediately after the first array. An outer loop repeats this operation 1000 times. The filtering operation adds the word under consideration to its two immediate neighbours (the word immediately preceding it and the word immediately succeeding it) and multiplies the sum by a random number that is a positive fraction equal to or less than 1/3 (i.e. 0.333...).
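The filtering operation can be sketched in Python as follows. Leaving the first and last outputs at zero is one way to "ignore the edge effect", and drawing a fresh fraction in (0, 1/3] for each element is an assumption about how the random multiplier is used; all names are illustrative.

```python
import random

WORD_BYTES = 8                       # one 64-bit doubleword
ARRAY_BYTES = 32 * 1024              # 32 KiB source array
WORDS = ARRAY_BYTES // WORD_BYTES    # number of elements in the array

def random_fraction() -> float:
    # random.random() is in [0, 1), so this is a positive fraction in (0, 1/3].
    return (1 - random.random()) / 3

def filter_pass(src, rand=random_fraction):
    # Interior elements only; out[0] and out[-1] stay 0.0 (edge effect ignored).
    out = [0.0] * len(src)
    for i in range(1, len(src) - 1):
        out[i] = (src[i - 1] + src[i] + src[i + 1]) * rand()
    return out

def run(src, outer=1000, rand=random_fraction):
    dst = [0.0] * len(src)
    for _ in range(outer):           # the outer loop repeats the whole pass
        dst = filter_pass(src, rand)
    return dst
```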
The 64-bit processor, operating at a clock frequency of 5 GHz, is pipelined, has 50 address lines, and has three levels of caches with a 64 B block size. Each of the L1 caches has 256 sets, 4-way set associativity, and a round-robin replacement policy. The L2 cache is an 8-way set-associative 512 KiB structure, whereas the L3 cache features 8 and 16-way set associativity. The L2 and L3 caches employ a pseudo-LRU replacement policy that requires 5 and 6 bits of overhead, respectively. Write-back and write-allocate strategies are used in the L2 and L3 caches, but simpler write-hit and write-miss policies are used with the L1 caches. The virtual address contains 54 bits plus 10 bits for security and PID/ASN, the page size is 64 KiB, and each of the page table caches contains 50 entries. Miss penalties for the L1, L2 and L3 caches are 20, 40 and 100 cc (clock cycles), respectively.
a. What is the size (in bytes) of each TLB? Include overhead bits in your calculation.
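The sketch below sizes each TLB under these assumptions: the page-table-cache entry count is read as 50; each TLB is fully associative (50 is not a power of two, so a set-indexed organisation seems unlikely); and each entry stores the virtual page number plus the 10 security/PID/ASN bits as its tag, alongside the physical page number. The number of overhead bits per entry (valid, dirty, protection, replacement) is left as a parameter because the question does not fix it; the 2 bits used in the example are hypothetical.

```python
VA_BITS = 54            # virtual address bits, excluding security/PID/ASN
EXTRA_TAG_BITS = 10     # security + PID/ASN bits, assumed stored with each entry
PA_BITS = 50            # 50 address lines
PAGE_OFFSET_BITS = 16   # 64 KiB pages
ENTRIES = 50            # entries per TLB (assumed fully associative)

def tlb_bytes(overhead_bits_per_entry: int) -> float:
    vpn_bits = VA_BITS - PAGE_OFFSET_BITS   # 38-bit virtual page number (the tag)
    ppn_bits = PA_BITS - PAGE_OFFSET_BITS   # 34-bit physical page number
    entry_bits = vpn_bits + EXTRA_TAG_BITS + ppn_bits + overhead_bits_per_entry
    return ENTRIES * entry_bits / 8

print(tlb_bytes(2))   # hypothetical: valid + dirty bits only
```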
b. Compute the number of index, tag and block offset bits in each cache.
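Assuming all three caches are physically tagged with the 50-bit physical address, the field widths follow from the block size and the number of sets. The sketch computes them for L1 and L2; the L3 fields come from the same functions once its capacity and associativity are passed to sets_from_size.

```python
import math

PA_BITS = 50        # 50 address lines -> 50-bit physical address
BLOCK_BYTES = 64

def sets_from_size(size_bytes: int, ways: int, block_bytes: int = BLOCK_BYTES) -> int:
    return size_bytes // (ways * block_bytes)

def cache_fields(sets: int, block_bytes: int = BLOCK_BYTES, addr_bits: int = PA_BITS) -> tuple[int, int, int]:
    offset = int(math.log2(block_bytes))   # byte within the block
    index = int(math.log2(sets))           # selects the set
    tag = addr_bits - index - offset       # remaining high-order bits
    return tag, index, offset

print("L1 (tag, index, offset):", cache_fields(256))                            # 256 sets, 4-way
print("L2 (tag, index, offset):", cache_fields(sets_from_size(512 * 1024, 8)))  # 512 KiB, 8-way
```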
c. Write the RISC-V assembly code to implement the problem described in this question. Recall that RISC-V refers to a 64-bit word as a doubleword. Assume that x25 yields a random number between 1 and 1000 every time it is read. Also assume that register x5 holds the address of the first byte of the source array. Recall that the size of the displacement field in the instruction is limited. Ignore the edge effect.
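The displacement limit matters here: RISC-V loads, stores and addi take a 12-bit signed immediate, so the destination array, which starts 32 KiB past the source, cannot be reached from x5 by displacement alone. A small Python check of that range:

```python
# 12-bit signed immediate used by ld, sd and addi.
IMM12_MIN, IMM12_MAX = -2048, 2047

def fits_imm12(offset: int) -> bool:
    return IMM12_MIN <= offset <= IMM12_MAX

DEST_OFFSET = 32 * 1024   # destination array starts right after the 32 KiB source array
print(fits_imm12(DEST_OFFSET))   # the destination needs its own base register
```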
d. Explain the steps required for the processor to fetch and execute the second store instruction that saves the second element of the source array in the very first iteration of the inner loop of this code (note: this is certainly not the first instruction in the program). Assume that main memory always contains the needed information, whether or not any level of cache also has it.
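The order of the lookups in part d can be sketched as a simplified Python model: split the virtual address into page number and offset, translate through the TLB, then probe the cache levels in order until one holds the block. The TLB and caches are plain dicts and sets here, an assumption for illustration; the model does not cover fills, write policies or the page-table walk itself.

```python
PAGE_OFFSET_BITS = 16   # 64 KiB pages
BLOCK_OFFSET_BITS = 6   # 64 B blocks

def lookup(vaddr: int, tlb: dict[int, int], caches: list[tuple[str, set[int]]]) -> tuple[int, str]:
    """Translate vaddr, then return (physical address, level that supplies the block)."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    page_offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in tlb:
        # A real processor would walk the page table, refill the TLB and retry.
        raise LookupError("TLB miss")
    paddr = (tlb[vpn] << PAGE_OFFSET_BITS) | page_offset
    block = paddr >> BLOCK_OFFSET_BITS
    for name, resident_blocks in caches:   # probe L1, then L2, then L3
        if block in resident_blocks:
            return paddr, name
    return paddr, "memory"                 # main memory always has the data
```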
e. Calculate the number of accesses to the TLBs, every cache, and main memory when this program executes. Calculate the number of misses and the miss rate in each of the storage structures when this program is executed.
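A useful first step for part e is the data footprint. Two 32 KiB arrays span 1024 distinct 64 B blocks, and the L1 data cache (256 sets × 4 ways × 64 B) holds exactly 64 KiB, so whether the data stays resident after the first pass depends on alignment and on other traffic. The sketch below only computes these sizes and the number of element visits, treating every element as visited.

```python
BLOCK_BYTES = 64
WORD_BYTES = 8
SRC_BYTES = DST_BYTES = 32 * 1024
OUTER_ITERATIONS = 1000
L1D_BYTES = 256 * 4 * BLOCK_BYTES   # sets x ways x block size

# Every distinct block misses at least once in a cold cache (compulsory misses).
blocks_touched = (SRC_BYTES + DST_BYTES) // BLOCK_BYTES
element_visits = (SRC_BYTES // WORD_BYTES) * OUTER_ITERATIONS
fits_in_l1d = SRC_BYTES + DST_BYTES <= L1D_BYTES

print(blocks_touched, element_visits, fits_in_l1d)
```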
f. Calculate the time taken to execute this program in milliseconds. Hint: it is easier to use the number of misses and not miss rates.
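Following the hint, execution time can be computed from miss counts: time = (base cycles + Σ misses × penalty) / clock frequency. The sketch assumes each miss adds its level's stated penalty with no overlap between misses; the base cycle count and miss counts in the example are hypothetical placeholders, not answers.

```python
CLOCK_HZ = 5e9
PENALTIES = {"L1": 20, "L2": 40, "L3": 100}   # miss penalties in clock cycles

def exec_time_ms(base_cycles: int, misses: dict[str, int]) -> float:
    # Each miss at a level adds that level's penalty to the all-hit cycle count.
    stall_cycles = sum(count * PENALTIES[level] for level, count in misses.items())
    return (base_cycles + stall_cycles) / CLOCK_HZ * 1e3

# Hypothetical inputs, for illustration only.
example = exec_time_ms(5_000_000, {"L1": 1024, "L2": 1024, "L3": 1024})
print(f"{example:.6f} ms")
```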
g. Every processor, regardless of its word size, has byte-addressable memory. Why?
h. What is memory interleaving? How does this speed up main memory access?
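Low-order interleaving is a common scheme: consecutive words map to consecutive banks, so sequential accesses (such as a cache-block fill) can overlap across banks. The sketch assumes 4 banks of 8-byte words, a hypothetical configuration.

```python
WORD_BYTES = 8
BANKS = 4   # hypothetical number of banks

def bank_of(addr: int) -> int:
    # Low-order interleaving: the word address modulo the bank count picks the bank.
    return (addr // WORD_BYTES) % BANKS

print([bank_of(a) for a in range(0, 64, 8)])
```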
i. Explain the sequence of actions taken upon a TLB miss. In particular, explain how the page table is accessed using the virtual page number without requiring a very large decoder (using the numbers in this example).
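With a hierarchical page table, the 38-bit virtual page number (54 - 16) is split into several smaller indices, each selecting an entry within one page-sized table, so no single decoder has to handle 2^38 entries. The sketch assumes 8-byte page-table entries and page-sized (64 KiB) tables, which gives 13 index bits per level and 3 levels; the question may intend a different organisation.

```python
import math

VPN_BITS = 54 - 16        # 38-bit virtual page number with 64 KiB pages
PAGE_BYTES = 64 * 1024
PTE_BYTES = 8             # assumed page-table-entry size

BITS_PER_LEVEL = int(math.log2(PAGE_BYTES // PTE_BYTES))   # one page holds 8192 PTEs
LEVELS = math.ceil(VPN_BITS / BITS_PER_LEVEL)              # levels needed to cover the VPN

def split_vpn(vpn: int) -> list[int]:
    """Return the per-level table indices, root level first."""
    indices = []
    for _ in range(LEVELS):
        indices.append(vpn & ((1 << BITS_PER_LEVEL) - 1))
        vpn >>= BITS_PER_LEVEL
    return indices[::-1]
```

Each index only needs a 13-bit selection within one table, and the root level uses the remaining 12 bits.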