Reference no: EM132380517
EE4363/CSCI4203 Computer Architecture and Machine Organization Assignment, University of Minnesota, USA
1. [IDEAL (DENNARDIAN) SCALING]
Meet your classmates: Tim FlimFlam, the chief architect of the MN-4363 processor, and Doug TickTock, one of the architects of MN-4363. Do you agree with the following claims of Tim and Doug? Provide a brief justification (of at most 2-3 sentences).
(a) Tim claims that, according to Dennardian scaling, switching speed remains constant over technology generations, and the area per switch increases.
(b) Tim adds that Dennardian scaling assumes a constant power consumption over technology generations: both static and dynamic power per switch remain constant.
(c) Tim believes that if the functionality (number of switches) of a processor remained constant over two consecutive generations, G and G+1, the dynamic power consumption of the processor would decrease - from PDYN(G) to PDYN(G + 1), and the dynamic power density (dynamic power per unit area) would increase.
(d) Tim believes that his observations from (c) apply also to static power.
(e) Tim's claims confuse Doug. Doug thought that under the assumptions for (c), the processor should have provided the same functionality at less area, while consuming less power. Do you agree with Doug? Explain.
(f) Doug adds that, under Dennardian scaling, if processor area was kept constant over consecutive generations (instead of functionality as explored in (c)-(d)), we could harvest more functionality in the same area, while consuming the same power. Do you agree with Doug? Explain.
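The claims in (a)-(f) can be checked against the textbook first-order model of ideal Dennardian scaling. The following Python sketch is a minimal illustration, assuming a linear shrink factor `kappa` > 1 per generation, with device dimensions, voltage, and capacitance all scaling by 1/kappa and dynamic power modeled as C * V^2 * f. The function name `dennard_scale` and its dictionary keys are hypothetical, and leakage (static power) is not modeled.

```python
def dennard_scale(kappa: float) -> dict:
    """Per-switch ratios (generation G+1 relative to G) under ideal Dennardian scaling.

    Assumption: kappa > 1 is the linear shrink factor; voltage, capacitance,
    and switching delay each scale by 1/kappa. Static power is ignored.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    capacitance = 1.0 / kappa
    voltage = 1.0 / kappa
    frequency = kappa              # delay shrinks by 1/kappa, so speed grows by kappa
    area = 1.0 / kappa ** 2        # both lateral dimensions shrink by 1/kappa
    dynamic_power = capacitance * voltage ** 2 * frequency  # C * V^2 * f
    return {
        "frequency": frequency,
        "area_per_switch": area,
        "dynamic_power_per_switch": dynamic_power,
        "dynamic_power_density": dynamic_power / area,
    }
```

Calling `dennard_scale(1.4)` gives the per-switch ratios for one generation under these assumptions; multiplying the per-switch values by a switch count models either the constant-functionality case of (c)-(e) or the constant-area case of (f).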
2. [PRACTICAL SCALING (ARTIFACTS)]
(a) How does practical scaling deviate from Dennardian scaling? Consider the implications for the speed, power consumption, and area per switch. Be precise.
(b) Why did practical scaling deviate from Dennardian scaling? Be precise.
(c) What is dark silicon? Be precise.
(d) Your classmates have been thinking hard about how to address dark silicon. Do you think their solutions might work? Justify your answer (using no more than 2-3 sentences):
(i) In his garage, Doug TickTock developed a novel cooling framework. He claims that according to his most recent findings, his framework will be able to remove all heat generated by the increasing power per area over technology generations. In other words, his framework is able to keep up with the pace at which power density increases over technology generations. According to him, if we adopt his framework, there won't be any dark silicon. Hint: Power is dissipated as heat.
(ii) Tim FlimFlam claims that anything "dark" should be cut off - i.e. the processor area should shrink to exclude dark silicon. This is because there is no way to translate the extra switches (which would remain dark) into more performance or functionality, he claims.
(iii) In his garage, Doug TickTock developed yet another technique. He invented switches which do not leak - zero static power. According to him, once we "switch" to his switches, there won't be any dark silicon.
3. [QUANTIFYING "BETTER"]
Do you agree with the following claims of your classmates Tim and Doug? Provide a brief justification (of at most 2-3 sentences).
(a) Tim claims that by decreasing the operating voltage V, both static and dynamic power can be reduced, while operating at the same speed.
(b) Tim states that there is no point in trying to decrease the static power, since static power does not go to computation anyway. Hence, decreasing static power cannot have any power or performance implication for computation. Hint: Each processor is subject to a power budget.
(c) Tim claims that the CPI is a constant specific to each processor design. Note: Provide at least 3 reasons to address why or why not.
(d) To model the total energy consumption of a program, Doug TickTock suggests multiplying the total number of instructions by an estimate of energy per instruction, EPI. He claims that EPI and CPI are strongly correlated.
(e) [Bonus] Tim claims that decreasing the operating voltage V would increase the operating speed, since switching delay is proportional to CV/I.
(f) [Bonus] Doug claims that decreasing the operating voltage V would increase the operating speed, since due to Q=CV, changing state (i.e. charging or discharging capacitive storage elements) would become faster.
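The claims in (a) and (b) depend on how the voltage enters each power component and on the hint that every processor has a power budget. The sketch below is a minimal first-order illustration, assuming dynamic power is modeled as alpha * C * V^2 * f and static power as V * I_leak. All function and parameter names are hypothetical, and the model treats frequency as independent of voltage, which real circuits do not guarantee (see the delay relation in (e)).

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """First-order dynamic power: activity factor * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * frequency


def static_power(voltage, leakage_current):
    """First-order static power: V * I_leak."""
    return voltage * leakage_current


def max_frequency_under_budget(budget, alpha, capacitance, voltage, leakage_current):
    """Highest frequency whose dynamic power fits in the budget left over by static power.

    Assumption: frequency is limited only by the power budget, not by circuit delay.
    """
    headroom = budget - static_power(voltage, leakage_current)
    if headroom <= 0:
        return 0.0
    return headroom / (alpha * capacitance * voltage ** 2)
```

Comparing `max_frequency_under_budget` for two leakage currents at the same budget shows how much switching headroom a reduction in static power frees up in this simplified model.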
4. [QUANTITATIVE ANALYSIS]
A study by your classmate Doug TickTock on the usage of high-level programming language constructs suggests that function calls are among the most expensive operations. He came up with an ISA that reduces the loads and stores induced by function calls and returns. The first thing he did was run some experiments with and without this optimization. His experiments used the same state-of-the-art optimizing compiler that will be used with either version of the computer. These experiments revealed the following information:
(i) The clock cycle time of the optimized version is 5% lower than that of the unoptimized version.
(ii) Thirty percent of the instructions in the unoptimized version are loads or stores.
(iii) The optimized version executes two-thirds as many loads and stores as the unoptimized version. For all other instructions, the dynamic execution counts are unchanged.
(iv) Every instruction (including load and store) in the unoptimized version takes one clock cycle.
(v) Due to the optimization, the procedure call and return instructions take one extra cycle in the optimized version, and these instructions account for 5% of the total instruction count in the optimized version.
Is Doug's optimization working? Justify your answer.
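One way to check Doug's numbers is to compare execution time = instruction count x CPI x clock cycle time for the two versions. The sketch below is a minimal illustration under stated assumptions: the unoptimized instruction count and cycle time are normalized to 1, and every instruction in the optimized version other than calls and returns is assumed to still take one cycle, which the problem statement does not say explicitly. The function and parameter names are hypothetical.

```python
def relative_execution_time(
    load_store_fraction=0.30,    # (ii) share of loads/stores, unoptimized version
    load_store_retained=2 / 3,   # (iii) fraction of loads/stores still executed
    call_return_fraction=0.05,   # (v) share of the optimized instruction count
    call_return_extra_cycles=1,  # (v) extra cycles per call/return
    cycle_time_ratio=0.95,       # (i) optimized cycle time relative to unoptimized
):
    """Return optimized / unoptimized execution time (a value below 1 means faster)."""
    # Unoptimized baseline: normalized count 1, CPI 1 (iv), cycle time 1.
    unoptimized_time = 1.0 * 1.0 * 1.0
    # Only loads and stores are removed; all other counts are unchanged (iii).
    optimized_count = (1 - load_store_fraction) + load_store_fraction * load_store_retained
    # Assumption: all instructions take 1 cycle except calls/returns, which take extra.
    optimized_cpi = 1 + call_return_fraction * call_return_extra_cycles
    optimized_time = optimized_count * optimized_cpi * cycle_time_ratio
    return optimized_time / unoptimized_time
```

With the figures from (i)-(v), the function returns about 0.898 under these assumptions. Varying `cycle_time_ratio` shows how sensitive the conclusion is to the clock cycle time.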
5. [AMDAHL'S LAW]
(a) As the chief architect of Massively Parallel Inc., you hired a summer intern called Gene Amdahl. He was in charge of characterizing the speed-up of one of your Massively Parallel programs due to parallelization. Accordingly, he identified the speed-up (with respect to the non-parallelized version, i.e. the version running on a single processor) as a function of the number of processors, N, allocated for the execution of the program.
Assumptions: The overall execution time of the program for N=1 is s+p, where p corresponds to the section of code that can be parallelized. On N processors, this section would take p/N time units. The total is normalized so that s + p = 1 time unit. s does not change with N.
Plot the speedup as a function of p for N=1024. What should p be to achieve a speedup ≈ 512 x?
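Under the stated assumptions (s + p = 1, and the parallel section taking p/N on N processors), the speedup is 1 / (s + p/N). The following Python sketch is a minimal illustration of evaluating that expression and inverting it for a target speedup. The function names are hypothetical, and it produces sample points rather than a plot so that it needs no plotting library.

```python
def speedup(p, n):
    """Speedup on n processors for parallel fraction p, assuming s + p = 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    s = 1.0 - p
    return 1.0 / (s + p / n)


def parallel_fraction_for(target, n):
    """Parallel fraction p needed to reach `target` speedup on n > 1 processors.

    Obtained by solving 1 / ((1 - p) + p / n) = target for p.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    if not 1.0 <= target <= n:
        raise ValueError("target must lie in [1, n]")
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n)


# Sample points that could be fed to any plotting tool for N = 1024.
points = [(p, speedup(p, 1024)) for p in (0.0, 0.5, 0.9, 0.99, 0.999, 1.0)]
```

Evaluating `parallel_fraction_for(512, 1024)` gives the value of p asked for in (a); feeding it back into `speedup` is a quick consistency check.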
(b) Following the notation from Slides 29-32 of (second part of) the introduction, derive the equation for the maximum speed-up that your intern could observe, as a function of s0, the speed-up of the optimized portion, and f0, the fraction that can be optimized, by taking the limit T2' → 0 (in class, we used the limit s0 → ∞ instead). How do T, T1, T2, T2', f0, s0 relate to s, p, and N from (a)?
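The slides are not reproduced here, so the following derivation is only a sketch under an assumed reading of the notation: T is the execution time before optimization, split into an unaffected part T1 and an optimizable part T2, with T2' the optimizable part after optimization. These definitions are assumptions and may differ from the ones used in class.

```latex
% Assumed notation (not confirmed by the source):
%   T    = T_1 + T_2    execution time before optimization
%   T_2                 part of T that the optimization targets
%   T_2' = T_2 / s_0    that part after optimization
%   f_0  = T_2 / T      fraction that can be optimized
\begin{align*}
\text{speed-up} &= \frac{T}{T_1 + T_2'} = \frac{1}{(1 - f_0) + f_0 / s_0}, \\
\lim_{T_2' \to 0} \frac{T}{T_1 + T_2'} &= \frac{T}{T_1} = \frac{1}{1 - f_0}.
\end{align*}
```

Under this reading, the quantities from (a) would map as T = s + p = 1, T1 = s, T2 = p, T2' = p/N, f0 = p, and s0 = N.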
NOTES - Please submit an electronic copy in PDF format. Any dangling answer without justification deserves 0 points at most.