What is the maximum speedup obtainable on the program

Reference no: EM131218167

Throughout, be sure to type your homework; you may want to use the electronic version of the homework so that you can insert your answers where they belong. Show your work on the problems so you can get partial credit for incorrect answers (and tips from the grader about where you went wrong). Writing your units in the calculations will help ensure you wind up with a correct value. When done with a problem, ask yourself: "Does the answer make sense?" Please remember that this homework requires some time to complete; it is best to work on it in several sessions between now and the due date.

1. Three processors P1, P2, and P3 have the same ISA. Their clock rates and average CPI are as follows:

               P1        P2        P3
Clock Rate     3 GHz     2.5 GHz   4.0 GHz
Average CPI    1.5       1.0       2.2

a. Which processor has the highest performance?

b. How much faster is the fastest processor than the slowest processor?

c. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions for each processor.

                          P1        P2        P3
Number of cycles:         ____      ____      ____
Number of instructions:   ____      ____      ____

d. For processor P1, we want to reduce the execution time of the reference program from 10 seconds to 7 seconds, but the change leads to an increase of 20% in the CPI. What clock rate is required to reach the target reduction in execution time?
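
Every part of problem 1 rests on the CPU performance equation: execution time = instruction count x CPI / clock rate. The following is a minimal Python sketch of those relationships, not a worked solution. The function names and the 2 GHz, CPI 2.0 processor in the usage lines are illustrative assumptions that do not come from the assignment.

```python
def instructions_per_second(clock_hz: float, cpi: float) -> float:
    """Throughput: clock rate divided by average cycles per instruction."""
    return clock_hz / cpi


def cycles(clock_hz: float, seconds: float) -> float:
    """Clock cycles elapsed while running for the given time."""
    return clock_hz * seconds


def instruction_count(clock_hz: float, cpi: float, seconds: float) -> float:
    """Instructions executed: total cycles divided by average CPI."""
    return cycles(clock_hz, seconds) / cpi


def required_clock_hz(instructions: float, cpi: float, target_seconds: float) -> float:
    """Clock rate needed to finish the given work in the target time."""
    return instructions * cpi / target_seconds


# Hypothetical processor (not one of P1, P2, P3): 2 GHz, CPI 2.0, 5 s run.
print(instructions_per_second(2e9, 2.0))   # 1e9 instructions per second
print(cycles(2e9, 5.0))                    # 1e10 cycles
print(instruction_count(2e9, 2.0, 5.0))    # 5e9 instructions
print(required_clock_hz(5e9, 2.0, 4.0))    # 2.5e9 Hz to finish in 4 s
```

For part (d), the same `required_clock_hz` helper applies once the CPI is scaled by the stated increase and the target time is substituted.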

2. The following table shows the number of instructions for programs A and B on a processor:

          Arith   Store   Load   Branch   Total
Prog A    650     100     600    50       1400
Prog B    750     250     500    500      2000

Assume that arithmetic instructions take 1 clock cycle, load and store each take 5 cycles, and branches require 2 cycles.

a. What is the execution time of each program in a 2 GHz processor?

b. Find the average CPI for each program.

c. If the number of load instructions can be reduced by one half (thus also reducing the total number of instructions), what is the new average CPI of each program?

d. What is the speedup of each revised program from part (c) compared with its original version?
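
Problem 2 comes down to a weighted sum: total cycles are the per-class instruction counts multiplied by the per-class cycle costs, and average CPI is total cycles divided by total instructions. A minimal Python sketch follows. The cycle costs match the ones stated in the problem, but the instruction counts and the 1 GHz clock are made-up illustrative values, and the helper names are hypothetical.

```python
def total_cycles(counts: dict, cpi: dict) -> int:
    """Sum of (instruction count x cycles per instruction) over all classes."""
    return sum(counts[k] * cpi[k] for k in counts)


def average_cpi(counts: dict, cpi: dict) -> float:
    """Total cycles divided by total instruction count."""
    return total_cycles(counts, cpi) / sum(counts.values())


def execution_time(counts: dict, cpi: dict, clock_hz: float) -> float:
    """Execution time in seconds at the given clock rate."""
    return total_cycles(counts, cpi) / clock_hz


# Cycle costs as stated in the problem; load and store share one cost.
cpi = {"arith": 1, "store": 5, "load": 5, "branch": 2}

# Illustrative counts only (not program A or B).
original = {"arith": 100, "store": 0, "load": 50, "branch": 50}
revised = dict(original, load=original["load"] // 2)  # halve the loads

print(total_cycles(original, cpi))   # 450
print(average_cpi(original, cpi))    # 2.25
# Speedup is old time over new time; the clock rate cancels out.
print(execution_time(original, cpi, 1e9) / execution_time(revised, cpi, 1e9))
```

Note that halving the loads changes both the cycle total and the instruction total, so the average CPI can fall even though no single instruction got cheaper.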

3. Consider two implementations of the same ISA, P1 and P2, with five classes of instructions (A through E) in the instruction set. P1 has a 4 GHz clock rate, while P2 has a 6 GHz clock rate. The average number of cycles for each instruction class is given in the following table:

CPI by class:

       A    B    C    D    E
P1     1    2    3    4    5
P2     3    3    3    5    5

a. Assume that peak performance is defined as the fastest rate at which a computer can execute any instruction sequence (i.e., the instructions can be chosen to maximize performance). What are the peak performances of P1 and P2 expressed in instructions per second?

b. If a program contains equal numbers of instructions from every class except class A, which occurs twice as often as each of the others, how much faster is P2 than P1?

c. Using the same instruction mix, what clock frequency for P1 will give it the same performance as P2?

4. Consider a CPU manufacturing process using 300 mm wafers.

a. If the dimensions of the die are 1.2 cm x 0.8 cm, what is the approximate number of dies produced?

b. Assuming a defect density of 0.55/cm^2, what is the die yield?

c. If the wafer costs $300, what is the price of a die?

d. If the chip price is 22% more than the cost per die, what is the chip price?
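
The assignment does not say which cost model to use for problem 4. The sketch below assumes the simplified model commonly taught in introductory computer architecture courses: dies per wafer is approximated as wafer area divided by die area (ignoring edge losses), yield is 1 / (1 + defect density x die area / 2)^2, and cost per die is wafer cost divided by the number of good dies. If your course uses a different yield formula, substitute it. The function names and the numbers in the usage lines are illustrative and are not the assignment's values.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Approximate die count: wafer area / die area (assumed model, no edge loss)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    return wafer_area / die_area_cm2


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Assumed yield model: 1 / (1 + defect density x die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(wafer_cost: float, dies: float, yield_fraction: float) -> float:
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies * yield_fraction)


# Illustrative values: 20 cm wafer, 1.0 cm^2 die, 0.5 defects per cm^2.
n = dies_per_wafer(20.0, 1.0)
y = die_yield(0.5, 1.0)
print(n)                          # about 314.16
print(y)                          # about 0.64
print(cost_per_die(100.0, n, y))  # cost of one good die
```

Keeping all lengths in centimetres, as the defect density is given per cm^2, avoids the most common unit slip in this kind of calculation.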

5. Consider a program comprising 62% arithmetic instructions, 16% load instructions, 8% store instructions, and 14% branch instructions. Assume the CPI for all instructions is 2, except for branches, which have a CPI of 3.

a. Suppose we consider two alternate improvements to a processor. P1 will execute arithmetic instructions twice as fast. P2 will execute both load and store instructions 3 times faster. In each case, other instructions are unaffected by the changes. Which is faster, P1 or P2, and by how much?

b. Consider running the program on a machine with a large graphics card. When we run the program on this machine, only the arithmetic instructions can be run in parallel on the card; everything else runs sequentially. As the number of stream processors on the GPU goes toward infinity, what is the maximum speedup obtainable on this program?

c. In the previous problem, the GPU improvement was applied toward arithmetic instructions. Assuming that you could apply the GPU to any one instruction class, this is still the smartest choice. What "Great Idea in Computer Architecture" is this an example of?

d. Assume we can apply the GPU improvement to both load and store instructions. With infinitely many streaming GPU processors, is it possible to make the program run two times faster? Explain.
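
A point that is easy to miss in problem 5 is that the percentages describe the instruction mix, not the share of execution time. Because branches have a different CPI, each class's share of time must be weighted by its CPI before a speedup is computed. The sketch below shows that conversion and the resulting speedup for a set of per-class speed factors. The helper names and the three-class mix in the usage lines are hypothetical and deliberately differ from the assignment's numbers.

```python
import math


def time_fractions(mix: dict, cpi: dict) -> dict:
    """Convert an instruction mix into fractions of execution time."""
    weighted = {k: mix[k] * cpi[k] for k in mix}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


def speedup(fractions: dict, factors: dict) -> float:
    """Overall speedup when each listed class runs `factor` times faster.

    Classes missing from `factors` are unchanged. Pass math.inf as a
    factor to model the limit of unbounded parallel hardware.
    """
    new_time = sum(f / factors.get(k, 1.0) for k, f in fractions.items())
    return 1.0 / new_time


# Illustrative mix: 50% arithmetic, 25% memory, 25% branch.
mix = {"arith": 0.50, "mem": 0.25, "branch": 0.25}
cpi = {"arith": 2, "mem": 2, "branch": 4}

fractions = time_fractions(mix, cpi)
print(fractions)                                  # arith 0.4, mem 0.2, branch 0.4
print(speedup(fractions, {"arith": 2.0}))         # 1.25
print(speedup(fractions, {"arith": math.inf}))    # about 1.667, the upper bound
```

In this made-up mix, branches are only a quarter of the instructions but 40% of the time, which is why the weighting step matters.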

6. Compilers can dramatically affect performance of some applications. Assume compiler A results in a dynamic instruction count of 1.0E9 with an execution time of 1.1 seconds. Compiler B results in a dynamic IC of 1.2E9 and an execution time of 1.5 seconds.

a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.

b. Assume P1 executes compiler A's code, and P2 executes compiler B's code, and the execution times are the same. How much faster is the clock on P1 versus P2?

c. A third compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. Calculate the speedup using this new compiler relative to both compiler A and compiler B.

7. Consider a reference program. Running on processor P1, with a clock rate of 4 GHz and an average CPI of 0.9, it requires execution of 5.0E9 instructions. Running on processor P2, with a clock rate of 3 GHz and an average CPI of 0.75, it requires execution of 1.0E9 instructions to complete.

a. One fallacy is that the fastest clock rate equates to the best performance. Calculate the execution times. Is the chip with the fastest clock the one with the best performance?

b. Another fallacy is to use MIPS (millions of instructions per second) to compare performance of two processors. Calculate the MIPS for each chip. Is the chip with the larger MIPS on this program also the faster one?

c. Another fallacy is to use MFLOPS (millions of floating-point operations per second) as a performance metric. MFLOPS is calculated as:

MFLOPS = (# floating-point operations) / (execution time · 10^6)

d. Assume 35% of the instructions on both processors are floating-point operations. Find the MFLOPS figures for both processors. Is the chip with the highest MFLOPS also the fastest on this program?
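
The three metrics compared in problem 7 can be written as small functions, which makes it easier to see why they can disagree: execution time depends on the instruction count, while MIPS and MFLOPS divide that count back out. This is a sketch with hypothetical function names, and the processor in the usage lines is illustrative rather than P1 or P2.

```python
def exec_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds: instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz


def mips(instructions: float, seconds: float) -> float:
    """Millions of instructions per second."""
    return instructions / (seconds * 1e6)


def mflops(instructions: float, fp_fraction: float, seconds: float) -> float:
    """Millions of floating-point operations per second, per the formula above."""
    return instructions * fp_fraction / (seconds * 1e6)


# Illustrative processor: 2e9 instructions, CPI 1.0, 2 GHz, 25% floating point.
t = exec_time(2e9, 1.0, 2e9)
print(t)                      # 1.0 second
print(mips(2e9, t))           # 2000.0
print(mflops(2e9, 0.25, t))   # 500.0
```

Substituting the execution-time formula into `mips` gives clock rate / (CPI x 10^6), so MIPS says nothing about how many instructions a program needs on a given machine.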

8. Suppose we run a program on various parallel architectures. 75% of the program can be parallelized. What is the speedup for 2, 8, 32 and 128 processor machines? What is the maximum speedup achievable on this program?
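
Problem 8 is a direct application of Amdahl's law: with a parallelizable fraction f and n processors, speedup = 1 / ((1 - f) + f / n), and the limit as n grows is 1 / (1 - f). A minimal sketch follows, with hypothetical function names and an illustrative fraction of 0.5 rather than the problem's value.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup on `processors` processors when only part of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count goes to infinity."""
    return 1.0 / (1.0 - parallel_fraction)


# Illustrative fraction: half of the program can be parallelized.
for n in (2, 4, 1024):
    print(n, amdahl_speedup(0.5, n))
print("limit", amdahl_limit(0.5))   # 2.0
```

The loop shows the characteristic flattening: each doubling of processors buys less, and the speedup never crosses the limit set by the serial part.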

9. Computationally hard problems are an important part of computer science. In such problems, the execution time of a solution program often grows exponentially with the size of the input. If I need to substantially increase the size of the input I can handle in order to solve a problem instance of interest, why is parallelizing the code likely to have little effect?
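
One way to build intuition for problem 9 is to assume a specific growth model. Suppose the running time is proportional to base^n for input size n, and that p processors give a perfect p-fold speedup (both are assumptions made for illustration, since the problem only says "exponentially"). Then the largest input solvable in the same wall-clock time grows by only log_base(p). The function name below is hypothetical.

```python
import math


def extra_input_size(processors: int, base: float = 2.0) -> float:
    """Additional input size affordable with `processors` processors.

    Assumes time = c * base**n and ideal linear speedup, so
    base**(n + k) / processors == base**n gives k = log_base(processors).
    """
    return math.log(processors) / math.log(base)


for p in (2, 1024, 1_000_000):
    print(p, round(extra_input_size(p), 2))
```

Under these assumptions, even a million processors extend the reachable input size by only about 20 when the time doubles with each unit of input.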
