Question:
(a) Describe the following terminologies:
i. Branch
ii. Branch Prediction
iii. Branch Predictor
iv. Branch Misprediction
(b) Consider that 15% of instructions are loads and that 20% of the instructions following a load depend on its results and are stalled for 1 cycle. All instructions and all loads hit in their respective first-level caches. Consider further that 20% of instructions are branches, with 60% of them being taken and 40% being not taken. The penalty is 2 cycles if the branch is not taken, and it is 3 cycles if the branch is taken. Then, 1 cycle is lost for 20% of the loads, 2 cycles are lost when a conditional branch is not taken, and 3 cycles are lost for taken branches.
(i) Determine the CPI load latency, CPI branches, CPI, and IPC.
(ii) A very simple optimization implementation for branches is to consider that they are not taken. There will be no penalty if indeed the branch is not taken, and there will still be a 3 cycle penalty if it is taken. Calculate the CPI branches, CPI, and IPC.
(iii) Assuming that a branch-not-taken strategy has been implemented, plot CPI vs. branch misprediction cost when the latter varies between 3 and 20 cycles.
(iv) Do your computations in (iii) argue for sophisticated branch predictors when the pipelines become "deeper"?
(c) In (b), we assumed that the cache miss penalty was 20 cycles. With modern processors running at a frequency of 1 to 3 GHz, the cache miss penalty can reach several hundred cycles.
(i) Keeping all other parameters the same as in (b), plot CPI vs. cache miss penalty cost when the latter varies between 20 and 500 cycles.
(ii) Do your computations argue for the threat of a "memory wall" whereby loading instructions and data could potentially dominate the execution time?