Compare Branch Predictors

All parts of this question concern the prediction of the BNEZ instruction that implements the "j!=0" branch in this loop:

int i=100;
do {
	int j=100;
	do {
		/* body */
		j = j-1;
	} while (j != 0);
	i = i-1;
} while (i != 0);

(0,1) predictor

What is the total number of mispredictions for the j!=0 branch with a 1-bit local predictor, assuming an initial prediction of not taken?

(0,2) predictor

What is the total number of mispredictions for the j!=0 branch with a 2-bit local predictor, assuming an initial state of weakly not taken?

(1,2) predictor

How many mispredictions occur for the j!=0 branch with a 2-bit predictor that uses one bit of global branch history? Assume there are no other branches in the loop body, that the last branch executed before the first iteration of this code was not taken, and that every predictor entry is initialized to weakly not taken.

Loop predictor

How many mispredictions occur for the j!=0 branch with a loop-counting predictor that initially predicts taken?

Tournament

Assume a 2-bit tournament predictor that chooses between the 2-bit local predictor and the loop-counting predictor, initially set to weakly choose the 2-bit local predictor. When does it switch to the loop-counting predictor? What is the total number of mispredictions?

Pentium Branch Prediction

The Intel Pentium 4 had a 4K-entry branch target buffer (BTB) for branch prediction, tracking the direction of up to 4096 branches. It also had a 512-entry trace cache BTB; the trace cache is an L1 instruction cache that holds instructions already decoded into micro-operations. Intel claimed these reduced the overall misprediction rate by one third compared with the Pentium III. The Pentium 4's misprediction penalty was 19 cycles at 2.4 GHz, while the Pentium III's penalty was 9 cycles at 1.4 GHz.

On a given workload, 20% of the instructions are branches, which the Pentium III predicts correctly 90% of the time.

Pentium III Branch Cost

What is the slowdown compared with the ideal pipeline for the Pentium III?
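One way to set this up assumes an ideal CPI of 1 and that every mispredicted branch stalls the pipeline for the full penalty, while correct predictions cost nothing extra. Here $f_b$ is the branch fraction, $a$ the prediction accuracy, and $P$ the penalty in cycles; these symbols are introduced only for illustration.

```latex
\mathrm{CPI}_{\mathrm{PIII}} = 1 + f_b\,(1 - a_{\mathrm{PIII}})\,P_{\mathrm{PIII}}
  = 1 + 0.20 \times 0.10 \times 9 = 1.18
```

Under this model the Pentium III runs about 1.18 times slower than the ideal pipeline.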

Pentium 4 Prediction Rate

Given these data, what is the branch prediction rate for the Pentium 4?
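Assume that "reduced by 1/3" means the Pentium 4's misprediction rate $m$ is two thirds of the Pentium III's. The accuracy then follows directly:

```latex
m_{\mathrm{PIII}} = 1 - 0.90 = 0.10, \qquad
m_{\mathrm{P4}} = \tfrac{2}{3}\, m_{\mathrm{PIII}} = \tfrac{1}{15} \approx 0.067, \qquad
a_{\mathrm{P4}} = 1 - m_{\mathrm{P4}} \approx 0.933
```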

Pentium 4 Branch Cost

What is the slowdown compared with the ideal pipeline for the Pentium 4?
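The following uses the same simple model as before: ideal CPI of 1 and the full penalty charged on each misprediction. It also assumes the Pentium 4 mispredicts $1/15$ of branches, two thirds of the Pentium III's 10%.

```latex
\mathrm{CPI}_{\mathrm{P4}} = 1 + f_b\,(1 - a_{\mathrm{P4}})\,P_{\mathrm{P4}}
  = 1 + 0.20 \times \tfrac{1}{15} \times 19 \approx 1.253
```

The deeper pipeline's larger penalty outweighs the better prediction, so under this model the Pentium 4 loses a larger fraction of ideal performance to branches than the Pentium III does.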

Ideal Speedup

Considering only the improvement in clock speed (and not the effects of branch prediction or any other improvements), what is the ideal speedup expected for the Pentium 4 over the Pentium III? Note that the difference in pipeline depths is already reflected in the change in clock speeds, so it should not appear separately in your answer.
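Suppose both processors execute the same instructions at the same ideal CPI. The ideal speedup is then the ratio of clock rates, written $f_{\mathrm{clk}}$ here:

```latex
S_{\mathrm{ideal}} = \frac{f_{\mathrm{clk,P4}}}{f_{\mathrm{clk,PIII}}}
  = \frac{2.4\ \mathrm{GHz}}{1.4\ \mathrm{GHz}} \approx 1.71
```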

Speedup with Branch Prediction

What is the overall speedup of the Pentium 4 over the Pentium III when considering both the change in clock rate and the effects of branch prediction and branch penalties (but not other architectural improvements)?
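A sketch of the combination assumes the same instruction count on both machines, so that time per instruction is CPI divided by clock rate. It takes the CPIs from the simple penalty model: $1 + 0.20 \times 0.10 \times 9 = 1.18$ for the Pentium III and $1 + 0.20 \times \tfrac{1}{15} \times 19 \approx 1.253$ for the Pentium 4.

```latex
S = \frac{\mathrm{CPI}_{\mathrm{PIII}} / f_{\mathrm{clk,PIII}}}{\mathrm{CPI}_{\mathrm{P4}} / f_{\mathrm{clk,P4}}}
  = \frac{1.18 / 1.4}{1.253 / 2.4} \approx \frac{0.843}{0.522} \approx 1.61
```

Under these assumptions, branch penalties reduce the clock-rate gain from about 1.71 to about 1.61.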