Question 1

Assume benchmarks show that 12.5% of the instructions executed by a specific MIPS program use an arithmetic immediate. Of these, 14% need 16 bits, 10% need 15 bits, and 7% need 14 bits. The SPARC processor has only a 13-bit immediate for its register/immediate instruction format. Assuming the clock speed and CPI are the same, what is the expected slowdown of the SPARC due to the smaller immediate value?
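
One way to set up the arithmetic is sketched below. It assumes (this is not stated in the question) that every immediate too wide for 13 bits costs exactly one extra instruction to build the constant, and that the extra instruction has the same CPI as the rest. The function name and the extra_instructions parameter are hypothetical, chosen only for illustration.

```python
def sparc_slowdown(imm_fraction, oversize_fractions, extra_instructions=1):
    """Relative instruction count on the machine with the smaller immediate.

    imm_fraction: fraction of all instructions using an arithmetic immediate.
    oversize_fractions: fractions of those immediates that do not fit.
    extra_instructions: ASSUMED cost, in instructions, of each oversize immediate.
    """
    oversize = sum(oversize_fractions)
    extra = imm_fraction * oversize * extra_instructions
    # With equal clock speed and CPI, run time scales with instruction count.
    return 1.0 + extra


# Figures from the question: 14%, 10% and 7% need 16, 15 and 14 bits.
slowdown = sparc_slowdown(0.125, [0.14, 0.10, 0.07])
print(f"{slowdown:.5f}")  # prints 1.03875
```

Under a different assumption, such as two extra instructions per oversize immediate, only the extra_instructions argument changes.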

Question 2

You are considering adding a multiply-add instruction:

MAD $rd, $ra, $rb  ; $rd = $rd + $ra*$rb

If multiplies make up 10% of the instructions executed by a particular program, and 5% of the total instructions are adds with an accompanying multiply that could be fused into a single instance of the new instruction, what is the expected speedup?
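
A minimal sketch of the calculation follows. It assumes that each fusible add disappears entirely (the multiply becomes the MAD), that MAD has the same CPI as the instructions it replaces, and that the clock speed is unchanged. None of these is given in the question, and the function name is hypothetical.

```python
def mad_speedup(fusible_add_fraction):
    """Speedup from fusing each eligible multiply/add pair into one MAD.

    fusible_add_fraction: fraction of all instructions that are adds which
    can be folded into an accompanying multiply.
    ASSUMES equal CPI and clock, so time is proportional to instruction count.
    """
    new_count = 1.0 - fusible_add_fraction  # one instruction saved per pair
    return 1.0 / new_count


speedup = mad_speedup(0.05)
print(f"{speedup:.4f}")  # prints 1.0526
```

Under these assumptions the 10% multiply figure bounds how many pairs could exist, but only the 5% of fusible adds enters the result.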

Question 3

Consider this sequence of MIPS instructions:

        LW   $r1, #0($r4)
        ADD  $r1, $r1, $r2
        SUBI $r3, $r1, #1
        BNEZ $r3, target
        ADDI $r1, $r1, #-1
target: SW   $r1, #0($r4)

Assuming the standard 5-stage MIPS pipeline with the zero check in the ID stage, show a pipeline timing diagram for the case when the branch is taken and for the case when it is not. Draw an arrow between stages for any forwarded result, and circle the stage in which the branch target and direction are known.

Question 4

You are considering adding a new "branch on not equal immediate" instruction that could replace the SUBI and BNEZ pair. So

SUBI $r3, $r1, #1
BNEZ $r3, target

becomes

BNEQI $r1, #1, target

Assuming the branch can only be resolved in the EX stage, show new pipeline timing diagrams for the sequence in Question 3 with the SUBI/BNEZ pair replaced by the new instruction.

Question 5

This new instruction uses one register and two immediates. To support it, you'd need a new instruction format:

opcode (6b) Rs (5b) Immediate (?) Offset (?)

To figure out how to allocate the remaining 21 bits between immediate and offset, you could instrument a program to measure the percentage of instructions that are BNEQI (call that B), and the percentage of immediate and offset values in those BNEQI instructions that need n bits. For the immediates, this gives a series of percentages, so I1 is the percentage needing 1 bit, I2 is the percentage needing 2 bits, etc. Likewise for the distribution of offsets, O1 is the percentage needing 1 bit, O2 is the percentage needing 2 bits, etc. Give the equation for the expected speedup for any given choice of bit allocation, assuming you can fall back to SUBI/BNEZ if either the immediate or offset is too big.
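
The shape of such an equation can be explored numerically with the sketch below. It rests on several assumptions that the question leaves open: B is taken as the fraction of instructions in the original SUBI/BNEZ program that are candidate pairs, immediate and offset sizes are treated as independent, each BNEQI that fits saves exactly one instruction, and CPI and clock speed are unchanged (so the EX-stage branch resolution from Question 4 is ignored). The function and parameter names are hypothetical.

```python
def bneqi_speedup(b, imm_dist, off_dist, imm_bits, total_bits=21):
    """Expected speedup when imm_bits go to the immediate, the rest to the offset.

    b: fraction of baseline instructions that are BNEQI candidates (ASSUMED meaning of B).
    imm_dist[k-1]: fraction of immediates needing exactly k bits (I1, I2, ...).
    off_dist[k-1]: fraction of offsets needing exactly k bits (O1, O2, ...).
    """
    off_bits = total_bits - imm_bits
    imm_fits = sum(imm_dist[:imm_bits])  # I1 + ... + I_imm_bits
    off_fits = sum(off_dist[:off_bits])  # O1 + ... + O_off_bits
    # ASSUMES independence: both fields fit with probability imm_fits * off_fits.
    saved = b * imm_fits * off_fits      # one instruction saved per usable BNEQI
    return 1.0 / (1.0 - saved)


def best_allocation(b, imm_dist, off_dist, total_bits=21):
    """Try every split of total_bits and return (imm_bits, speedup) for the best."""
    return max(
        ((i, bneqi_speedup(b, imm_dist, off_dist, i, total_bits))
         for i in range(total_bits + 1)),
        key=lambda pair: pair[1],
    )


# Toy 3-bit example with made-up distributions, only to exercise the code.
example = bneqi_speedup(0.1, [0.5, 0.5], [0.25, 0.75], imm_bits=1, total_bits=3)
```

If the immediate and offset sizes turn out to be correlated, the product of the two cumulative sums would need to be replaced by a measured joint distribution.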