Computation Time
(25 points)

Assume multiplies take 32 cycles, branches take 2 cycles, and all other instructions take 1 cycle. At 2 GHz, how long does this program take to execute? Show the intermediate steps as well as the final number.

        addi $3, $0, 0
        addi $1, $0, 5
loop:   mul  $2, $1, $1
        add  $3, $3, $2
        addi $1, $1, -1
        bnez $1, loop
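
The calculation has the general shape time = total cycles / clock rate, where total cycles is the sum over instruction classes of (dynamic count x cycles per instruction). A minimal Python sketch of that bookkeeping follows. The function name and the instruction counts in the usage lines are hypothetical; they are not the counts for the program above.

```python
def execution_time(dynamic_counts, cycles_per_class, clock_hz):
    """Return (total_cycles, seconds) for a dynamic instruction mix.

    dynamic_counts: executed-instruction count per class (hypothetical input)
    cycles_per_class: cycle cost per class, as given in a problem statement
    clock_hz: clock rate in cycles per second
    """
    total_cycles = sum(count * cycles_per_class[cls]
                       for cls, count in dynamic_counts.items())
    return total_cycles, total_cycles / clock_hz


# Illustrative mix only: 3 multiplies, 3 branches, 10 other instructions.
cycles, seconds = execution_time(
    {"mul": 3, "branch": 3, "other": 10},
    {"mul": 32, "branch": 2, "other": 1},
    clock_hz=2e9,
)
print(cycles, seconds)
```

Note that the counts must be dynamic (how many times each instruction actually executes, including every loop iteration), not the number of lines in the listing.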

Instruction Speedup
(25 points)

Array indexing operations often take the form base + index*stride, where stride is the spacing between array elements (row size, struct size, etc.). In the MIPS instruction set, reading an array element when both base and index are in registers takes at least three instructions:

muli $offset, $index, #stride
add  $addr, $base, $offset
lw   $element, 0($addr)

Consider adding new R-type load and store instructions for strides up to 32 that use the shift field for the stride. The load form of the instruction would be:

lwa  $element, $base, $index, #stride

encoded as

op (6b)   rs (5b)   rt (5b)   rd (5b)     sh (5b)    func (6b)
000000    $base     $index    $element    #stride    lwa

If all instructions on this processor take one cycle (including the multiplies) and 4% are lw and sw instructions that can be replaced with the new instruction, what is the expected speedup?
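
When every instruction takes one cycle and the clock rate is unchanged, speedup reduces to old instruction count / new instruction count. The sketch below captures that ratio under stated assumptions: the function name is hypothetical, and how many instructions each replacement removes is a modeling choice you must justify. The numbers in the usage lines are illustrative, not the ones from this problem.

```python
def speedup_from_fewer_instructions(replaceable_fraction, removed_per_replacement):
    """Speedup when CPI and clock rate stay fixed and only the count shrinks.

    replaceable_fraction: fraction of the original instructions that can be
        swapped for the new instruction (assumed input)
    removed_per_replacement: instructions eliminated per swap (assumed input)
    """
    old_count = 1.0  # normalize the original instruction count to 1
    new_count = old_count - replaceable_fraction * removed_per_replacement
    if new_count <= 0:
        raise ValueError("cannot remove more instructions than exist")
    return old_count / new_count


# Illustrative only: 10% of instructions replaceable, each swap removes 2.
print(speedup_from_fewer_instructions(0.10, 2))
```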

Combined Speedup
(25 points)

Consider a processor where all instructions take one cycle at 500 MHz, limited by the time for the multiply operation. If you break the multiply into five cycles, you can increase the clock speed to 2 GHz. If 2% of the instructions are multiplies, what is the expected speedup?
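
Problems of this kind combine two effects: the average cycles per instruction (CPI) goes up while the clock period goes down. A small sketch of the standard relation time = instruction count x CPI / clock rate is given here. The helper names are hypothetical and the numbers in the usage lines are illustrative, not the ones from this problem.

```python
def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles) pairs summing to 1."""
    return sum(fraction * cycles for fraction, cycles in mix)


def run_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds."""
    return instruction_count * cpi / clock_hz


# Illustrative only: 10% of instructions go from 1 cycle to 3 cycles,
# and the clock goes from 1 GHz to 2 GHz.
n = 1_000_000
old = run_time(n, average_cpi([(1.0, 1)]), 1e9)
new = run_time(n, average_cpi([(0.9, 1), (0.1, 3)]), 2e9)
print(old / new)  # speedup = old time / new time
```

The instruction count cancels in the ratio, so any positive value of n gives the same speedup.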

Amdahl's Law
(25 points)

Consider the processor from the end of the previous problem (2 GHz clock rate, 2% of the instructions are multiplies, which take 5 cycles, all other instructions take one cycle). What fraction of the time is spent in multiply instructions? If you reduce the multiply to just four cycles, what is the speedup for multiply instructions? Using Amdahl's law, what is the expected overall speedup?
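
Amdahl's law gives the overall speedup as 1 / ((1 - f) + f / s), where f is the fraction of execution time (not of instruction count) affected by the improvement and s is the speedup of that part. The sketch below shows both steps; the function names are hypothetical and the usage numbers are illustrative, not the ones from this problem.

```python
def time_fraction(instr_fraction, cycles, other_cycles=1):
    """Fraction of execution time spent in one instruction class."""
    class_time = instr_fraction * cycles
    other_time = (1 - instr_fraction) * other_cycles
    return class_time / (class_time + other_time)


def amdahl(f, s):
    """Overall speedup when a fraction f of the time is sped up by factor s."""
    return 1 / ((1 - f) + f / s)


# Illustrative only: 10% of instructions take 3 cycles, then drop to 2 cycles.
f = time_fraction(0.10, 3)
s = 3 / 2
print(f, s, amdahl(f, s))
```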

Extra credit
(25 points)

The CDC 6600 was a supercomputer designed in 1965 by Seymour Cray. It had 60-bit words, and each instruction was either 30 bits (1/2 word) or 15 bits (1/4 word), so multiple instructions could be packed into each word. The 30-bit instructions were not allowed to be split across a word boundary. So, you could have a 15-bit instruction, a 30-bit instruction, then a 15-bit instruction in a single word, but if there were only 15 bits left when a 30-bit instruction was needed, you would encode a 15-bit no-op so the 30-bit instruction could start in the next word. Given a mix of 60% 15-bit instructions and 40% 30-bit instructions, we want to figure out the expected percentage of no-op instructions.
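
To make the packing rule concrete, here is a minimal sketch that packs a fixed, hand-written sequence of instruction sizes into 60-bit words. It is deterministic and only illustrates the rule; it is not the random simulation. The function name `pack` is hypothetical, and the letters A, BB, and N follow the notation used in the simulated-solution part of this problem.

```python
WORD_BITS = 60


def pack(sizes):
    """Pack instruction sizes (15 or 30 bits) into 60-bit words.

    Returns a string with one character per 15 bits: 'A' for a 15-bit
    instruction, 'BB' for a 30-bit instruction, 'N' for a padding no-op.
    """
    out = []
    used = 0  # bits already used in the current word
    for size in sizes:
        if size not in (15, 30):
            raise ValueError("instruction size must be 15 or 30 bits")
        # A 30-bit instruction cannot straddle a word boundary:
        # pad the last 15 bits of the word with a no-op.
        if size == 30 and WORD_BITS - used == 15:
            out.append("N")
            used = 0
        out.append("A" if size == 15 else "BB")
        used = (used + size) % WORD_BITS
    return "".join(out)


print(pack([15, 30, 15]))      # fits in one word, no padding needed
print(pack([15, 15, 15, 30]))  # 30-bit instruction forces a no-op
```

Because every character stands for 15 bits, each group of four characters in the output corresponds to one word.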

Simulated solution

Write a program to simulate this by randomly inserting either A for a 15-bit instruction, BB for a 30-bit instruction, or N for a no-op into a string. From the results of this simulation, what is the expected percentage of N instructions? Commit your program to your hw3 directory and tag "assn3" (still by class time on the due date).

Analytical solution

Solve for this percentage analytically. Warning: I had to enumerate the possible cases and apply conditional probabilities and a few probability identities to do this. It is not that easy, so I recommend doing the simulation and speedup components of the extra credit first (or just doing those two).

Speedup

From your results (either simulated or analytical), what would the expected speedup be if you were able to avoid the no-op instructions?

Submitting

Submit on paper, at the beginning of class on the due date. Be sure to put your name and campus ID on your assignment when you submit it.