Programming Speedup

You have a program that spends 54 minutes on one core computation using a brute-force algorithm. You are considering replacing that with a smarter algorithm that will reduce the core computation time to 6 minutes, but you'll add 34 minutes of data structure traversal time.

Speedup

What is the speedup, accounting for both computation and data structure time?
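
As a starting point, speedup is usually defined as the old execution time divided by the new execution time. The sketch below applies that definition to the times given above; the function name `speedup` and the variable names are illustrative assumptions, not part of the problem.

```python
def speedup(old_time, new_time):
    """Speedup is the ratio of old execution time to new execution time."""
    return old_time / new_time


# Times in minutes, taken from the problem statement.
brute_force_minutes = 54       # original core computation
smarter_minutes = 6 + 34       # new core computation plus data structure traversal

print(f"speedup = {speedup(brute_force_minutes, smarter_minutes):.2f}")
```

A value above 1 means the new version is faster; a value below 1 would mean the traversal overhead outweighs the algorithmic gain.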

Amdahl's Law

If the core computation time of the original program is 95% of the total execution time, compute the overall speedup using Amdahl's law. Identify the fraction enhanced and fraction unenhanced.
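
One way to set this up, as a sketch: Amdahl's law gives the overall speedup as 1 / (fraction unenhanced + fraction enhanced / speedup of the enhanced part). The code assumes the enhanced part is the core computation (95% of the original run) and that its local speedup is 54 minutes over 6 + 34 minutes; the function and variable names are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    fraction_unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (fraction_unenhanced + fraction_enhanced / speedup_enhanced)


fraction_enhanced = 0.95                 # core computation share of the original run
fraction_unenhanced = 1.0 - fraction_enhanced
speedup_enhanced = 54 / (6 + 34)         # local speedup of the core computation

overall = amdahl_speedup(fraction_enhanced, speedup_enhanced)
print(f"enhanced = {fraction_enhanced:.2f}, unenhanced = {fraction_unenhanced:.2f}")
print(f"overall speedup = {overall:.3f}")
```

The overall speedup is always a little smaller than the local speedup, because the unenhanced 5% still takes as long as before.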

Instruction Set Speedup

Assume load operations are 20% of the total instructions for your workload and stores are 10%. Your load and store instructions can use a register plus an offset to compute the address:

LOAD R1, offset(R2)    ; Register[R1] = Memory[ Register[R2] + offset ]

You are considering removing these displacement load and store instructions from the instruction set, instead requiring two instructions for the 60% of loads and stores where the offset is not 0:

ADD R1, R2, offset     ; Register[R1] = Register[R2] + offset
LOAD R1, R1            ; Register[R1] = Memory[R1]

Average CPI

If the CPI of loads and stores before the change is 5, while other instructions have a CPI of 4, what is the average CPI?
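
The average CPI is a weighted sum: each instruction class contributes its fraction of the instruction mix times its CPI. A minimal sketch, assuming loads and stores together make up 20% + 10% of the mix and everything else falls into one "other" class (the names below are illustrative):

```python
def average_cpi(mix):
    """mix is a list of (fraction_of_instructions, cpi) pairs that sum to 1.0."""
    return sum(fraction * cpi for fraction, cpi in mix)


load_store_fraction = 0.20 + 0.10            # loads plus stores
other_fraction = 1.0 - load_store_fraction   # all remaining instructions

mix_before = [
    (load_store_fraction, 5),   # loads and stores: CPI 5
    (other_fraction, 4),        # other instructions: CPI 4
]
print(f"average CPI = {average_cpi(mix_before):.2f}")
```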

Clock Speedup

What is the expected speedup from this change, if it will allow a 1.2x improvement in clock speed, with no change in CPI?
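
A sketch of one way to reason about this, using CPU time = instruction count x CPI x cycle time. It rests on two assumptions that the problem leaves open: "no change in CPI" is read as the per-class CPIs staying at 5 and 4, and each extra ADD is counted as an "other" instruction with CPI 4. If you instead read it as the average CPI staying fixed, the speedup reduces to 1.2 divided by the instruction count growth (1.18).

```python
LOAD_STORE_FRACTION = 0.20 + 0.10   # loads plus stores in the original mix
NONZERO_OFFSET = 0.60               # share of loads/stores that need the extra ADD
OTHER_FRACTION = 1.0 - LOAD_STORE_FRACTION


def cycles_per_original_instruction(ls_cpi, other_cpi, split_displacement):
    """Cycles spent per instruction of the ORIGINAL program.

    When split_displacement is True, every nonzero-offset load/store gains
    one extra ADD, assumed here to cost other_cpi cycles.
    """
    cycles = LOAD_STORE_FRACTION * ls_cpi + OTHER_FRACTION * other_cpi
    if split_displacement:
        cycles += LOAD_STORE_FRACTION * NONZERO_OFFSET * other_cpi
    return cycles


before = cycles_per_original_instruction(5, 4, split_displacement=False)
after = cycles_per_original_instruction(5, 4, split_displacement=True)

clock_factor = 1.2                              # new clock is 1.2x faster
clock_speedup = before / (after / clock_factor)
print(f"cycles before = {before:.2f}, after = {after:.2f}")
print(f"speedup = {clock_speedup:.3f}")
```

Normalizing to the original instruction count avoids having to recompute the instruction mix percentages after the extra ADD instructions are added.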

CPI Speedup

What is the expected speedup from this change if it allows you to reduce the CPI for load and store instructions to 4 cycles per instruction, but you get no change to the clock cycle time?
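
The same cycle-counting approach can be sketched for this case. It assumes the extra ADD for each nonzero-offset load or store costs the same 4 cycles as other instructions, and that the clock is unchanged, so speedup is just the ratio of cycle counts. All names are illustrative.

```python
def total_cycles(load_store_fraction, nonzero_offset, ls_cpi, other_cpi, split):
    """Cycles per instruction of the original program, before or after the change."""
    other_fraction = 1.0 - load_store_fraction
    cycles = load_store_fraction * ls_cpi + other_fraction * other_cpi
    if split:
        # One extra ADD (assumed to cost other_cpi) per nonzero-offset load/store.
        cycles += load_store_fraction * nonzero_offset * other_cpi
    return cycles


cycles_before = total_cycles(0.30, 0.60, ls_cpi=5, other_cpi=4, split=False)
cycles_after = total_cycles(0.30, 0.60, ls_cpi=4, other_cpi=4, split=True)

cpi_speedup = cycles_before / cycles_after   # same clock, so time ratio = cycle ratio
print(f"cycles before = {cycles_before:.2f}, after = {cycles_after:.2f}")
print(f"speedup = {cpi_speedup:.3f}")
```

Under these assumptions the result comes out below 1, meaning the cheaper loads and stores do not make up for the added instructions.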

Complex CPI

do {
    r = r + a[i] * b[i];
    ++i;
} while (--c != 0);

This loop can be translated into the following code for the Intel 8088 (yes, the one from 1979), assuming r is in the BX register, c is in the CX register, i is in the SI register, and

loop:  MOV   AX, a[SI]   ; AX = a[SI]
       IMUL  b[SI]       ; AX = AX * b[SI]
       ADD   BX, AX      ; r = r + AX
       ADD   SI, 2       ; SI = SI + 2 (for 2-byte data): this is the ++i
       DEC   CX          ; CX = CX - 1: this is the --c
       JNZ   loop        ; this is the while test

Using the timing data for this processor, compute the CPI for 100 iterations of this code. You will need to use the "Effective Address (EA)" timing table as well as the individual instruction tables. Given a 10 MHz clock speed, what is the expected execution time of this code?
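
Once you have looked up the cycle counts, the bookkeeping can be sketched as below. The function is a hypothetical helper, and the numbers in the example call are PLACEHOLDERS, not real 8088 timings; substitute the values from the timing tables, including the EA cycles for the memory operands. The sketch assumes the conditional jump costs a different number of cycles when taken than when it falls through on the last iteration.

```python
def loop_timing(body_cycles, jump_taken, jump_not_taken, iterations, clock_hz):
    """Return (total_cycles, cpi, seconds) for a loop ending in a conditional jump.

    body_cycles: cycles for each non-jump instruction in one iteration.
    The jump is taken on every iteration except the last.
    """
    instructions = (len(body_cycles) + 1) * iterations
    total_cycles = (sum(body_cycles) * iterations
                    + jump_taken * (iterations - 1)
                    + jump_not_taken)
    return total_cycles, total_cycles / instructions, total_cycles / clock_hz


# PLACEHOLDER cycle counts for MOV, IMUL, ADD, ADD, DEC: replace with table values.
placeholder_body = [1, 1, 1, 1, 1]
cycles, cpi, seconds = loop_timing(placeholder_body,
                                   jump_taken=2,        # placeholder
                                   jump_not_taken=1,    # placeholder
                                   iterations=100,
                                   clock_hz=10_000_000)  # 10 MHz
print(f"cycles = {cycles}, CPI = {cpi:.3f}, time = {seconds * 1e6:.2f} us")
```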