### Programming Speedup

You have a program that spends 54 minutes on one core computation using a brute-force algorithm. You are considering replacing that with a smarter algorithm that will reduce the core computation time to 6 minutes, but you'll add 34 minutes of data structure traversal time.

#### Speedup

What is the speedup, accounting for both computation and data structure time?

#### Amdahl's Law

If the core computation time of the original program is 95% of the total execution time, compute the overall speedup using Amdahl's law. Identify the fraction enhanced and fraction unenhanced.

### Instruction Set Speedup

Assume load operations are 20% of the total instructions for your workload and stores are 10%. Your load and store instructions can use a register+offset to compute the address

`LOAD R1, offset(R2) ; Register[R1] = Memory[ Register[R2] + offset ]`

You are considering removing these displacement load and store instructions from the instruction set, instead requiring two instructions for the 60% of loads and stores when the offset is not 0.

```
ADD R1, R2, offset ; Register[R1] = Register[R2] + offset
LOAD R1, R1 ; Register[R1] = Memory[R1]
```

#### Average CPI

If the CPI of load and stores before the change is 5, while other instructions have a CPI of 4, what is the average CPI?

#### Clock speedup

What is the expected speedup from this change, if it will allow a 1.2x improvement in clock speed, with no change in CPI?

#### CPI speedup

What is the expected speedup from this change if it allows you to reduce the CPI for load and store instructions to 4 cycles per instruction, but you get no change to the clock cycle time?

### Complex CPI

do { r = r + a[i] * b[i]; ++i; } while (--c != 0);

This code can be translated into this code for the Intel 8088 (yes the one from 1991), assuming r is in the BX register, c is in the CX register, i is in the SI register, and

```
loop: MOV AX, a[SI] ; AX = a[SI]
IMUL b[SI] ; AX = AX * b[SI]
ADD BX, AX ; r = r + AX
ADD SI, 2 ; SI = SI + 2 (for 2-byte data): this is the ++i
DEC CX ; CX = CX - 1: this is the --c
JNZ loop ; this is the while test
```

Using the timing data for this processor, compute the **CPI** for 100 iterations of this code. You will need to use the "Effective Address (EA)" timing table as well as the individual instruction tables. Given a 10 MHz clock speed, what is the **expected execution time** of this code?