# Homework 5

## CMSC 411 / Olano, Fall 2015

Consider a processor with 64-bit words and the following cache stats when running the SPEC CPU 2006 benchmarks (numbers inspired by this hit time source and this miss rate source):

Level Data Size Associativity Block size Cache Blocks Address Bits Hit Time Miss Rate Hit Rate Miss Penalty
Tag bits Index bits Offset bits
L1 Inst 32 KiB 4-way 32 B         3 cycles 0.4%
L1 Data 32 KiB 8-way 32 B         4 cycles 9.5%
L2 256 KiB 8-way 64 B         10 cycles 4%
L3 8 MiB 16-way 64 B         40 cycles 1%
Memory 200 cycles

### Other Stats (25 points)

Fill in the missing columns in the table.

### Average Memory Access Time (25 points)

1. What is the average memory access time for instructions?
2. What is the average memory access time for data?

1. Give the index, tag, and offset for instruction address 0x0000000100a1af30 at each of L1, L2 and L3.
2. Give the index, tag, and offset for data address 0x00007fff5f1e5ad8 at each of L1, L2 and L3.

### Cache Size (25 points)

Assume a pseudo-LRU replacement policy at all cache levels using one recent bit per block, set if that block has been recently accessed. What are the true sizes of all four caches, including tag and data as well as the valid, dirty and recent bits? Remember that the L1 instruction cache does not need a dirty bit, but all other cache levels do.

### Cache Simulation (25 points extra credit)

Use C++ to simulate the data cache behavior for this code for a matrix/vector multiply:

double r[1000], v[1000], m[1000][1000];

// initialize
for(int i=0; i != 1000; ++i) {
r[i] = 0.0;
v[i] = drand48();              // needs #include<stdlib.h>
for(int j=0; j != 1000; ++j) {
m[j][i] = drand48();
}
}

// multiply
for(int i=0; i != 1000; ++i) {
for(int j=0; j != 1000; ++j) {
r[j] = r[j] + m[j][i]*v[i];
}
}

You can simulate this (or any) cache for the matrix multiply portion of this code by adding something like this:

Cache cache;
// multiply
for(int i=0; i != 1000; ++i) {
for(int j=0; j != 1000; ++j) {
cache.write(&r[j]);
r[j] = r[j] + m[j][i]*v[i];
}
}

Assume a cache with the size and associativity as given at the beginning fo this homework, using a write-allocate and write-back policy for all levels.

Assume replacement using a recent-bit pseudo-LRU method. Each access sets the recent bit for that block to true. If all of the other blocks in the set also have their recent bit set to true, clear the recent bit on those other blocks to false (leaving just the current block set). On replacement, choose the first block in the set with a false for its recent bit.

The Cache class should look something like this. Note that it's only necessary to track the cache meta data, as the actual data is still handled by the C++ code, so the CacheLine struct does not need to contain the actual data (as the real cache would), just the tag and status bits.

class Cache {
// meta-data for each cache line
struct CacheLine {
uint64_t tag;
bool valid;
bool dirty;
bool recent;
};
CacheLine L1[L1_INDICES][L1_ASSOCIATIVITY];
CacheLine L2[L2_INDICES][L2_ASSOCIATIVITY];
CacheLine L3[L3_INDICES][L3_ASSOCIATIVITY];

// cache statistics
int L1accesses, L1hits, L1cycles;
int L2accesses, L2hits, L2cycles;
int L3accesses, L3hits, L3cycles;
int memAccesses, memCycles;

public:
Cache();             // initialize