For this assignment, we will be using details from the Intel Nehalem architecture, which was used in the Core i7 processors first released in 2008.

Nehalem processor caches:
L1I: 4-way, 32 KiB, 32 B blocks
L1D: 8-way, 32 KiB, 32 B blocks
L2: 8-way, 256 KiB, 64 B blocks
L3: 16-way, 8 MiB, 64 B blocks

Cache addressing

Address breakdown

Give the breakdown (the number of bits and which bit positions) of the index, offset, and tag fields of a 32-bit address for each of 1) L1 instruction cache, 2) L1 data cache, 3) L2 cache, and 4) L3 cache.
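
A minimal Python sketch of the arithmetic behind the breakdown, assuming a byte-addressed 32-bit address space and power-of-two cache geometries. The offset needs log2(block size) bits, the index needs log2(number of sets) bits where sets = size / (ways × block size), and the tag takes the remaining high bits. The helper names `log2_exact` and `field_widths` are illustrative only, not from any library.

```python
# Cache geometry from the assignment: (data bytes, ways, block bytes).
CACHES = {
    "L1I": (32 * 1024, 4, 32),
    "L1D": (32 * 1024, 8, 32),
    "L2": (256 * 1024, 8, 64),
    "L3": (8 * 1024 * 1024, 16, 64),
}
ADDRESS_BITS = 32  # assumption: 32-bit byte addresses, as the question states


def log2_exact(n):
    """Hypothetical helper: log2 of a power of two (raises if n is not one)."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(size, ways, block, address_bits=ADDRESS_BITS):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache."""
    offset_bits = log2_exact(block)                  # selects a byte within a block
    index_bits = log2_exact(size // (ways * block))  # selects a set
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


for name, geometry in CACHES.items():
    tag, index, offset = field_widths(*geometry)
    # Layout from the most significant bit down: [ tag | index | offset ].
    print(f"{name}: tag {tag} bits [{ADDRESS_BITS - 1}:{index + offset}], "
          f"index {index} bits [{index + offset - 1}:{offset}], "
          f"offset {offset} bits [{offset - 1}:0]")
```

Because every size here is a power of two, `log2_exact` raises instead of silently rounding if a geometry is mistyped.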

Explicit cache addressing

For the address 0xE90989B8, give the index, offset, and tag for each of 1) L1 instruction cache, 2) L1 data cache, 3) L2 cache, and 4) L3 cache.
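
One way to check hand-computed fields is to extract them with shifts and masks. The following is a sketch under the same assumptions as the breakdown (32-bit byte addresses, power-of-two geometry). The offset is the low bits, the index is the next field up, and the tag is everything above. `split_address` and `log2_exact` are hypothetical helper names.

```python
# Cache geometry from the assignment: (data bytes, ways, block bytes).
CACHES = {
    "L1I": (32 * 1024, 4, 32),
    "L1D": (32 * 1024, 8, 32),
    "L2": (256 * 1024, 8, 64),
    "L3": (8 * 1024 * 1024, 16, 64),
}
ADDRESS = 0xE90989B8


def log2_exact(n):
    """Hypothetical helper: log2 of a power of two (raises if n is not one)."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def split_address(addr, size, ways, block):
    """Return (tag, index, offset) of addr in a set-associative cache."""
    offset_bits = log2_exact(block)
    index_bits = log2_exact(size // (ways * block))
    offset = addr & (block - 1)                              # low offset_bits bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # next index_bits bits
    tag = addr >> (offset_bits + index_bits)                 # all remaining high bits
    return tag, index, offset


for name, geometry in CACHES.items():
    tag, index, offset = split_address(ADDRESS, *geometry)
    print(f"{name}: tag {tag:#x}, index {index:#x}, offset {offset:#x}")
```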

Physical size

The size given for each cache counts only the data. Give the physical size needed to implement each of these caches, including storage for the data, tag, valid bit, and dirty bit of every block.
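
The physical size is the number of blocks times the bits stored per block. A rough sketch, assuming 32-bit addresses, exactly one valid bit and one dirty bit per block (as the question specifies), and no replacement (LRU) or coherence state, which the question does not ask for. The names `physical_bits` and `STATUS_BITS` are illustrative.

```python
# Cache geometry from the assignment: (data bytes, ways, block bytes).
CACHES = {
    "L1I": (32 * 1024, 4, 32),
    "L1D": (32 * 1024, 8, 32),
    "L2": (256 * 1024, 8, 64),
    "L3": (8 * 1024 * 1024, 16, 64),
}
ADDRESS_BITS = 32
STATUS_BITS = 2  # assumption: one valid bit + one dirty bit per block, nothing else


def log2_exact(n):
    """Hypothetical helper: log2 of a power of two (raises if n is not one)."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def physical_bits(size, ways, block, address_bits=ADDRESS_BITS):
    """Total storage bits: data + tag + status bits for every block."""
    blocks = size // block
    sets = blocks // ways
    tag_bits = address_bits - log2_exact(sets) - log2_exact(block)
    bits_per_block = 8 * block + tag_bits + STATUS_BITS
    return blocks * bits_per_block


for name, geometry in CACHES.items():
    bits = physical_bits(*geometry)
    print(f"{name}: {bits} bits = {bits // 8} bytes ({bits / 8 / 1024:.2f} KiB)")
```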

Cache timing

The L1 hit time is 4 cycles, L2 is 10 cycles, L3 is 41 cycles, and memory is 148 cycles.

Average Memory Access Time

The miss rates for a specific program are 2% for the L1 instruction cache, 1% for the L1 data cache, 0.005% for L2, and 0.0001% for L3. What is the AMAT?
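
The AMAT nests level by level: each level's hit time plus its miss rate times the time to service the miss from the next level. The sketch below assumes the L2 and L3 rates are local miss rates (the fraction of accesses reaching that level that miss). The question does not say how to combine the two L1 caches, so the sketch reports a separate AMAT for instruction and data accesses. The function name `amat` is illustrative.

```python
# Hit times in cycles, from the assignment.
HIT_TIME = {"L1": 4, "L2": 10, "L3": 41, "MEM": 148}
# Miss rates; assumption: L2 and L3 rates are *local* miss rates.
MISS_RATE = {"L1I": 2 / 100, "L1D": 1 / 100, "L2": 0.005 / 100, "L3": 0.0001 / 100}


def amat(l1_miss_rate):
    """AMAT for one L1 cache (I or D) backed by L2, L3, and memory."""
    l3_time = HIT_TIME["L3"] + MISS_RATE["L3"] * HIT_TIME["MEM"]
    l2_time = HIT_TIME["L2"] + MISS_RATE["L2"] * l3_time
    return HIT_TIME["L1"] + l1_miss_rate * l2_time


print(f"instruction AMAT = {amat(MISS_RATE['L1I']):.6f} cycles")
print(f"data AMAT        = {amat(MISS_RATE['L1D']):.6f} cycles")
```

If a single combined figure is wanted, the two results can be weighted by the fraction of accesses that are instruction fetches versus data accesses.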

Memory stalls

The CPI for this program is 0.35, not counting memory stalls. What is the CPI including memory stalls, if 20% of the instructions read from memory and 10% write to memory?
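
A sketch of the usual stall accounting: CPI = base CPI + (memory accesses per instruction × L1 miss rate × L1 miss penalty), summed over instruction fetches and data accesses. Every instruction is fetched once, and 0.2 + 0.1 = 0.3 data accesses occur per instruction. This rests on two loudly stated assumptions. First, L1 hits are fully pipelined, so only L1 misses stall, with a penalty equal to the L2/L3/memory service time. Second, writes miss and stall exactly like reads. A different convention, such as counting the whole AMAT as stall time, gives a different answer. The function names are illustrative.

```python
BASE_CPI = 0.35
HIT_TIME = {"L1": 4, "L2": 10, "L3": 41, "MEM": 148}
MISS_RATE = {"L1I": 2 / 100, "L1D": 1 / 100, "L2": 0.005 / 100, "L3": 0.0001 / 100}
# Memory accesses per instruction: one fetch, plus data reads and writes.
FETCHES_PER_INSTR = 1.0
READS_PER_INSTR = 0.20
WRITES_PER_INSTR = 0.10


def l1_miss_penalty():
    """Cycles to service an L1 miss from L2, L3, or memory (local L2/L3 rates)."""
    l3_time = HIT_TIME["L3"] + MISS_RATE["L3"] * HIT_TIME["MEM"]
    return HIT_TIME["L2"] + MISS_RATE["L2"] * l3_time


def cpi_with_stalls():
    penalty = l1_miss_penalty()
    # Assumption: L1 hits are hidden by the pipeline (already in BASE_CPI),
    # so only L1 misses stall; writes are treated exactly like reads.
    instr_stalls = FETCHES_PER_INSTR * MISS_RATE["L1I"] * penalty
    data_stalls = (READS_PER_INSTR + WRITES_PER_INSTR) * MISS_RATE["L1D"] * penalty
    return BASE_CPI + instr_stalls + data_stalls


print(f"CPI with memory stalls = {cpi_with_stalls():.6f}")
```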

Data transfers

Assume a program starts by attempting to read a 32-bit word of data from 0xE90989B8 that is not in any level of the cache. What is the sequence of read requests that each level of the memory hierarchy would issue to the next level to fulfill this request? To get you started, the first step would be "read 32 bytes starting at 0xE90989A0 from L2 to L1". Some transfers will require multiple read requests, since all of the data transfer paths are 32 bytes wide.
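
The request pattern follows from block alignment. Each level that misses fetches its whole block, aligned to its own block size, from the level below, one 32-byte chunk per request. A simplified sketch, assuming the read goes through the L1 data cache, every level allocates the block on a read miss, requests are listed in the order each level issues them going down the hierarchy, and chunks go in ascending address order (a real controller might fetch the critical chunk first or overlap requests). `read_requests` and `HIERARCHY` are hypothetical names.

```python
ADDRESS = 0xE90989B8
PATH_BYTES = 32  # every data transfer path is 32 bytes wide (from the assignment)
# (requesting level, its block size in bytes, level it reads from), in miss order.
# Assumption: the read misses in the L1 data cache first, then L2, then L3.
HIERARCHY = [("L1", 32, "L2"), ("L2", 64, "L3"), ("L3", 64, "memory")]


def read_requests(addr):
    """List the 32-byte read requests issued when addr misses at every level."""
    requests = []
    for level, block, source in HIERARCHY:
        base = addr & ~(block - 1)  # align the address to this level's block
        # A block larger than the path width needs several 32-byte requests.
        for chunk in range(base, base + block, PATH_BYTES):
            requests.append(
                f"read {PATH_BYTES} bytes starting at 0x{chunk:08X} from {source} to {level}"
            )
    return requests


for request in read_requests(ADDRESS):
    print(request)
```

The first request produced matches the example given in the question. The 64-byte L2 and L3 blocks each take two 32-byte reads.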