Snooping Shared Memory

The ARM Cortex-A57 uses a modification of the ESI protocol we cover in class, called the MOESI protocol. This adds "Modified" and "Owned" states that allow data to be shared directly from one core's L1 cache to another core's, without having to go through the L2 cache. If the data in a block in the Owned state is dirty, it is written back to L2 when the Owned block becomes Invalid (because of an eviction caused by a cache conflict, or a write request from another core).

Here’s a summary of the states.

Draw a state diagram for this protocol. To avoid too much state transition spaghetti, I suggest arranging the states in a circle. Include state transitions for CR = this core's CPU requests a Read, CW = this core's CPU requests a Write, BR = see a Read on the Bus from another core, and BW = see a Write on the Bus from another core. On each transition, include what, if anything, the core sends on the bus: W = send Write notification, R = send Read notification, or D = send Data.
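One way to check a finished diagram for completeness is to record it as a table keyed by (state, event). The sketch below is only a bookkeeping aid, not part of the protocol: the names `STATES`, `EVENTS`, `BUS_ACTIONS`, `missing_pairs`, and `check_entry` are hypothetical, and the table is left empty on purpose.

```python
from itertools import product

STATES = ["M", "O", "E", "S", "I"]   # Modified, Owned, Exclusive, Shared, Invalid
EVENTS = ["CR", "CW", "BR", "BW"]    # event abbreviations from the question
BUS_ACTIONS = {"W", "R", "D"}        # what the core may send on the bus

def missing_pairs(table):
    """Return the (state, event) pairs that have no transition recorded yet."""
    return [pair for pair in product(STATES, EVENTS) if pair not in table]

def check_entry(entry):
    """Validate one table value of the form (next_state, set_of_bus_actions)."""
    next_state, actions = entry
    return next_state in STATES and set(actions) <= BUS_ACTIONS

# Fill this in from your own diagram: table[(state, event)] = (next_state, {actions}).
table = {}
print(len(missing_pairs(table)))  # 20 pairs to consider: 5 states x 4 events
```

Not every pair has to appear as a drawn arrow, but listing the uncovered pairs makes it harder to forget a case.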

Directory Shared Memory

The Origin2000 was a series of ccNUMA directory protocol shared memory machines created by SGI/Cray in the late 1990s. The Origin2000 architecture could support systems with up to 1024 processors. The network organization for this system gave the fastest access within pairs of nodes, then sets of four, then eight, then 16, and so on.
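As a rough illustration of that hierarchy, the sketch below computes the size of the smallest group that contains two nodes. It assumes nodes are numbered from 0 so that each group of size 2, 4, 8, ... is an aligned block of consecutive node numbers; that numbering and the name `smallest_group` are assumptions made for illustration, not details of the Origin2000 given here.

```python
def smallest_group(a: int, b: int) -> int:
    """Size of the smallest aligned power-of-two group holding nodes a and b."""
    # The highest bit in which the two node numbers differ decides the group:
    # differing only in bit 0 means the same pair, up to bit 1 the same set
    # of four, and so on. Identical numbers give a group of size 1.
    return 1 << (a ^ b).bit_length()

print(smallest_group(0, 1))  # 2: same pair
print(smallest_group(2, 5))  # 8: nodes 2 and 5 first meet in the set of eight
print(smallest_group(6, 6))  # 1: same node
```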

The Origin2000 used a 32-bit MIPS R10000 processor with a 32KB 2-way associative L1 cache with 64-byte cache blocks. The L2 cache was 4MB of external SRAM, 2-way associative with 128-byte cache blocks. At the processor speed (a whopping 195MHz), cache and memory latencies for this system were as follows:

Level                 ns      clocks
L1 cache              5.1     1
L2 cache              56.4    11
local memory          310     61
4P remote memory      540     106
8P remote memory      707     138
16P remote memory     726     142
32P remote memory     773     151
64P remote memory     876     169
128P remote memory    945     185
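
The clocks column follows from the 195MHz clock, whose period is about 5.13 ns. As a quick check, the sketch below converts a latency in ns to clock cycles by rounding up. The rounding rule is an assumption, chosen because it reproduces every row of the table except 64P (876 ns converts to 171 clocks this way, not 169); the names `CLOCK_MHZ` and `ns_to_clocks` are hypothetical.

```python
import math

CLOCK_MHZ = 195  # processor clock from the text

def ns_to_clocks(ns: float) -> int:
    """Convert a latency in nanoseconds to whole clock cycles, rounding up."""
    # cycles = time * frequency; MHz * ns / 1000 gives a dimensionless count.
    return math.ceil(ns * CLOCK_MHZ / 1000)

print(ns_to_clocks(56.4))  # 11, matching the L2 cache row
print(ns_to_clocks(310))   # 61, matching the local memory row
```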

Average Memory Access Time

Write an expression for the average memory access time on an 8-processor (8P) machine, introducing variables for the miss rates as needed.

Directory Protocol

The Origin2000 used a more complex protocol than the 3-state one we covered in class, but for the purposes of this question, assume it used that 3-state protocol. If a memory access is not local and another processor holds the exclusive copy, what sequence of messages is necessary for a read?

Transfer of Ownership

The TLB entry for each virtual page includes which node is the remote owner for that page. The Origin2000 included the ability to migrate blocks from one owner to another to improve access locality and reduce directory traffic. To support this, they added a new directory state, poisoned. When a block migrated, the former owner marked the block as poisoned in its directory. If a request came in to the former owner for a block in the poisoned state, it could only mean some processor had an out-of-date TLB entry for the block. The former owner would return a response to invalidate the TLB entry and retry, forcing the local system to refresh that TLB entry and get the correct new owner from the OS. Write a possible sequence of CPU reads, page faults, directory protocol messages, and TLB updates that could implement this for a read to a shared block where the processor does not have a local copy of the block, but does have an outdated entry in its TLB.
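
The former owner's side of this mechanism can be sketched in a few lines. This is a minimal illustration, not the Origin2000 implementation: the state names used for the 3-state protocol, the reply strings, and the functions `migrate_away` and `handle_request` are all assumptions, and the requester's side (the part the question asks for) is deliberately left out.

```python
from enum import Enum, auto

class DirState(Enum):
    UNCACHED = auto()   # assumed name for a 3-state protocol state
    SHARED = auto()     # assumed name for a 3-state protocol state
    EXCLUSIVE = auto()  # assumed name for a 3-state protocol state
    POISONED = auto()   # added state: the block has migrated to a new owner

def migrate_away(directory: dict, block: int) -> None:
    """Former owner marks a migrated block as poisoned in its directory."""
    directory[block] = DirState.POISONED

def handle_request(directory: dict, block: int) -> str:
    """Former owner's reply to an incoming request for a block."""
    if directory.get(block) is DirState.POISONED:
        # Only a requester with an out-of-date TLB entry can reach this case.
        return "INVALIDATE_TLB_AND_RETRY"
    return "HANDLE_WITH_NORMAL_PROTOCOL"

directory = {0x40: DirState.SHARED}
migrate_away(directory, 0x40)
print(handle_request(directory, 0x40))  # INVALIDATE_TLB_AND_RETRY
```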