This shows you the differences between two versions of the page.
buzzword [2014/04/14 18:16] rachata |
buzzword [2014/04/16 18:22] rachata |
||
---|---|---|---|
Line 1154: | Line 1154: | ||
* Synchronization | * Synchronization | ||
* Consistency | * Consistency | ||
+ | | ||
+ | ===== Lecture 29 (4/16 Wed.) ===== | ||
| | ||
- | | + | |
+ | |||
+ | * Ordering of instructions | ||
+ | * Maintaining memory consistency when there are multiple threads and shared memory | ||
+ | * Need to ensure the program semantics are not changed | ||
+ | * Making sure the shared data is properly locked when used | ||
+ | * Support mutual exclusion | ||
+ | * Ordering depends on when each processor executes its instructions | ||
+ | * Debugging is also difficult (non-deterministic behavior) | ||
+ | * Weak consistency: global ordering is enforced only at synchronization points | ||
+ | * The programmer hints where the synchronization points are | ||
+ | * Total store order model: global ordering is enforced only among stores | ||
+ | * Cache coherence | ||
+ | * Can be done at the software level or the hardware level | ||
+ | * Coherence protocol | ||
+ | * Need to ensure that all the processors see and update the correct state of the cache block | ||
+ | * Need to make sure that writes get propagated and serialized | ||
+ | * Simple protocols are not scalable (single point of synchronization) | ||
+ | * Update vs. invalidate | ||
+ | * For invalidate, only the cores that keep reading after each write retain a valid copy | ||
+ | * Can lead to ping-ponging under high write contention (caches repeatedly invalidate and re-acquire the block) | ||
+ | * For update, the bus becomes the bottleneck | ||
+ | * Snoopy bus | ||
+ | * Bus based, single point of serialization | ||
+ | * More efficient with a small number of processors | ||
+ | * All caches snoop other caches' read/write requests to keep cache blocks coherent | ||
+ | * Directory based | ||
+ | * Single point of serialization per block | ||
+ | * The directory coordinates coherence | ||
+ | * More scalable | ||
+ | * The directory keeps track of where the copies of each block reside | ||
+ | * Supply data on a read | ||
+ | * Invalidate the block on a write | ||
+ | * Has an exclusive state | ||
+ | * MSI coherence protocol | ||
+ | * Slides 56-57 | ||
+ | * Consumes bus bandwidth (needs an "exclusive" state) | ||
+ | * MESI coherence protocol | ||
+ | * Adds the exclusive state to MSI: this is the only cached copy and it is clean | ||
+ | * Tradeoffs between snooping and directory based | ||
+ | * Slide 71 has a good summary of this | ||
+ | * MOESI | ||
+ | * Improvement over MESI protocol | ||
+ | |||
+ | |
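
The MSI and MESI bullets above can be made concrete with a small simulation. The following is only a minimal sketch, assuming an invalidation-based snoopy bus and a single cache block; the names `State`, `SnoopyBus`, `read`, `write`, and the `mesi` flag are hypothetical and do not come from the lecture slides. It counts bus transactions so that the extra traffic of MSI on a read-then-write sequence, which is what the exclusive state is meant to avoid, becomes visible.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class SnoopyBus:
    """Hypothetical model of one cache block shared over a snoopy bus.

    The bus is the single point of serialization: every miss or upgrade
    is one bus transaction that all other caches snoop.
    """

    def __init__(self, num_caches, mesi=True):
        self.states = [State.I] * num_caches  # per-cache state of the block
        self.mesi = mesi                      # False models plain MSI
        self.transactions = 0                 # number of bus transactions

    def read(self, cid):
        if self.states[cid] is not State.I:
            return  # hit in M, E, or S: no bus traffic
        self.transactions += 1  # read miss goes on the bus
        others = [i for i, s in enumerate(self.states)
                  if i != cid and s is not State.I]
        for i in others:
            # A snooping owner in M (after supplying the data) or E
            # downgrades to S.
            self.states[i] = State.S
        if self.mesi and not others:
            self.states[cid] = State.E  # only cached copy, and it is clean
        else:
            self.states[cid] = State.S

    def write(self, cid):
        state = self.states[cid]
        if state is State.M:
            return  # already the exclusive, dirty owner
        if state is State.E:
            self.states[cid] = State.M  # silent upgrade, no bus traffic
            return
        self.transactions += 1  # invalidation is broadcast on the bus
        for i in range(len(self.states)):
            if i != cid:
                self.states[i] = State.I
        self.states[cid] = State.M


def read_then_write(mesi):
    """One core reads a private block and then writes it."""
    bus = SnoopyBus(num_caches=2, mesi=mesi)
    bus.read(0)
    bus.write(0)
    return bus.transactions
```

Under these assumptions, `read_then_write(mesi=False)` needs two bus transactions (the read miss plus the invalidation broadcast), while `read_then_write(mesi=True)` needs only one, because the block is loaded in the exclusive state and upgraded to modified without using the bus. Alternating writes from two caches make each write invalidate the other copy, which is the ping-ponging pattern mentioned in the notes.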