This shows you the differences between two versions of the page.
buzzword [2014/04/14 18:16] rachata |
buzzword [2014/04/18 18:19] rachata |
||
---|---|---|---|
Line 1154: | Line 1154: | ||
* Synchronization | * Synchronization | ||
* Consistency | * Consistency | ||
+ | | ||
+ | ===== Lecture 29 (4/16 Wed.) ===== | ||
| | ||
- | | + | |
+ | |||
+ | * Ordering of instructions | ||
+ | * Maintaining memory consistency when there are multiple threads and shared memory | ||
+ | * Need to ensure the program semantics are not changed | ||
+ | * Making sure the shared data is properly locked when used | ||
+ | * Support mutual exclusion | ||
+ | * Ordering depends on when each processor executes its instructions | ||
+ | * Debugging is also difficult (non-deterministic behavior) | ||
+ | * Weak consistency: global ordering is enforced only at synchronization points | ||
+ | * The programmer hints where the synchronization points are | ||
+ | * Total store order model: global ordering is enforced only among stores | ||
+ | * Cache coherence | ||
+ | * Can be done at the software level or at the hardware level | ||
+ | * Coherence protocol | ||
+ | * Need to ensure that all the processors see and update the correct state of the cache block | ||
+ | * Need to make sure that writes get propagated and serialized | ||
+ | * Simple protocols are not scalable (one point of synchronization) | ||
+ | * Update vs. invalidate | ||
+ | * For invalidate, only the core that writes retains a valid copy (all other copies are invalidated) | ||
+ | * Can lead to ping-ponging (tons of read/writes from several processors) | ||
+ | * For updates, bus becomes the bottleneck | ||
+ | * Snoopy bus | ||
+ | * Bus based, single point of serialization | ||
+ | * More efficient with a small number of processors | ||
+ | * All caches snoop other caches' read/write requests to keep the cache block coherent | ||
+ | * Directory based | ||
+ | * Single point of serialization per block | ||
+ | * The directory coordinates coherence | ||
+ | * More scalable | ||
+ | * The directory keeps track of where the copies of each block reside | ||
+ | * Supply data on a read | ||
+ | * Invalidate the block on a write | ||
+ | * Has an exclusive state | ||
+ | * MSI coherence protocol | ||
+ | * Slide number 56-57 | ||
+ | * Consumes bus bandwidth (needs an "exclusive" state) | ||
+ | * MESI coherence protocol | ||
+ | * Adds the exclusive state to MSI: this is the only cached copy and it is clean | ||
+ | * Tradeoffs between snooping and directory based | ||
+ | * Slide 71 has a good summary on this | ||
+ | * MOESI | ||
+ | * Improvement over MESI protocol | ||
+ | |||
+ | |||
+ | ===== Lecture 29 (4/18 Fri.) ===== | ||
+ | |||
+ | |||
+ | |||
+ | * Interference | ||
+ | * Complexity of the memory scheduler | ||
+ | * Ranking/prioritization has cost | ||
+ | * Complex scheduler has higher latency | ||
+ | * Performance metrics for multicore/multithreaded applications | ||
+ | * Speedup | ||
+ | * Slowdown | ||
+ | * Harmonic vs. weighted | ||
+ | * Fairness metric | ||
+ | * Maximum slowdown | ||
+ | * Why does it make sense | ||
+ | * Any scenario where it does not make sense? | ||
+ | * Predictable performance | ||
+ | * Why is it important? | ||
+ | * In server environment, different jobs are on the same server | ||
+ | * In a mobile environment, there are multiple sources that can slow down other sources | ||
+ | * How to relate slowdown with request service rate | ||
+ | * MISE: soft slowdown guarantee | ||
+ | * BDI | ||
+ | * Memory wall | ||
+ | * What is the concern regarding the memory wall | ||
+ | * Size of the cache on the die (CPU die) | ||
+ | * One possible solution: cache compression | ||
+ | * What are the problems of existing cache compression mechanisms | ||
+ | * Some are too complex | ||
+ | * Decompression is in the critical path | ||
+ | * Need to decompress when reading the data -> decompression should not be in the critical path | ||
+ | * Important factor to the performance | ||
+ | * Software compression is not good enough to compress everything | ||
+ | * Zero value compression | ||
+ | * Simple | ||
+ | * Good compression ratio | ||
+ | * What if the data does not have many zeroes | ||
+ | * Frequent value compression | ||
+ | * Some data values appear frequently | ||
+ | * Simple and good compression ratio | ||
+ | * Have to profile | ||
+ | * Decompression is complex | ||
+ | * Frequent pattern compression | ||
+ | * Still too complex in terms of decompression | ||
+ | * Base-delta compression | ||
+ | * Easy to decompress but retain the benefit of compression | ||
+ | |||
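The MSI and MESI notes above say that MSI consumes bus bandwidth and that MESI adds an exclusive state for the only clean cached copy. A minimal sketch of that difference follows, for a single cache block on a snoopy bus. The class `SnoopyBlock`, its `use_exclusive` flag, and the way bus transactions are counted are assumptions made for illustration; they are not taken from the lecture slides, and write-backs and data transfer are not modeled.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class SnoopyBlock:
    """One cache block shared by n caches on a hypothetical snoopy bus."""

    def __init__(self, n_caches, use_exclusive=True):
        self.states = [State.I] * n_caches
        self.use_exclusive = use_exclusive  # True: MESI, False: MSI
        self.bus_transactions = 0

    def read(self, core):
        if self.states[core] is not State.I:
            return  # read hit, no bus traffic
        self.bus_transactions += 1  # read miss is broadcast on the bus
        # Every cache holding a valid copy snoops the read and drops to Shared.
        sharers = [i for i, s in enumerate(self.states) if s is not State.I]
        for i in sharers:
            self.states[i] = State.S
        if sharers or not self.use_exclusive:
            self.states[core] = State.S
        else:
            self.states[core] = State.E  # only copy, and it is clean

    def write(self, core):
        state = self.states[core]
        if state is State.M:
            return  # write hit on an already modified block
        if state is State.E:
            self.states[core] = State.M  # silent upgrade, no bus traffic
            return
        # From Shared or Invalid: broadcast and invalidate all other copies.
        self.bus_transactions += 1
        self.states = [State.I] * len(self.states)
        self.states[core] = State.M
```

In this sketch, a read followed by a write from the same core costs one bus transaction with `use_exclusive=True` and two with `use_exclusive=False`, which is the bandwidth saving the exclusive state is meant to provide.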
+ | |||
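The speedup, slowdown, and maximum-slowdown metrics listed above can be written down concretely. The sketch below uses the commonly used definitions based on each application's IPC when running alone versus when sharing the system; the notes do not spell out the formulas, so treat these definitions and the function names as assumptions.

```python
def slowdowns(ipc_alone, ipc_shared):
    """Per-application slowdown: IPC alone divided by IPC when sharing."""
    return [alone / shared for alone, shared in zip(ipc_alone, ipc_shared)]


def weighted_speedup(ipc_alone, ipc_shared):
    """Sum of per-application speedups (system throughput)."""
    return sum(shared / alone for alone, shared in zip(ipc_alone, ipc_shared))


def harmonic_speedup(ipc_alone, ipc_shared):
    """Harmonic mean of per-application speedups (balances throughput and fairness)."""
    return len(ipc_alone) / sum(slowdowns(ipc_alone, ipc_shared))


def maximum_slowdown(ipc_alone, ipc_shared):
    """Slowdown of the most slowed-down application (a fairness metric)."""
    return max(slowdowns(ipc_alone, ipc_shared))
```

For example, with `ipc_alone = [2.0, 1.0]` and `ipc_shared = [1.0, 0.25]`, the two applications are slowed down by 2x and 4x, so the maximum slowdown is 4.0 and the weighted speedup is 0.75.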
+ |
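The last point above, that base-delta compression is easy to decompress while keeping the benefit of compression, can be sketched as follows. This is a simplified illustration, not the full BDI design: it assumes a cache line is a list of 8-byte values, always uses the first value as the base, and tries a single delta width. The function names and parameters are hypothetical.

```python
def compress(values, delta_bytes=1):
    """Return (base, deltas) if every delta fits in delta_bytes, else None."""
    base = values[0]
    limit = 1 << (8 * delta_bytes - 1)  # signed range of a delta
    deltas = [v - base for v in values]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None  # line is not compressible with this base and delta width


def decompress(base, deltas):
    """Decompression is one addition per value, all independent of each other."""
    return [base + d for d in deltas]


def compressed_size(n_values, value_bytes=8, delta_bytes=1):
    """Bytes needed for one base plus one narrow delta per value."""
    return value_bytes + n_values * delta_bytes
```

A line of eight pointers into the same region, such as `[0x7F0000001000 + 8 * i for i in range(8)]`, compresses from 64 bytes to 16 bytes in this sketch, and decompression needs only additions, which is why it can stay off the critical path more easily than dictionary-based schemes.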