buzzword [2014/04/30 18:12] rachata
  * MOESI
    * Improvement over MESI protocol

===== Lecture 29 (4/18 Wed.) =====

  * Interference
  * Complexity of the memory scheduler
    * Ranking/prioritization has a cost
    * A complex scheduler has higher latency
  * Performance metrics for multicore/multithreaded applications
    * Speedup
    * Slowdown
    * Harmonic vs. weighted
  * Fairness metric
    * Maximum slowdown
      * Why does it make sense?
      * Any scenario where it does not make sense?
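A minimal sketch of how these metrics are computed, assuming we have each thread's IPC measured running alone and running shared (the numbers below are made up for illustration):

```python
# Hypothetical per-thread IPC values: alone on the machine vs. co-running.
ipc_alone  = [2.0, 1.0, 1.5]
ipc_shared = [1.0, 0.8, 0.5]

# Slowdown of each thread: alone performance / shared performance.
slowdowns = [a / s for a, s in zip(ipc_alone, ipc_shared)]

# Weighted speedup: sum of per-thread speedups relative to running alone.
weighted_speedup = sum(s / a for a, s in zip(ipc_alone, ipc_shared))

# Harmonic speedup: harmonic-mean based, penalizes heavily slowed-down threads.
n = len(ipc_alone)
harmonic_speedup = n / sum(a / s for a, s in zip(ipc_alone, ipc_shared))

# Maximum slowdown: the fairness metric -- how badly off the worst thread is.
max_slowdown = max(slowdowns)

print(slowdowns)      # [2.0, 1.25, 3.0]
print(max_slowdown)   # 3.0
```

Note how maximum slowdown captures fairness: the weighted speedup can look acceptable even while one thread (here, the third) is slowed down 3x.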
  * Predictable performance
    * Why is it important?
      * In a server environment, different jobs run on the same server
      * In a mobile environment, there are multiple sources that can slow down other sources
    * How to relate slowdown to request service rate
      * MISE: soft slowdown guarantee
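The core MISE observation, sketched with made-up numbers: for memory-bound applications, slowdown tracks the memory request service rate, and the alone-service-rate can be estimated by occasionally giving the application highest priority at the memory controller.

```python
# Hypothetical request service rates (requests per microsecond).
service_rate_alone  = 8.0   # estimated while the app briefly has highest priority
service_rate_shared = 2.0   # measured while sharing memory normally

# MISE-style estimate: no need to actually run the application alone.
estimated_slowdown = service_rate_alone / service_rate_shared
print(estimated_slowdown)   # 4.0
```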
  * BDI
    * Memory wall
      * What is the concern regarding the memory wall?
      * Size of the cache on the die (CPU die)
    * One possible solution: cache compression
      * What are the problems with existing cache compression mechanisms?
        * Some are too complex
        * Decompression is on the critical path
          * Need to decompress when reading the data -> decompression should not be on the critical path
          * An important factor for performance
      * Software compression is not good enough to compress everything
      * Zero value compression
        * Simple
        * Good compression ratio
        * What if the data does not have many zeroes?
      * Frequent value compression
        * Some values appear frequently
        * Simple and good compression ratio
        * Has to profile
        * Decompression is complex
      * Frequent pattern compression
        * Still too complex in terms of decompression
      * Base-delta compression
        * Easy to decompress but retains the benefit of compression
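A sketch of the base+delta idea, treating a cache line as a list of words: if every word is close to a common base, store the base once plus small deltas; decompression is then just one addition per word, which is why it stays off the critical path. (Real BDI tries multiple base/delta sizes plus a zero base; this shows a single configuration with illustrative names.)

```python
def compress_base_delta(line, delta_bits=8):
    """Return (base, deltas) if every word fits in a signed delta, else None."""
    base = line[0]
    deltas = [v - base for v in line]
    limit = 1 << (delta_bits - 1)
    if all(-limit <= d < limit for d in deltas):
        return (base, deltas)      # compressed: one base + narrow deltas
    return None                    # not compressible at this delta width

def decompress_base_delta(compressed):
    base, deltas = compressed
    return [base + d for d in deltas]   # just one addition per word

line = [0x1000, 0x1004, 0x1008, 0x100C]   # e.g., nearby pointers in an array
c = compress_base_delta(line)
print(decompress_base_delta(c) == line)   # True
```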

===== Lecture 31 (4/28 Mon.) =====

  * Directory-based cache coherence
    * Each directory has to handle validation/invalidation
    * Extra cost of synchronization
      * Need to ensure race conditions are resolved
  * Interconnection
    * Topology
      * Bus
      * Mesh
      * Torus
      * Tree
      * Butterfly
      * Ring
        * Bi-directional ring
          * More scalable
        * Hierarchical ring
          * Even more scalable
          * More complex
      * Crossbar
      * etc.
    * Circuit switching
    * Multistage network
      * Butterfly
      * Delta network
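One way to see the scalability difference among topologies is average hop count; a rough illustration (assuming uniform traffic and minimal routing) comparing a 64-node bidirectional ring against an 8x8 mesh:

```python
# Average hop count over all source/destination pairs, uniform traffic.
def avg_hops(n, dist):
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

n = 64
# Bidirectional ring: take the shorter direction around the ring.
ring = avg_hops(n, lambda a, b: min(abs(a - b), n - abs(a - b)))

# 8x8 mesh: Manhattan distance between grid coordinates.
cols = 8
mesh = avg_hops(n, lambda a, b: abs(a % cols - b % cols)
                             + abs(a // cols - b // cols))

print(ring, mesh)   # ring average hops are about 3x the mesh's at 64 nodes
```

The gap widens with node count: the ring's average distance grows linearly in N, the mesh's only with sqrt(N).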
    * Handling contention
      * Buffering vs. dropping/deflection (no buffering)
    * Routing algorithms
      * Handling deadlock
        * X-Y routing
        * Turn model (to avoid deadlocks)
        * Add more buffering for an escape path
      * Oblivious routing
        * Can take different paths
        * DOR between each intermediate location
        * Balances network load
      * Adaptive routing
        * Uses the state of the network to determine the route
        * Aware of local and/or global congestion
        * Non-minimal adaptive routing can have livelocks
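X-Y (dimension-order) routing can be sketched in a few lines: route the packet fully in the X dimension first, then in Y. Forbidding Y-to-X turns breaks cyclic channel dependences, which is what makes it deadlock-free on a mesh.

```python
def xy_route(src, dst):
    """Hops (as (x, y) coordinates) from src to dst on a 2D mesh, X first."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                      # finish all X movement first...
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # ...then move only in Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

This is oblivious (the path ignores network state); adaptive routers instead consult congestion information when choosing among productive directions.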

===== Lecture 32 (4/30 Wed.) =====

  * Serialized code sections
    * Degrade performance
    * Waste energy
  * Heterogeneous cores
    * Can execute the serialized portion on a powerful large core
    * Tradeoff between multiple small cores, multiple large cores, or heterogeneous cores
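A back-of-the-envelope Amdahl-style model of that tradeoff, assuming the parallel portion scales perfectly across the small cores and the serial portion runs on one core of relative performance `serial_perf` (all numbers illustrative):

```python
def speedup(f_serial, n_small, serial_perf):
    """Speedup over one small core: serial fraction f_serial runs on a core
    serial_perf times as fast as a small core; the rest runs on n_small cores."""
    return 1.0 / (f_serial / serial_perf + (1 - f_serial) / n_small)

# 10% serial code, 16 small cores:
symmetric     = speedup(0.10, 16, 1.0)   # serial part stuck on a small core
heterogeneous = speedup(0.10, 16, 4.0)   # serial part shipped to a 4x large core

print(symmetric, heterogeneous)   # 6.4 vs. ~12.3
```

Even a single large core nearly doubles overall speedup here, because the serialized section dominates once the parallel part is spread over many cores.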
  * Critical sections
    * A bottleneck in several multithreaded workloads
    * Asymmetry can help
  * Accelerated critical sections (ACS)
    * Use a large core to run the serialized portion of the code
    * How to correctly support ACS
      * False serialization
      * Handling private/shared data
  * BIS
    * Identify the bottlenecks
      * Serial bottleneck
      * Barrier
      * Critical section
      * Pipeline stages
      * An application might wait on different types of bottlenecks
    * Allows BottleneckCall and BottleneckReturn
    * Acceleration can be done in multiple ways
      * Ship to a big core
      * Increase the frequency
      * Prioritize the thread in shared resources (e.g., the memory scheduler always schedules requests from that thread first)
    * The bottleneck table keeps track of different threads' bottlenecks and determines criticality