This shows you the differences between two versions of the page.
buzzword [2015/04/15 18:16] rachata |
buzzword [2015/04/20 18:16] rachata |
||
---|---|---|---|
Line 1281: | Line 1281: | ||
* Has an exclusive state | * Has an exclusive state | ||
- | ===== Lecture 29 (4/13 Mon.) ===== | + | ===== Lecture 29 (4/10 Fri.) ===== |
* MSI coherence protocol | * MSI coherence protocol | ||
* The problem: unnecessary broadcasts of invalidations | * The problem: unnecessary broadcasts of invalidations | ||
Line 1296: | Line 1296: | ||
* Size? Latency? Thousands of nodes? Best of both snooping and directory? | * Size? Latency? Thousands of nodes? Best of both snooping and directory? | ||
- | ===== Lecture 30 (4/15 Wed.) ===== | + | |
+ | ===== Lecture 30 (4/13 Mon.) ===== | ||
+ | * In-memory computing | ||
+ | * Design goals of DRAM | ||
+ | * DRAM structures | ||
+ | * Banks | ||
+ | * Capacitors and sense amplifiers | ||
+ | * Trade-offs b/w number of sense amps and cells | ||
+ | * Width of bank I/O vs. row size | ||
+ | * DRAM operations | ||
+ | * ACTIVATE, READ/WRITE, and PRECHARGE | ||
+ | * Trade-offs | ||
+ | * Latency | ||
+ | * Bandwidth: Chip vs. rank vs. bank | ||
+ | * What's the benefit of having 8 chips? | ||
+ | * Parallelism | ||
+ | * RowClone | ||
+ | * What are the problems? | ||
+ | * Copying b/w two rows that share the same sense amplifier | ||
+ | * System software support | ||
+ | * Bitwise AND/OR | ||
+ | |||
+ | ===== Lecture 31 (4/15 Wed.) ===== | ||
* Application slowdown | * Application slowdown | ||
Line 1332: | Line 1354: | ||
* Maximum slowdown and fairness metric | * Maximum slowdown and fairness metric | ||
| | ||
+ | |||
+ | |||
+ | ===== Lecture 32 (4/20 Mon.) ===== | ||
+ | |||
+ | * Heterogeneous systems | ||
+ | * Asymmetric cores: different types of cores on the chip | ||
+ | * Each of these cores is optimized for different workloads/requirements/goals | ||
+ | * Multiple special purpose processors | ||
+ | * Flexible and can adapt to workload behavior | ||
+ | * Disadvantages: complex and high overhead | ||
+ | * Examples: CPU-GPU systems, heterogeneity in execution models | ||
+ | * Heterogeneous resources | ||
+ | * Example: reliable and non-reliable DRAM in the same system | ||
+ | * Key problems in modern systems | ||
+ | * Memory system | ||
+ | * Efficiency | ||
+ | * Predictability | ||
+ | * Asymmetric design can help solve these problems | ||
+ | * Serialized code sections | ||
+ | * Bottleneck in multicore execution | ||
+ | * Parallelizable vs. serial portion | ||
+ | * Accelerate critical section | ||
+ | * Cache ping-ponging | ||
+ | * Synchronization latency | ||
+ | * Symmetric vs. asymmetric design | ||
+ | * Large cores + small cores | ||
+ | * Core asymmetry | ||
+ | * Amdahl's law with heterogeneous cores | ||
+ | * Parallel bottlenecks | ||
+ | * Resource contention | ||
+ | * Depends on what is running | ||
+ | * Accelerated critical section | ||
+ | * Ship critical sections to large cores | ||
+ | * Small modifications and low overhead | ||
+ | * False serialization might become the bottleneck | ||
+ | * Can reduce parallel throughput | ||
+ | * Effect on private cache misses and shared cache misses | ||
| | ||
+ | | ||
+ | | ||
+ | |
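
The Lecture 32 item "Amdahl's law with heterogeneous cores" can be made concrete with a small numeric sketch. The model below is an illustration under stated assumptions, not something spelled out in these notes: the chip has a budget of `n` small-core equivalents, a large core built from `r` of them runs serial code `sqrt(r)` times faster than a small core (the `perf` function is a hypothetical scaling rule chosen for the example), and `f` is the parallelizable fraction of the work. All function and parameter names are made up for this sketch.

```python
import math


def perf(r):
    """Assumed serial performance of a core built from r small-core
    equivalents (illustrative square-root scaling, not from the notes)."""
    return math.sqrt(r)


def speedup_symmetric(f, n, r):
    """Speedup of a chip with n/r identical cores, each of size r,
    relative to one small core. f is the parallelizable fraction."""
    cores = n / r
    serial_time = (1 - f) / perf(r)          # serial part runs on one core
    parallel_time = f / (perf(r) * cores)    # parallel part uses all cores
    return 1 / (serial_time + parallel_time)


def speedup_asymmetric(f, n, r):
    """Speedup of a chip with one large core of size r plus (n - r)
    small cores. The large core runs the serial part; the parallel
    part uses the large core and all small cores together."""
    serial_time = (1 - f) / perf(r)
    parallel_time = f / (perf(r) + (n - r))
    return 1 / (serial_time + parallel_time)


if __name__ == "__main__":
    f, n = 0.9, 16
    for r in (1, 2, 4, 8, 16):
        print(r, round(speedup_symmetric(f, n, r), 2),
              round(speedup_asymmetric(f, n, r), 2))
```

Under these assumptions, spending part of the area budget on one large core shortens the serial portion while the remaining small cores keep most of the parallel throughput, which is the argument for large cores + small cores listed above. With `r = 1` the two designs are the same chip, so both functions return the same speedup.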