User Tools

Site Tools


buzzword

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
buzzword [2015/04/06 19:10]
kevincha
buzzword [2015/04/20 18:16]
rachata
Line 1210: Line 1210:
       * Can be done explicitly (parallel programming by the programmer)       * Can be done explicitly (parallel programming by the programmer)
       * Or implicitly (hardware partitions a single thread speculatively)       * Or implicitly (hardware partitions a single thread speculatively)
-    * Or, run multiple independent tasks (still improves throughput, but the +    * Or, run multiple independent tasks (still improves throughput, but the speedup of any single tasks is not better, also simpler to implement)
-      ​speedup of any single tasks is not better, also simpler to implement)+
   * Loosely coupled multiprocessor   * Loosely coupled multiprocessor
     * No shared global address space     * No shared global address space
Line 1240: Line 1239:
     * Synchronization     * Synchronization
     * Consistency     * Consistency
 +
 +
 +===== Lecture 28 (4/8 Wed.) =====
 +  * Ordering of instructions
 +    * Maintaining memory consistency when there are multiple threads and shared memory
 +    * Need to ensure the semantic is not changed
 +    * Making sure the shared data is properly locked when used
 +      * Support mutual exclusion
 +    * Ordering depends on when each processor is executed
 +    * Debugging is also difficult (non-deterministic behavior)
 +  * Dekker'​s algorithm
 +    * Inconsistency -- the two processors did NOT see the same order of operations to memory
 +  * Sequential consistency
 +    * Multiple correct global orders
 +    * Two issues:
 +        * Too conservative/​strict
 +        * Performance limiting
 +  * Weak consistency:​ global ordering when sync
 +    * programmer hints where the synchronizations are
 +    * Memory fence
 +    * More burden on the programmers
 +  * Cache coherence
 +    * Can be done in the software level or hardware level
 +  * Snoop-based coherence
 +    * A simple protocol with two states by broadcasting reads/​writes on a bus
 +  * Maintaining coherence
 +    * Needs to provide 1) write propagation and 2) write serialization
 +    * Update vs. Invalidate
 +  * Two cache coherence methods
 +    * Snoopy bus
 +      * Bus based, single point of serialization
 +      * More efficient with small number of processors
 +      * Processors snoop other caches read/write requests to keep the cache block coherent
 +    * Directory
 +      * Single point of serialization per block
 +      * Directory coordinates the coherency
 +      * More scalable
 +      * The directory keeps track of where the copies of each block resides
 +        * Supplies data on a read
 +        * Invalidates the block on a write
 +        * Has an exclusive state
 +
 +===== Lecture 29 (4/10 Fri.) =====
 +  * MSI coherent protocol
 +    * The problem: unnecessary broadcasts of invalidations
 +  * MESI coherent protocol
 +    * Add the exclusive state: this is the only cache copy and it is a clean state to MSI
 +    * Multiple invalidation tradeoffs
 +    * Problem: memory can be unnecessarily updated
 +    * A possible owner state (MOESI)
 +  * Tradeoffs between snooping and directory based coherence protocols
 +    * Slide 31 has a good summary
 +  * Directory: data structures
 +    * Bit vectors vs. linked lists
 +  * Scalability of directories
 +    * Size? Latency? Thousand of nodes? Best of both snooping and directory?
 +
 +    ​
 +===== Lecture 30 (4/13 Mon.) =====
 +  * In-memory computing
 +  * Design goals of DRAM
 +  * DRAM structures
 +    * Banks
 +    * Capacitors and sense amplifiers
 +    * Trade-offs b/w number of sense amps and cells
 +    * Width of bank I/O vs. row size
 +  * DRAM operations
 +    * ACTIVATE, READ/WRITE, and PRECHARGE
 +  * Trade-offs
 +    * Latency
 +    * Bandwidth: Chip vs. rank vs. bank
 +      * What's the benefit of having 8 chips?
 +    * Parallelism
 +  * RowClone
 +    * What are the problems?
 +    * Copying b/w two rows that share the same sense amplifier
 +    * System software support
 +  * Bitwise AND/OR
 +
 +===== Lecture 31 (4/15 Wed.) =====
 +
 +  * Application slowdown
 +  * Interference between different applications
 +    * Applications'​ performance depends on other applications that they are running with
 +  * Predictable performance
 +    * Why are they important?
 +    * Applications that need predictibility
 +    * How to predict the performance?​
 +      * What information are useful?
 +      * What need to be guarantee?
 +      * How to estimate the performance when running with others?
 +        * Easy, just measure the performance while it is running.
 +      * How to estimate the performance when the application is running by itself.
 +        * Hard if there is no profiling.
 +      * The relationship between memory service rate and the performance.
 +        * Key assumption: applications are memory bound
 +    * Behavior of memory-bound applications
 +      * With and without interference
 +  * Memory phase vs. compute phase
 +  * MISE
 +    * Estimating slowdown using request service rate
 +    * Inaccuracy when measuring request service rate alone
 +    * Non-memory-bound applications
 +    * Control slowdown and provide soft guarantee
 +  * Taking into account of the shared cache
 +    * MISE model + cache resource management
 +    * Aug tag store
 +      * Separate tag store for different cores
 +    * Cache access rate alone and shared as the metric to estimate slowdown
 +  * Cache paritiioning
 +    * How to determine partitioning
 +      * Utility based cache partitioning
 +      * Others
 +  * Maximum slowdown and fairness metric
 +    ​
 +
 +
 +===== Lecture 32 (4/20 Mon.) =====
 +
 +  * Heterogeneous systems
 +    * Assymmetric cores: different types of cores on the chip
 +      * Each of these cores are optimized for different workloads/​requirements/​goals
 +      * Multiple special purpose processors
 +      * Flexible and can adapt to workload behavior
 +      * Disadvantages:​ complex and high overhead
 +    * Examples: CPU-GPU systems, heterogeneity in execution models
 +    * Heterogeneous resources
 +      * Example: reliable and non-reliable DRAM in the same system
 +  * Key problems in modern systems
 +    * Memory system
 +    * Efficiency
 +    * Predictability
 +    * Assymmetric design can help solving these problems
 +  * Serialized code sections
 +    * Bottleneck in multicore execution
 +    * Parallelizable vs. serial portion
 +    * Accelerate critical section
 +    * Cache ping-ponging
 +      * Synchronization latency
 +    * Symmetric vs. assymmetric design
 +  * Large cores + small cores
 +    * Core assymmetry
 +  * Amdahl'​s law with heterogeneous cores
 +  * Parallel bottlenecks
 +    * Resource contention
 +      * Depends on what are running
 +  * Accelerated critical section
 +    * Ship critical sections to large cores
 +    * Small modifications and low overhead
 +    * False serialization might become the bottleneck
 +    * Can reduce parallel throughput
 +    * Effect on private cache misses and shared cache misses
 +    ​
 +  ​
 +
 +    ​
 +    ​
buzzword.txt · Last modified: 2015/04/27 18:20 by rachata