User Tools

Site Tools


buzzword

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
buzzword [2015/04/13 16:08]
kevincha
buzzword [2015/04/27 18:20]
rachata
Line 1281: Line 1281:
         * Has an exclusive state         * Has an exclusive state
  
-===== Lecture 29 (4/13 Mon.) =====+===== Lecture 29 (4/10 Fri.) =====
   * MSI coherent protocol   * MSI coherent protocol
     * The problem: unnecessary broadcasts of invalidations     * The problem: unnecessary broadcasts of invalidations
Line 1295: Line 1295:
   * Scalability of directories   * Scalability of directories
     * Size? Latency? Thousand of nodes? Best of both snooping and directory?     * Size? Latency? Thousand of nodes? Best of both snooping and directory?
 +
 +    ​
 +===== Lecture 30 (4/13 Mon.) =====
 +  * In-memory computing
 +  * Design goals of DRAM
 +  * DRAM structures
 +    * Banks
 +    * Capacitors and sense amplifiers
 +    * Trade-offs b/w number of sense amps and cells
 +    * Width of bank I/O vs. row size
 +  * DRAM operations
 +    * ACTIVATE, READ/WRITE, and PRECHARGE
 +  * Trade-offs
 +    * Latency
 +    * Bandwidth: Chip vs. rank vs. bank
 +      * What's the benefit of having 8 chips?
 +    * Parallelism
 +  * RowClone
 +    * What are the problems?
 +    * Copying b/w two rows that share the same sense amplifier
 +    * System software support
 +  * Bitwise AND/OR
 +
 +===== Lecture 31 (4/15 Wed.) =====
 +
 +  * Application slowdown
 +  * Interference between different applications
 +    * Applications'​ performance depends on other applications that they are running with
 +  * Predictable performance
 +    * Why are they important?
 +    * Applications that need predictibility
 +    * How to predict the performance?​
 +      * What information are useful?
 +      * What need to be guarantee?
 +      * How to estimate the performance when running with others?
 +        * Easy, just measure the performance while it is running.
 +      * How to estimate the performance when the application is running by itself.
 +        * Hard if there is no profiling.
 +      * The relationship between memory service rate and the performance.
 +        * Key assumption: applications are memory bound
 +    * Behavior of memory-bound applications
 +      * With and without interference
 +  * Memory phase vs. compute phase
 +  * MISE
 +    * Estimating slowdown using request service rate
 +    * Inaccuracy when measuring request service rate alone
 +    * Non-memory-bound applications
 +    * Control slowdown and provide soft guarantee
 +  * Taking into account of the shared cache
 +    * MISE model + cache resource management
 +    * Aug tag store
 +      * Separate tag store for different cores
 +    * Cache access rate alone and shared as the metric to estimate slowdown
 +  * Cache paritiioning
 +    * How to determine partitioning
 +      * Utility based cache partitioning
 +      * Others
 +  * Maximum slowdown and fairness metric
 +    ​
 +
 +
 +===== Lecture 32 (4/20 Mon.) =====
 +
 +  * Heterogeneous systems
 +    * Asymmetric cores: different types of cores on the chip
 +      * Each of these cores are optimized for different workloads/​requirements/​goals
 +      * Multiple special purpose processors
 +      * Flexible and can adapt to workload behavior
 +      * Disadvantages:​ complex and high overhead
 +    * Examples: CPU-GPU systems, heterogeneity in execution models
 +    * Heterogeneous resources
 +      * Example: reliable and non-reliable DRAM in the same system
 +  * Key problems in modern systems
 +    * Memory system
 +    * Efficiency
 +    * Predictability
 +    * Assymmetric design can help solving these problems
 +  * Serialized code sections
 +    * Bottleneck in multicore execution
 +    * Parallelizable vs. serial portion
 +    * Accelerate critical section
 +    * Cache ping-ponging
 +      * Synchronization latency
 +    * Symmetric vs. assymmetric design
 +  * Large cores + small cores
 +    * Core assymmetry
 +  * Amdahl'​s law with heterogeneous cores
 +  * Parallel bottlenecks
 +    * Resource contention
 +      * Depends on what are running
 +  * Accelerated critical section
 +    * Ship critical sections to large cores
 +    * Small modifications and low overhead
 +    * False serialization might become the bottleneck
 +    * Can reduce parallel throughput
 +    * Effect on private cache misses and shared cache misses
 +    ​
 +  ​
 +===== Lecture 33 (4/27 Mon.) =====
 +
 +  * Interconnects
 +    * Connecting multiple components together
 +    * Goal: Scalability,​ flexibility,​ performance and energy efficiency
 +  * Metric: Performance,​ bandwidth, bisection bandwidth, cost, energy efficienct, system performance,​ contention, latency
 +    * Saturation point
 +      * Saturation throughput
 +  * Topology
 +    * How to wire components together, affects routing, throughput, latency
 +    * Bus: All nodes connected to a single ring
 +      * Hard to increase frequency, bandwidth, poor scalability but simple
 +    * Point-to-point
 +      * Low contention and potentially low latency. Costly, not scalable and hard to wire.
 +    * Crossbar
 +      * No contention. Concurrent request from different src/dest can be sent concurrently. Costly.
 +    * Multistage logarithmic network
 +      * Indirect network, low contention, multiple request can be sent concurrently. More scalable compared to crossbar.
 +      * Circuit switch
 +      * Omega network, delta network.
 +      * Butterfly network
 +      * Intermediate switch between sources and destinations
 +    * Switching vs. topology
 +    * Ring
 +      * Each node connected to two other nodes, forming a ring
 +      * Low overhead, high latency, not as scalable.
 +      * Unidirectional ring and bi-directional ring
 +    * Hierarchical Rings
 +      * Layers of rings. More scalable, lower latency.
 +      * Bridge router connect multiple rings together
 +    * Mesh
 +      * 4 input and output ports
 +      * More bisection bandwidth and more scalable
 +      * Easy to layout
 +      * Path diversity
 +      * Routers are more complex
 +    * Tree
 +      * Another hierarchical topology
 +      * Specialized topology
 +      * Good for local traffic
 +      * Fat tree: higher level have more bandwidth
 +      * CM-5 Fat tree
 +        * Fat tree with 4x2 switches
 +    * Hypercube
 +      * N-Dimensional cubes
 +      * Caltech cosmic cube
 +      * Very complex
 +  * Routing algorithm
 +    * How does message get sent from source to destination
 +    * Static or adaptive
 +    * Handling contention
 +      * Buffering helps handling contention, but adds complexity
 +    * Three types of routing algorithms
 +      * Deterministic:​ always takes the same path
 +      * Oblivious: takes different paths without taking into account of the state of the network
 +        * For example, Valiant algorithm
 +      * Adaptive: takes different paths taking into account of the state of the network
 +        * Non-minimal adaptive routing vs. minimal adaptive routing
 +      * Minimal path: path that has minimum number of hops
 +  * Buffering and flow control
 +    * How to store within the network
 +    * Handling oversubscription
 +    * Source throttling
 +    * Bufferless vs. buffered crossbars
 +    * Buffer overflow
 +    * Bufferless deflection routing
 +      * Deflect packets when there is contention
 +      * Hot-potato routing
 +
 +
 +    ​
buzzword.txt ยท Last modified: 2015/04/27 18:20 by rachata