User Tools

Site Tools


buzzword

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
buzzword [2015/02/18 19:22]
rachata
buzzword [2015/02/25 21:05]
kevincha [Lecture 16 (2/23 Mon.)]
Line 562: Line 562:
     * Intel SSE --> Modern version of MMX     * Intel SSE --> Modern version of MMX
  
 +===== Lecture 15 (2/20 Fri.) =====
 +  * GPU
 +    * Warp/​Wavefront
 +      * A bunch of threads sharing the same PC
 +    * SIMT
 +    * Lanes
 +    * FGMT + massively parallel
 +      * Tolerate long latency
 +    * Warp based SIMD vs. traditional SIMD
 +  * SPMD (Programming model)
 +    * Single program operates on multiple data
 +      * can have synchronization point
 +    * Many scientific applications are programmed in this manner
 +  * Control flow problem (branch divergence)
 +    * Masking (in a branch, mask threads that should not execute that path)
 +    * Lower SIMD efficiency
 +    * What if you have layers of branches?
 +  * Dynamic wrap formation
 +    * Combining threads from different warps to increase SIMD utilization
 +    * This can cause memory divergence
 +  * VLIW
 +    * Wide fetch
 +    * IA-64
 +    * Tradeoffs
 +      * Simple hardware (no dynamic scheduling, no dependency checking within VLIW)
 +      * A lot of loads at the compiler level
 +  * Decoupled access/​execute
 +    * Limited form of OoO
 +    * Tradeoffs
 +    * How to street the instruction (determine dependency/​stalling)?​
 +    * Instruction scheduling techniques (static vs. dynamic)
 +  * Systoric arrays
 +    * Processing elements transform data in chains
 +    * Develop for image processing (for example, convolution)
 +  * Stage processing
 +    ​
 +
 +
 +===== Lecture 16 (2/23 Mon.) =====
 +  * Systoric arrays
 +    * Processing elements transform data in chains
 +    * Can be arrays of multi-dimensional processing elements
 +    * Develop for image processing (for example, convolution)
 +    * Can be use to break stages in pipeline programs, using a set of queues and processing elements
 +    * Can enable high concurrency and good for regular programs
 +    * Very special purpose
 +    * The warp computer
 +  * Static instruction scheduling
 +    * How do we find the next instruction to execute?
 +  * Live-in and live-out
 +  * Basic blocks
 +    * Rearranging instructions in the basic block
 +    * Code movement from one basic block to another
 +  * Straight line code
 +  * Independent instructions
 +    * How to identify independent instructions
 +  * Atomicity
 +  * Trace scheduling
 +    * Side entrance
 +    * Fixed up code
 +    * How scheudling is done
 +  * Instruction scheduling
 +    * Prioritization heuristics
 +  * Superblock
 +    * Traces with no side-entrance
 +  * Hyperblock
 +  * BS-ISA
 +  * Tradeoffs betwwen trace cache/​Hyperblock/​Superblock/​BS-ISA
 +    ​
 +===== Lecture 17 (2/25 Wed.) =====
 +  * IA-64
 +    * EPIC
 +  * IA-64 instruction bundle
 +    * Multiple instructions in the bundle along with the template bit
 +    * Template bits
 +    * Stop bits
 +    * Non-faulting loads and exception propagation
 +  * Aggressive ST-LD reordering
 +  * Phyiscal memory system
 +  * Ideal pipelines
 +  * Ideal cache
 +    * More capacity
 +    * Fast
 +    * Cheap
 +    * High bandwidth
 +  * DRAM cell
 +    * Cheap
 +    * Sense the purturbation through sense amplifier
 +    * Slow and leaky
 +  * SRAM cell (Cross coupled inverter)
 +    * Expensice
 +    * Fast (easier to sense the value in the cell)
 +  * Memory bank
 +    * Read access sequence
 +    * DRAM: Activate -> Read -> Precharge (if needed)
 +    * What dominate the access laatency for DRAM and SRAM
 +  * Scaling issue
 +    * Hard to scale the scale to be small
 +  * Memory hierarchy
 +    * Prefetching
 +    * Caching
 +  * Spatial and temporal locality
 +    * Cache can exploit these
 +    * Recently used data is likely to be accessed
 +    * Nearby data is likely to be accessed
 +  * Caching in a pipeline design
 +  * Cache management
 +    * Manual
 +      * Data movement is managed manually
 +        * Embedded processor
 +        * GPU scratchpad
 +    * Automatic
 +      * HW manage data movements
 +  * Latency analysis
 +    * Based on the hit and miss status, next level access time (if miss), and the current level access time
 +  * Cache basics
 +    * Set/block (line)/​Placement/​replacement/​direct mapped vs. associative cache/etc.
 +  * Cache access
 +    * How to access tag and data (in parallel vs serially)
 +    * How do tag and index get used?
 +    * Modern processors perform serial access for higher level cache (L3 for example) to save power
 +  * Cost and benefit of having more associativity
 +    * Given the associativity,​ which block should be replace if it is full
 +    * Replacement poligy
 +      * Random
 +      * Least recently used (LRU)
 +      * Least frequently used
 +      * Least costly to refetch
 +      * etc.
 +  * How to implement LRU
 +    * How to keep track of access ordering
 +      * Complexity increases rapidly
 +    * Approximate LRU
 +      * Victim and next Victim policy
buzzword.txt ยท Last modified: 2015/04/27 18:20 by rachata