This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
buzzword [2015/02/18 19:22] rachata |
buzzword [2015/02/23 19:19] rachata |
||
---|---|---|---|
Line 561: | Line 561: | ||
* Stride is one in MMX | * Stride is one in MMX | ||
* Intel SSE --> Modern version of MMX | * Intel SSE --> Modern version of MMX | ||
+ | |||
+ | ===== Lecture 15 (2/20 Fri.) ===== | ||
+ | * GPU | ||
+ | * Warp/Wavefront | ||
+ | * A bunch of threads sharing the same PC | ||
+ | * SIMT | ||
+ | * Lanes | ||
+ | * FGMT + massively parallel | ||
+ | * Tolerate long latency | ||
+ | * Warp based SIMD vs. traditional SIMD | ||
+ | * SPMD (Programming model) | ||
+ | * Single program operates on multiple data | ||
+ | * can have synchronization point | ||
+ | * Many scientific applications are programmed in this manner | ||
+ | * Control flow problem (branch divergence) | ||
+ | * Masking (in a branch, mask threads that should not execute that path) | ||
+ | * Lower SIMD efficiency | ||
+ | * What if you have layers of branches? | ||
+ | * Dynamic wrap formation | ||
+ | * Combining threads from different warps to increase SIMD utilization | ||
+ | * This can cause memory divergence | ||
+ | * VLIW | ||
+ | * Wide fetch | ||
+ | * IA-64 | ||
+ | * Tradeoffs | ||
+ | * Simple hardware (no dynamic scheduling, no dependency checking within VLIW) | ||
+ | * A lot of loads at the compiler level | ||
+ | * Decoupled access/execute | ||
+ | * Limited form of OoO | ||
+ | * Tradeoffs | ||
+ | * How to street the instruction (determine dependency/stalling)? | ||
+ | * Instruction scheduling techniques (static vs. dynamic) | ||
+ | * Systoric arrays | ||
+ | * Processing elements transform data in chains | ||
+ | * Develop for image processing (for example, convolution) | ||
+ | * Stage processing | ||
+ | | ||
+ | |||
+ | |||
+ | ===== Lecture 16 (2/23 Mon.) ===== | ||
+ | * Systoric arrays | ||
+ | * Processing elements transform data in chains | ||
+ | * Can be arrays of multi-dimensional processing elements | ||
+ | * Develop for image processing (for example, convolution) | ||
+ | * Can be use to break stages in pipeline programs, using a set of queues and processing elements | ||
+ | * Can enable high concurrency and good for regular programs | ||
+ | * Very special purpose | ||
+ | * The warp computer | ||
+ | * Static instruction scheduling | ||
+ | * How do we find the next instruction to execute? | ||
+ | * Live-in and live-out | ||
+ | * Basic blocks | ||
+ | * Rearranging instructions in the basic block | ||
+ | * Code movement from one basic block to another | ||
+ | * Straight line code | ||
+ | * Independent instructions | ||
+ | * How to identify independent instructions | ||
+ | * Atomicity | ||
+ | * Trace scheduling | ||
+ | * Side entrance | ||
+ | * Fixed up code | ||
+ | * How scheudling is done | ||
+ | * Instruction scheduling | ||
+ | * Prioritization heuristics | ||
+ | * Superblock | ||
+ | * Traces with no side-entrance | ||
+ | * Hyperblock | ||
+ | * BS-ISA | ||
+ | * Tradeoffs betwwen trace cache/Hyperblock/Superblock/BS-ISA | ||
+ | | ||
+ | | ||
+ | |||