Differences

This shows you the differences between two versions of the page.

--- buzzword [2015/02/18 19:22] rachata
+++ buzzword [2015/02/25 21:05] kevincha [Lecture 16 (2/23 Mon.)]
@@ Line 562: / Line 562: @@
     * Intel SSE --> Modern version of MMX
+===== Lecture 15 (2/20 Fri.) =====
+  * GPU
+    * Warp/Wavefront
+      * A bunch of threads sharing the same PC
+    * SIMT
+    * Lanes
+    * FGMT + massively parallel
+      * Tolerate long latency
+    * Warp based SIMD vs. traditional SIMD
+  * SPMD (Programming model)
+    * Single program operates on multiple data
+      * Can have synchronization points
+    * Many scientific applications are programmed in this manner
+  * Control flow problem (branch divergence)
+    * Masking (in a branch, mask threads that should not execute that path)
+    * Lower SIMD efficiency
+    * What if you have layers of branches?
+  * Dynamic warp formation
+    * Combining threads from different warps to increase SIMD utilization
+    * This can cause memory divergence
+  * VLIW
+    * Wide fetch
+    * IA-64
+    * Tradeoffs
+      * Simple hardware (no dynamic scheduling, no dependency checking within VLIW)
+      * A lot of load on the compiler (it must find the parallelism and do the scheduling)
+  * Decoupled access/execute
+    * Limited form of OoO
+    * Tradeoffs
+    * How to steer the instructions (determine dependency/stalling)?
+    * Instruction scheduling techniques (static vs. dynamic)
+  * Systolic arrays
+    * Processing elements transform data in chains
+    * Developed for image processing (for example, convolution)
+  * Stage processing
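
The masking idea under "Control flow problem (branch divergence)" can be sketched in a few lines. This is a toy model, not how any particular GPU is implemented: `run_warp_with_masking`, the `threshold` comparison, and the two path bodies (`* 2` and `+ 1`) are all made up for illustration. It shows why SIMD efficiency drops when the lanes of one warp disagree on a branch: the warp shares one PC, so it walks both paths and masks off the lanes that do not belong on the current one.

```python
def run_warp_with_masking(values, threshold):
    """Simulate one warp executing an if/else under masking.

    Returns (per-lane results, SIMD efficiency), where efficiency is
    useful lane-slots divided by issued lane-slots.
    """
    n = len(values)
    taken = [v > threshold for v in values]  # per-lane branch outcome
    results = list(values)
    issue_slots = 0   # lane-slots occupied, whether active or masked
    useful_slots = 0  # lane-slots that did real work

    # The "then" path is issued only if at least one lane takes it.
    if any(taken):
        for lane in range(n):
            if taken[lane]:              # masked-off lanes do nothing
                results[lane] = values[lane] * 2
        issue_slots += n
        useful_slots += sum(taken)

    # The "else" path is issued only if at least one lane falls through.
    if not all(taken):
        for lane in range(n):
            if not taken[lane]:
                results[lane] = values[lane] + 1
        issue_slots += n
        useful_slots += n - sum(taken)

    if issue_slots == 0:                 # empty warp: nothing was issued
        return results, 1.0
    return results, useful_slots / issue_slots


# Divergent warp: half the lanes take each path, so both paths are issued.
print(run_warp_with_masking([1, 5, 2, 8], 3))   # ([2, 10, 3, 16], 0.5)
# Uniform warp: every lane takes the same path, so nothing is wasted.
print(run_warp_with_masking([5, 6, 7, 8], 3))   # ([10, 12, 14, 16], 1.0)
```

Nesting another branch inside either path would halve the active lanes again in the worst case, which is the concern behind the "layers of branches" question.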
+===== Lecture 16 (2/23 Mon.) =====
+  * Systolic arrays
+    * Processing elements transform data in chains
+    * Can be arrays of multi-dimensional processing elements
+    * Developed for image processing (for example, convolution)
+    * Can be used to break up the stages of pipelined programs, using a set of queues and processing elements
+    * Can enable high concurrency; good for regular programs
+    * Very special purpose
+    * The WARP computer
+  * Static instruction scheduling
+    * How do we find the next instruction to execute?
+  * Live-in and live-out
+  * Basic blocks
+    * Rearranging instructions in the basic block
+    * Code movement from one basic block to another
+  * Straight line code
+  * Independent instructions
+    * How to identify independent instructions
+  * Atomicity
+  * Trace scheduling
+    * Side entrance
+    * Fix-up code
+    * How scheduling is done
+  * Instruction scheduling
+    * Prioritization heuristics
+  * Superblock
+    * Traces with no side-entrance
+  * Hyperblock
+  * BS-ISA
+  * Tradeoffs between trace cache/Hyperblock/Superblock/BS-ISA
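
For "How to identify independent instructions", a minimal sketch follows. It assumes a made-up instruction record (`Instr`, with one destination register and a tuple of source registers) and looks at register dependences only; memory dependences and control flow are ignored, so it applies to straight-line code inside one basic block. `depends_on` and `schedule_levels` are illustrative names, not from the lecture. Instructions that end up in the same level have no dependence on each other, so a static scheduler could issue them together.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register, source registers.
Instr = namedtuple("Instr", ["name", "dst", "srcs"])


def depends_on(later, earlier):
    """True if `later` must stay after `earlier` (register dependences only)."""
    raw = earlier.dst is not None and earlier.dst in later.srcs  # true dependence
    war = later.dst is not None and later.dst in earlier.srcs    # anti dependence
    waw = later.dst is not None and later.dst == earlier.dst     # output dependence
    return raw or war or waw


def schedule_levels(block):
    """Group a basic block into levels of mutually independent instructions.

    Each instruction is placed one level after the latest earlier
    instruction it depends on, so program-order dependences are kept.
    """
    level_of = []
    for i, instr in enumerate(block):
        level = 0
        for j in range(i):
            if depends_on(instr, block[j]):
                level = max(level, level_of[j] + 1)
        level_of.append(level)

    levels = [[] for _ in range(max(level_of, default=-1) + 1)]
    for instr, level in zip(block, level_of):
        levels[level].append(instr.name)
    return levels


block = [
    Instr("i0", "r1", ("r2", "r3")),  # r1 = r2 op r3
    Instr("i1", "r4", ("r1", "r5")),  # reads r1 written by i0 (RAW)
    Instr("i2", "r6", ("r7", "r8")),  # touches nothing used above
    Instr("i3", "r2", ("r6", "r9")),  # overwrites r2 read by i0 (WAR), reads r6 (RAW)
]
print(schedule_levels(block))  # [['i0', 'i2'], ['i1', 'i3']]
```

The WAR and WAW checks are what register renaming would remove in a dynamically scheduled machine; a compiler doing this statically has to either respect them or rename registers itself.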
+===== Lecture 17 (2/25 Wed.) =====
+  * IA-64
+    * EPIC
+  * IA-64 instruction bundle
+    * Multiple instructions in the bundle along with the template bits
+    * Template bits
+    * Stop bits
+    * Non-faulting loads and exception propagation
+  * Aggressive ST-LD reordering
+  * Physical memory system
+  * Ideal pipelines
+  * Ideal cache
+    * More capacity
+    * Fast
+    * Cheap
+    * High bandwidth
+  * DRAM cell
+    * Cheap
+    * Sense the perturbation through a sense amplifier
+    * Slow and leaky
+  * SRAM cell (Cross coupled inverter)
+    * Expensive
+    * Fast (easier to sense the value in the cell)
+  * Memory bank
+    * Read access sequence
+    * DRAM: Activate -> Read -> Precharge (if needed)
+    * What dominates the access latency for DRAM and SRAM
+  * Scaling issue
+    * Hard to scale the cell to be small
+  * Memory hierarchy
+    * Prefetching
+    * Caching
+  * Spatial and temporal locality
+    * Cache can exploit these
+    * Recently used data is likely to be accessed again (temporal locality)
+    * Nearby data is likely to be accessed soon (spatial locality)
+  * Caching in a pipeline design
+  * Cache management
+    * Manual
+      * Data movement is managed manually
+        * Embedded processor
+        * GPU scratchpad
+    * Automatic
+      * HW manages data movement
+  * Latency analysis
+    * Based on the hit and miss status, next level access time (if miss), and the current level access time
+  * Cache basics
+    * Set/block (line)/Placement/replacement/direct mapped vs. associative cache/etc.
+  * Cache access
+    * How to access tag and data (in parallel vs serially)
+    * How do tag and index get used?
+    * Modern processors perform serial access for higher level cache (L3 for example) to save power
+  * Cost and benefit of having more associativity
+    * Given the associativity, which block should be replaced if the set is full
+    * Replacement policy
+      * Random
+      * Least recently used (LRU)
+      * Least frequently used
+      * Least costly to refetch
+      * etc.
+  * How to implement LRU
+    * How to keep track of access ordering
+      * Complexity increases rapidly
+    * Approximate LRU
+      * Victim and next Victim policy
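
The cache items above (how tag and index are used, LRU replacement, and latency analysis) fit together in one small model. The sketch below is a toy, tag-only simulator under stated assumptions: `SetAssociativeCache`, its parameters, and `average_access_time` are illustrative names, the address is split with plain integer division and modulo, and LRU order is tracked exactly with an ordered dictionary, which is simple in software but is the bookkeeping that gets expensive in hardware as associativity grows.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with true LRU replacement (tags only, no data)."""

    def __init__(self, num_sets, ways, block_bytes):
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One OrderedDict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr):
        """Split a byte address into (tag, index, offset)."""
        offset = addr % self.block_bytes
        block_number = addr // self.block_bytes
        index = block_number % self.num_sets   # selects the set
        tag = block_number // self.num_sets    # identifies the block within the set
        return tag, index, offset

    def access(self, addr):
        """Return True on a hit; on a miss, fill the block and return False."""
        tag, index, _ = self.split(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)             # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)          # evict the least recently used tag
        lines[tag] = True
        self.misses += 1
        return False


def average_access_time(hit_time, miss_rate, miss_penalty):
    """Current-level access time plus miss rate times next-level access time."""
    return hit_time + miss_rate * miss_penalty


cache = SetAssociativeCache(num_sets=2, ways=2, block_bytes=16)
print(cache.split(0x34))                        # (1, 1, 4)
# All five addresses map to set 0; the third one re-uses the first block.
trace = [0, 32, 0, 64, 32]
print([cache.access(a) for a in trace])         # [False, False, True, False, False]
miss_rate = cache.misses / (cache.hits + cache.misses)
# Latencies of 1 and 100 cycles are made-up numbers for illustration.
print(average_access_time(1, miss_rate, 100))   # 81.0
```

In the trace, the access to address 64 evicts the block at address 32 rather than the block at 0, because the hit on 0 made it the most recently used; that is why the final access to 32 misses.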

18-447 Introduction to Computer Architecture – Spring 2015
