buzzword

Differences

This shows you the differences between two versions of the page.

buzzword [2015/02/20 19:25]
rachata
buzzword [2015/03/17 02:29]
kevincha
Line 45:
   * Memory wall (a part of scaling issue)
   * Scaling issue
-    * Transister are getting smaller
+    * Transistors are getting smaller
   * Key components of a computer
   * Design points
Line 54:
   * Reliability problems that cause errors
   * Analogies from Kuhn's "The Structure of Scientific Revolutions" (Recommended book)
-    * Pre paradigm science
+    * Pre-paradigm science
     * Normal science
-    * Revolutionalry science
+    * Revolutionary science
   * Components of a computer
     * Computation
Line 76:
     * Operands
     * Live-outs/Live-ins
-      * DIfferent types of data flow nodes (conditional/relational/barrier)
+      * Different types of data flow nodes (conditional/relational/barrier)
     * How to do transactions in dataflow?
       * Example: bank transactions
Line 121:
    * Tradeoffs between 0,1,2,3 address machines
    * Postfix notation (see the stack-machine sketch below this hunk)
-   * Instructions/Opcode/Operade specifiers (i.e. addressing modes)
+   * Instructions/Opcode/Operand specifiers (i.e. addressing modes)
    * Simple vs. complex data types (and their tradeoffs)
    * Semantic gap and level
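To make the postfix/0-address connection concrete, here is a minimal stack-machine sketch (my own toy example, not course code); it assumes single-digit operands and only the +, -, and * operators:

<code c>
/* Evaluating postfix on a 0-address (stack) machine: each operator
 * implicitly pops its operands and pushes the result, so instructions
 * carry no address fields. */
#include <stdio.h>

int main(void) {
    const char *expr = "23*4+";   /* (2 * 3) + 4 in postfix */
    int stack[32], top = 0;

    for (const char *p = expr; *p; p++) {
        if (*p >= '0' && *p <= '9') {
            stack[top++] = *p - '0';          /* push operand */
        } else {
            int b = stack[--top];             /* pop right operand */
            int a = stack[--top];             /* pop left operand */
            stack[top++] = (*p == '+') ? a + b
                         : (*p == '-') ? a - b
                         : a * b;             /* push result */
        }
    }
    printf("%d\n", stack[0]);                 /* prints 10 */
    return 0;
}
</code>

Because both operands come implicitly from the top of the stack, the operator encodes no addresses at all; the price is the extra push/pop traffic that the 0- vs. 1/2/3-address tradeoff discussion is about.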
Line 236:
   * Variable latency memory
   * Handling interrupts
-  * Difference betwen interrupts and exceptions
+  * Difference between interrupts and exceptions
   * Emulator (i.e. uCode allows a minimal datapath to emulate the ISA)
   * Updating machine behavior
Line 257:
     * Idle resources
   * Throughput of a pipelined design
-    * What dictacts the throughput of a pipelined design?
+    * What dictates the throughput of a pipelined design? (see the formula below)
   * Latency of the pipelined design
   * Dependency
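As a hedged aside (standard pipeline arithmetic, not text from the original page): the slowest stage plus register overhead dictates the clock, hence the throughput, while latency grows with the stage count:

<code latex>
% n-stage pipeline; t_latch is the pipeline-register overhead
t_{\mathrm{clk}} = \max_i t_{\mathrm{stage},i} + t_{\mathrm{latch}}, \qquad
\mathrm{Throughput} = \frac{1}{t_{\mathrm{clk}}}, \qquad
\mathrm{Latency} = n \cdot t_{\mathrm{clk}}
</code>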
Line 336:
     * Inst. from different threads can fill in the bubbles
     * Cost?
-  * Simulteneuos multithreading
+  * Simultaneous multithreading
   * Branch prediction
     * Guess what to fetch next.
Line 345:
       * Given the program/number of instructions, percent of branches, branch prediction accuracy, and penalty cost, how to compute the cost of branch mispredictions (see the worked example below)
         * How many extra instructions are being fetched?
-        * What is the performance degredation?
+        * What is the performance degradation?
     * How to reduce the miss penalty?
     * Predicting the next address (non PC+4 address)
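A worked instance of that computation, with numbers that are mine (for illustration only): take N instructions, a fraction b of branches, prediction accuracy a, and p wasted fetch slots per misprediction:

<code latex>
% Assumed: N = 10^9, b = 0.2, a = 0.98, p = 20
\mathrm{mispredictions} = N \cdot b \cdot (1-a) = 10^9 \cdot 0.2 \cdot 0.02 = 4 \times 10^6
\\
\mathrm{wasted\ fetch\ slots} = p \cdot \mathrm{mispredictions} = 20 \cdot 4 \times 10^6 = 8 \times 10^7
</code>

The same arithmetic explains the 99% vs. 98% accuracy comparison later in the list: halving (1-a) halves the wasted work, so one point of accuracy matters a great deal.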
Line 352:
     * Global branch history - for directions
     * Can use compiler to profile and get more info
-      * Input set dictacts the accuracy
+      * Input set dictates the accuracy
       * Add time to compilation
     * Heuristics that are common and don't require profiling.
Line 358:
       * Does not require profiling
     * Static branch prediction
-      * Pregrammer provides pragmas, hinting the likelihood of taken/not taken branch
+      * Programmer provides pragmas, hinting the likelihood of a taken/not-taken branch (see the example below)
       * For example, x86 has the hint bit
     * Dynamic branch prediction
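A small illustration of such programmer-provided hints, using GCC's __builtin_expect builtin (the builtin is real; the likely/unlikely macro names follow the common Linux-kernel idiom, and the scenario is made up):

<code c>
/* Programmer-provided branch hints: the compiler lays out the likely
 * path as the straight-line fall-through, the software analogue of an
 * ISA-level hint bit. */
#include <stdio.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(int value) {
    if (unlikely(value < 0)) {   /* hinted: almost never taken */
        fprintf(stderr, "bad value\n");
        return -1;
    }
    return value * 2;            /* hinted: common fall-through path */
}

int main(void) {
    printf("%d\n", process(21)); /* prints 42 */
    return 0;
}
</code>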
Line 369:
     * Why are they very important?
       * Differences between 99% accuracy and 98% accuracy
-      * Cost of a misprediction when the pipeline is veryd eep
+      * Cost of a misprediction when the pipeline is very deep
   * Global branch correlation
     * Some branches are correlated
Line 405:
       * Use branch prediction on easy-to-predict code
       * Use predicated execution on hard-to-predict code (see the sketch below)
-      * Compiler can be more aggressive in optimimzing the code
+      * Compiler can be more aggressive in optimizing the code
     * What are the tradeoffs (slide# 47)
   * Multi-path execution
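A sketch of predication at the source level (my example, not from the slides): compute both outcomes and select with the predicate, converting the control dependence into a data dependence that a compiler can lower to a conditional-move or predicated instruction:

<code c>
/* Branchy version: hard on the predictor if (a < b) is random. */
int max_branch(int a, int b) {
    if (a < b)
        return b;
    return a;
}

/* Predicated version: no branch; both "paths" execute and the
 * predicate selects the result. */
int max_predicated(int a, int b) {
    int cond = (a < b);                 /* the predicate, 0 or 1 */
    return cond * b + (1 - cond) * a;   /* select without branching */
}
</code>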
Line 411:
     * Can lead to wasted work
     * VLIW
-    * SuperScalar
+    * Superscalar
  
  
Line 434:
   * Reorder buffer
     * Reorder results before they are visible to the arch. state
-      * Need to presearve the sequential sematic and data
+      * Need to preserve the sequential semantics and data
-    * What are the informatinos in the ROB entry
+    * What information goes in the ROB entry? (see the sketch below)
     * Where to get the value from (forwarding path? reorder buffer?)
       * Extra logic to check where the youngest instruction/value is
-      * Content addressible search (CAM)
+      * Content addressable search (CAM)
         * A lot of comparators
     * Different ways to simplify the reorder buffer
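A minimal sketch of what one reorder-buffer entry might hold (the field names are mine; real designs vary). Locating the youngest in-flight producer of a register means comparing dest_reg across all valid entries at once, which is the CAM-style search and the "lot of comparators" noted above:

<code c>
/* One ROB entry (illustrative fields only). Results wait here and
 * update architectural state only when the entry reaches the head,
 * preserving sequential semantics and precise exceptions. */
#include <stdint.h>
#include <stdbool.h>

struct rob_entry {
    bool     valid;        /* entry in use */
    bool     done;         /* result has been produced */
    bool     exception;    /* deferred until retirement */
    uint8_t  dest_reg;     /* architectural destination register */
    uint64_t value;        /* result, once available */
    uint64_t pc;           /* for precise exceptions / mispredict repair */
};
</code>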
Line 450:
       * An updated value (Speculative), called future file
       * A backup value (to restore the state quickly)
-    * Double the cost of the regfile, but reduce the area as you don't have to use a content addressible memory (compared to ROB alone)
+    * Double the cost of the regfile, but reduce the area as you don't have to use a content addressable memory (compared to ROB alone)
   * Branch misprediction resembles Exception
     * The difference is that branch misprediction is not visible to the software
Line 486:
       * Slides 28 --> register renaming
       * Slides 30-35 --> Exercise (also on the board)
-        * This will be usefull for the midterm
+        * This will be useful for the midterm
     * Register alias table
     * Broadcasting tags
Line 503:
       * Still visible as a Von Neumann model
     * Where does the efficiency come from?
-    * Size of the scheduling windors/reorder buffer. Tradeoffs? What make sense?
+    * Size of the scheduling window/reorder buffer. Tradeoffs? What makes sense?
   * Load/store handling
     * Would like to schedule them out of order, but make them visible in-order
Line 538:
     * But the program needs to be very parallel
     * Memory can be the bottleneck (due to very high MLP)
-    * What does the functional units look like? Deep pipelin and simpler control.
+    * What do the functional units look like? Deep pipeline and simpler control.
     * CRAY-1 is one example of a vector processor
     * Memory access pattern in a vector processor
Line 549:
     * What if there are multiple memory ports?
     * Gather/Scatter allows a vector processor to be a lot more programmable (i.e. gather data for parallelism)
-      * Helps handling sparse metrices
+      * Helps handle sparse matrices (see the sketch below)
     * Conditional operation
     * Structure of vector units
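A scalar sketch of gather/scatter (my example): an index vector turns scattered locations into a dense vector the vector unit can operate on, which is how sparse-matrix data becomes vectorizable:

<code c>
/* Gather, dense vector op, then scatter the results back. */
#include <stdio.h>

int main(void) {
    double values[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int    idx[4]    = {6, 0, 3, 1};   /* e.g. nonzero positions of a sparse row */
    double dense[4];

    for (int i = 0; i < 4; i++)
        dense[i] = values[idx[i]];     /* gather: dense[i] = values[idx[i]] */

    for (int i = 0; i < 4; i++)
        dense[i] *= 2.0;               /* dense (vectorizable) operation */

    for (int i = 0; i < 4; i++)
        values[idx[i]] = dense[i];     /* scatter results back */

    printf("%g %g\n", values[6], values[0]);  /* prints 14 2 */
    return 0;
}
</code>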
Line 579:
     * Lower SIMD efficiency
     * What if you have layers of branches?
-  * Dynamic wrap formation
+  * Dynamic warp formation
     * Combining threads from different warps to increase SIMD utilization
     * This can cause memory divergence
Line 593:
     * How to steer the instruction (determine dependency/stalling)?
     * Instruction scheduling techniques (static vs. dynamic)
-  * Systoric arrays
+  * Systolic arrays
     * Processing elements transform data in chains
     * Developed for image processing (for example, convolution)
Line 600:
  
  
 +===== Lecture 16 (2/23 Mon.) =====
  +  * Systolic arrays (see the convolution sketch after this lecture's list)
  +    * Processing elements transform data in chains
  +    * Can be arrays of multi-dimensional processing elements
  +    * Developed for image processing (for example, convolution)
  +    * Can be used to break stages in pipelined programs, using a set of queues and processing elements
  +    * Can enable high concurrency and is good for regular programs
  +    * Very special purpose
  +    * The Warp computer
 +  * Static instruction scheduling
 +    * How do we find the next instruction to execute?
 +  * Live-in and live-out
 +  * Basic blocks
 +    * Rearranging instructions in the basic block
 +    * Code movement from one basic block to another
 +  * Straight line code
 +  * Independent instructions
 +    * How to identify independent instructions
 +  * Atomicity
 +  * Trace scheduling
 +    * Side entrance
  +    * Fix-up code
 +    * How scheduling is done
 +  * Instruction scheduling
 +    * Prioritization heuristics
 +  * Superblock
 +    * Traces with no side-entrance
 +  * Hyperblock
 +  * BS-ISA
 +  * Tradeoffs between trace cache/​Hyperblock/​Superblock/​BS-ISA
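To ground the systolic-array bullets above, here is a scalar sketch of a 1-D convolution organized the way a systolic chain computes it (my example; a real array gives each tap to its own processing element and streams data between PEs in lockstep):

<code c>
/* 1-D convolution as a chain of multiply-accumulates: PE j holds
 * weight w[j] and adds its product to the partial sum passing by. */
#include <stdio.h>

#define K 3   /* filter taps = number of PEs */
#define N 8   /* input length */

int main(void) {
    int w[K] = {1, 2, 1};              /* one weight per PE */
    int x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int y[N - K + 1];

    for (int i = 0; i <= N - K; i++) {
        int partial = 0;               /* partial sum entering the chain */
        for (int j = 0; j < K; j++)
            partial += x[i + j] * w[j];   /* PE j's multiply-accumulate */
        y[i] = partial;                /* result leaving the last PE */
    }

    for (int i = 0; i <= N - K; i++)
        printf("%d ", y[i]);           /* prints 8 12 16 20 24 28 */
    printf("\n");
    return 0;
}
</code>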
 +===== Lecture 17 (2/25 Wed.) =====
 +  * IA-64
 +    * EPIC
 +  * IA-64 instruction bundle
 +    * Multiple instructions in the bundle along with the template bit
 +    * Template bits
 +    * Stop bits
 +    * Non-faulting loads and exception propagation
 +  * Aggressive ST-LD reordering
 +  * Physical memory system
 +  * Ideal pipelines
 +  * Ideal cache
 +    * More capacity
 +    * Fast
 +    * Cheap
 +    * High bandwidth
 +  * DRAM cell
 +    * Cheap
 +    * Sense the perturbation through sense amplifier
 +    * Slow and leaky
 +  * SRAM cell (Cross coupled inverter)
  +    * Expensive
 +    * Fast (easier to sense the value in the cell)
 +  * Memory bank
 +    * Read access sequence
 +    * DRAM: Activate -> Read -> Precharge (if needed)
  +    * What dominates the access latency for DRAM and SRAM?
 +  * Scaling issue
  +    * Hard to scale the cell to be small
 +  * Memory hierarchy
 +    * Prefetching
 +    * Caching
 +  * Spatial and temporal locality
 +    * Cache can exploit these
 +    * Recently used data is likely to be accessed
 +    * Nearby data is likely to be accessed
 +  * Caching in a pipeline design
 +  * Cache management
 +    * Manual
 +      * Data movement is managed manually
 +        * Embedded processor
 +        * GPU scratchpad
 +    * Automatic
 +      * HW manage data movements
 +  * Latency analysis
 +    * Based on the hit and miss status, next level access time (if miss), and the current level access time
 +  * Cache basics
 +    * Set/block (line)/​Placement/​replacement/​direct mapped vs. associative cache/etc.
 +  * Cache access
 +    * How to access tag and data (in parallel vs serially)
 +    * How do tag and index get used?
 +    * Modern processors perform serial access for higher level cache (L3 for example) to save power
 +  * Cost and benefit of having more associativity
  +    * Given the associativity, which block should be replaced when the set is full?
 +    * Replacement policy
 +      * Random
 +      * Least recently used (LRU)
 +      * Least frequently used
 +      * Least costly to refetch
 +      * etc.
  +  * How to implement LRU (see the sketch after this lecture's list)
  +    * How to keep track of access ordering
  +      * Complexity increases rapidly
  +    * Approximate LRU
  +      * Victim and next-victim policy
 + 
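A toy sketch of exact LRU for one set, using age counters (my own code, not from the lecture); note how every access touches every way, which is why the cost grows quickly with associativity and why approximate schemes like victim/next-victim exist:

<code c>
/* Exact LRU for one 4-way set via age counters:
 * higher age = less recently used. */
#include <stdio.h>

#define WAYS 4

static unsigned age[WAYS];

/* Mark 'way' most recently used; every younger way ages by one. */
static void touch(int way) {
    for (int w = 0; w < WAYS; w++)
        if (age[w] < age[way])
            age[w]++;            /* one comparator + increment per way */
    age[way] = 0;
}

/* Victim = the oldest way. */
static int find_victim(void) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (age[w] > age[victim])
            victim = w;
    return victim;
}

int main(void) {
    for (int w = 0; w < WAYS; w++) age[w] = w;  /* way 3 starts oldest */
    touch(0);
    touch(2);
    printf("victim: way %d\n", find_victim()); /* prints way 3 */
    return 0;
}
</code>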
 +===== Lecture 18 (2/27 Fri.) =====
 +  * Tag store and data store
 +  * Cache hit rate
  +  * Average memory access time (AMAT) - see the formula after this lecture's list
 +  * AMAT vs. Stall time
 +  * Cache basics
 +    * Direct mapped vs. associative cache
 +    * Set/block (line)/​Placement/​replacement
 +    * How do tag and index get used?
 +  * Full associativity
 +  * Set associative cache
 +    * insertion, promotion, eviction (replacement)
 +  * Various replacement policies
 +  * How to implement LRU
 +    * How to keep track of access ordering
 +      * Complexity increases rapidly
 +    * Approximate LRU
  +      * Victim and next-victim policy
 +  * Set thrashing
 +    * Working set is bigger than the associativity
 +  * Belady'​s OPT
 +    * Is this optimal?
 +    * Complexity?
 +  * DRAM as a cache for disk
 +  * Handling writes
  +    * Write through
  +      * Simpler, no consistency issues
  +    * Write back
  +      * Need a modified (dirty) bit to make sure accesses to data get the updated data
 +  * Sectored cache
 +    * Use subblock
 +      * lower bandwidth
 +      * more complex
 +  * Instruction vs data cache
 +    * Where to place instructions
 +      * Unified vs. separated
 +    * In the first level cache
 +  * Cache access
 +    * First level access
 +    * Second level access
 +      * When to start the second level access
 +  * Cache performance
 +    * capacity
 +    * block size
 +    * associativity
 +  * Classification of cache misses
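For reference, the standard AMAT formula the list refers to, applied recursively across levels; the example numbers are mine, for illustration:

<code latex>
\mathrm{AMAT} = t_{\mathrm{hit}} + \mathrm{miss\ rate} \times \mathrm{miss\ penalty}
\\
% Assumed: L1 hit 1 cycle, L1 miss rate 10%, L2 hit 10 cycles,
% L2 miss rate 20%, memory 100 cycles
\mathrm{AMAT} = 1 + 0.1 \cdot (10 + 0.2 \cdot 100) = 4\ \mathrm{cycles}
</code>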
 +===== Lecture 19 (03/02 Mon.) =====
 +  * Subblocks
 +  * Victim cache
 +    * Small, but fully assoc. cache behind the actual cache
  +    * Caches recently evicted (victim) blocks
 +    * Prevent ping-ponging
 +  * Pseudo associativity
 +    * Simpler way to implement associative cache
 +  * Skewed assoc. cache
 +    * Different hashing functions for each way
  +  * Restructure data access pattern
  +    * Order of loop traversal
  +    * Blocking (see the sketch after this lecture's list)
 +  * Memory level parallelism
 +    * Cost per miss of a parallel cache miss is less costly compared to serial misses
 +  * MSHR
  +    * Keep track of pending cache misses
 +      * Think of this as the load/store buffer-ish for cache
 +    * What information goes into the MSHR?
 +    * When do you access the MSHR?
 +  * Memory banks
 +  * Shared caches in multi-core processors
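A sketch of loop blocking (my example): traverse a large matrix in B x B tiles so each tile stays cache-resident while it is reused, instead of streaming whole rows past the cache:

<code c>
/* Blocked (tiled) transpose: b = a^T, touching one tile at a time. */
#include <stdio.h>

#define N 512
#define B 64          /* tile size: chosen so a BxB block fits in cache */

static double a[N][N], b[N][N];

int main(void) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)
                for (int j = jj; j < jj + B; j++)
                    b[j][i] = a[i][j];   /* both arrays stay in-tile */

    printf("%f\n", b[0][0]);
    return 0;
}
</code>

The tile size B here is an assumption; in practice it is sized so a tile's working set fits the targeted cache level.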
 +===== Lecture 20 (03/04 Wed.) =====
 +  * Virtual vs. physical memory
 +  * System'​s management on memory
 +    * Benefits
 +  * Problem: physical memory has limited size
 +  * Mechanisms: indirection,​ virtual addresses, and translation
 +  * Demand paging
 +  * Physical memory as a cache
 +  * Tasks of system SW for VM
 +  * Serving a page fault
 +  * Address translation
 +  * Page table
 +    * PTE (page table entry)
 +  * Page replacement algorithm
 +    * CLOCK algo.
 +    * Inverted page table
 +  * Page size trade-offs
 +  * Protection
  +  * Multi-level page tables (see the walk sketch after this lecture's list)
 +  * x86 implementation of page table
 +  * TLB
 +    * Handling misses
 +  * When to do address translation?​
 +  * Homonym and Synonyms
 +    * Homonym: Same VA but maps to different PA with multiple processes
 +    * Synonyms: Multiple VAs map to the same PA
 +      * Shared libraries, shared data, copy-on-write
 +  * Virtually indexed vs. physically indexed
 +  * Virtually tagged vs. physically tagged
 +  * Virtually indexed physically tagged
  +  * Can these create problems when we have a cache?
 +  * How to eliminate these problems?
 +  * Page coloring
 +  * Interaction between cache and TLB
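To ground the multi-level page-table bullets, a toy two-level walk in the 32-bit x86 style (10-bit directory index, 10-bit table index, 12-bit offset). Everything here is illustrative: a real walker follows physical addresses stored in the PDE rather than one static table, and a fault would trap rather than return 0:

<code c>
/* Toy two-level page-table walk with made-up table contents. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PTE_PRESENT 0x1u

static uint32_t page_dir[1024];      /* level 1: page directory */
static uint32_t page_tab[1024];      /* level 2: one page table */

static uint32_t translate(uint32_t va) {
    uint32_t dir_idx = va >> 22;             /* top 10 bits */
    uint32_t tab_idx = (va >> 12) & 0x3FF;   /* next 10 bits */
    uint32_t offset  = va & 0xFFF;           /* page offset */

    uint32_t pde = page_dir[dir_idx];
    if (!(pde & PTE_PRESENT)) return 0;      /* page fault (toy) */
    /* Real HW: the PDE holds the page table's physical address;
     * here we just use our single static table. */
    uint32_t pte = page_tab[tab_idx];
    if (!(pte & PTE_PRESENT)) return 0;      /* page fault (toy) */

    uint32_t frame = pte >> PAGE_SHIFT;
    return (frame << PAGE_SHIFT) | offset;   /* physical address */
}

int main(void) {
    page_dir[1] = PTE_PRESENT;                            /* directory entry present */
    page_tab[2] = (0x1234u << PAGE_SHIFT) | PTE_PRESENT;  /* maps to frame 0x1234 */
    uint32_t va = (1u << 22) | (2u << 12) | 0xABC;
    printf("PA = 0x%x\n", translate(va));                 /* prints 0x1234abc */
    return 0;
}
</code>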
buzzword.txt · Last modified: 2015/04/27 18:20 by rachata