====== Buzzwords ======

Buzzwords are terms mentioned during lecture that are particularly important to understand thoroughly. This page tracks the buzzwords for each lecture and can be used as a reference for finding gaps in your understanding of the course material.

===== Lecture 1 (1/12 Mon.) =====

  * Level of transformation
    * Algorithm
    * Compiler
  * Cross abstraction layers
    * Exposing an interface
  * Tradeoffs
  * Caches
  * Multi-core
  * Unfairness
  * DRAM controller / memory controller
  * Memory hog
  * Row buffer hit/miss
  * Row buffer locality
  * Streaming access vs. random access
  * DRAM refresh
  * Retention time
  * Profiling DRAM retention time
  * Power consumption
  * Wimpy cores
  * Bloom filter (see the sketch after this list)
    * Pros/cons
    * False positives
  * Simulation
  * Memory performance attacks
  * RTL design
  * Hamming code
  * Hamming distance
  * Abstraction layers
  * Memory performance hog
  * Shared DRAM memory system
  * DRAM banks
  * Memory scheduling policies
  * Scheduling priority
  * Retention time of DRAM
  * Process variation
  * Retention time profile
  * DRAM row hammer
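
The Bloom filter entry above is worth pinning down with code. The following is a minimal sketch, not the exact structure from lecture: the array size and the two hash functions are arbitrary assumptions. It shows why membership tests can return false positives but never false negatives.

<code python>
# Minimal Bloom filter sketch: 2 hash functions over an M-bit array.
# M and the hash construction are illustrative assumptions only.
M = 64

def hashes(item, m=M):
    # Two cheap, loosely independent indices derived from Python's hash().
    h = hash(item)
    return [h % m, (h // m) % m]

class BloomFilter:
    def __init__(self, m=M):
        self.bits = [0] * m

    def insert(self, item):
        for i in hashes(item):
            self.bits[i] = 1

    def maybe_contains(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[i] for i in hashes(item))

bf = BloomFilter()
bf.insert("row 0x1A")
print(bf.maybe_contains("row 0x1A"))   # True
print(bf.maybe_contains("row 0xFF"))   # usually False; occasionally a false positive
</code>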

===== Lecture 2 (1/15 Wed.) =====

  * Optimizing for energy vs. optimizing for performance
    * Generally you should optimize for the users
  * State of the art
  * RTL simulation
    * Long, slow, and can be costly
  * High-level simulation
    * What should be employed?
    * Important to get an idea of how techniques would be implemented in RTL
    * Allows the designer to filter out techniques that do not work well
  * Design points
    * Design processors to meet the design points
  * Software stack
  * Design decisions
  * Datacenters
  * MIPS R2000
    * What architectural techniques improve the performance of a processor over the MIPS R2000?
  * Moore's Law
  * In-order execution
  * Out-of-order execution
  * Technologies that are available on cellphones
  * New applications that are made possible by new computer architecture techniques
    * More data mining (genomics/medical areas)
    * Lower power (cellphones)
    * Smaller cores (cellphones/computers)
    * etc.
  * Performance bottlenecks in single-thread/single-core processors
    * Multi-core as an alternative
  * Memory wall (a part of the scaling issue)
  * Scaling issue
    * Transistors are getting smaller
  * Reliability problems that cause errors
  * Analogies from Kuhn's "The Structure of Scientific Revolutions" (recommended book)
    * Pre-paradigm science
    * Normal science
    * Revolutionary science
  * Components of a computer
    * Computation
    * Communication
    * Storage
      * DRAM
      * NVRAM (non-volatile memory): PCM, STT-MRAM
      * Storage (flash/hard drive)
  * Von Neumann model (control flow model)
    * Stored program computer
      * Properties of the Von Neumann model: stored program, sequential instruction processing
      * Unified memory
        * When is an instruction interpreted as an instruction (as opposed to a datum)?
      * Program counter
      * Examples: x86, ARM, Alpha, IBM Power series, SPARC, MIPS
  * Data flow model (see the sketch after this list)
    * Data flow machine
      * Data flow graph
    * Operands
    * Live-outs/live-ins
      * Different types of data flow nodes (conditional/relational/barrier)
    * How to perform transactions in data flow?
      * Example: bank transactions
  * Tradeoffs between control-driven and data-driven execution
    * Which is easier to program?
    * Which is easier to compile?
    * Which is more parallel (and does that mean it is faster)?
    * Which machines are more complex to design?
    * In control flow, when a program stops, there is a pointer to the current state (precise state).
  * ISA vs. microarchitecture
    * Semantics in the ISA
    * The uArch should obey the ISA
    * Changing the ISA is costly and can affect compatibility
  * Instruction pointers
  * uArch techniques: common and powerful techniques break the Von Neumann model if done at the ISA level
    * Conceptual techniques
      * Pipelining
      * Multiple instructions at a time
      * Out-of-order execution
      * etc.
    * Design techniques
      * Adder implementation (bit serial, ripple carry, carry lookahead)
      * Connection Machine (an example of a machine that uses bit-serial arithmetic to trade off latency for more parallelism)
  * Microprocessor: ISA + uArch + circuits
  * What is part of the ISA? Instructions, memory, etc.
    * Things that are visible to the programmer/software
  * What is not part of the ISA? (what goes inside: uArch techniques)
    * Things that are not supposed to be visible to the programmer/software but typically make the processor faster and/or consume less power and/or less complex
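
To make the data flow model concrete, here is a minimal sketch (my own illustration, not code from lecture; the node names and the example graph are invented): each node fires as soon as all of its input operands are available, with no program counter involved.

<code python>
# Minimal data flow firing sketch: a node fires when all its inputs are ready.
graph = {
    # node: (operation, [input nodes])
    "a":   ("const", []),          # live-in
    "b":   ("const", []),          # live-in
    "sum": ("add",   ["a", "b"]),
    "sq":  ("mul",   ["sum", "sum"]),
}
values = {"a": 3, "b": 4}          # live-in values

ops = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

# Repeatedly fire any node whose operands are all available.
changed = True
while changed:
    changed = False
    for node, (op, ins) in graph.items():
        if node not in values and all(i in values for i in ins):
            values[node] = ops[op](*(values[i] for i in ins))
            changed = True

print(values["sq"])  # 49: "sq" fires only after "sum" has produced its token
</code>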

===== Lecture 3 (1/17 Fri.) =====

  * Design tradeoffs
  * Macro-architecture
  * Reconfigurability vs. specialized designs
  * Parallelism (instruction level, data parallel)
  * Uniform decode (example: Alpha)
  * Steering bits (sub-opcode)
  * 0-, 1-, 2-, and 3-address machines
    * Stack machine
    * Accumulator machine
    * 2-operand machine
    * 3-operand machine
    * Tradeoffs between 0-, 1-, 2-, and 3-address machines
  * Instructions/opcodes/operand specifiers (i.e., addressing modes)
  * Simple vs. complex data types (and their tradeoffs)
  * Semantic gap
  * Translation layer
  * Addressability
  * Byte/bit addressable machines
  * Virtual memory
  * Big/little endian (see the example after this list)
  * Benefits of having registers (data locality)
  * Programmer-visible (architectural) state
    * Programmers can access this directly
    * What are the benefits?
  * Microarchitectural state
    * Programmers cannot access this directly
  * Evolution of registers (from accumulators to registers)
  * Different types of instructions
    * Control instructions
    * Data instructions
    * Operation instructions
  * Addressing modes
    * Tradeoffs (complexity, flexibility, etc.)
  * Orthogonal ISA
    * Addressing modes that are orthogonal to instruction types
  * Vectored vs. non-vectored interrupts
  * Complex vs. simple instructions
    * Tradeoffs
  * RISC vs. CISC
    * Tradeoffs
    * Backward compatibility
    * Performance
    * Optimization opportunity
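
As a quick illustration of the big/little endian item above, the sketch below uses Python's struct module (the 32-bit value is an arbitrary example) to show how the same word is laid out in memory under each byte order.

<code python>
import struct

value = 0x11223344  # arbitrary 32-bit example value

little = struct.pack("<I", value)  # least significant byte at the lowest address
big    = struct.pack(">I", value)  # most significant byte at the lowest address

print(little.hex())  # 44332211
print(big.hex())     # 11223344
</code>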

===== Lecture 4 (1/22 Wed.) =====

  * Semantic gap
    * Small vs. large semantic gap (CISC vs. RISC)
    * Benefits of RISC vs. CISC
  * Micro-operations/microcode
    * Translate complex instructions into smaller instructions
  * Parallelism (motivation for RISC)
  * Compiler optimization
  * Code optimization through translation
  * VLIW
  * Fixed vs. variable length instructions
    * Tradeoffs
      * Alignment issues? (fetch/decode)
      * Decoding issues?
      * Code size?
      * Adding additional instructions?
      * Memory bandwidth and cache utilization?
      * Energy?
    * Encoding in variable length instructions
  * Structure of Alpha instructions and other uniform decode instructions
    * Different types of instructions
    * Benefit of knowing the type of an instruction early
      * Speculatively operate on future instructions
  * x86 and other non-uniform decode instructions
    * Tradeoffs vs. uniform decode
  * Tradeoffs for different numbers of registers
    * Spilling into memory if the number of registers is small
    * Compiler optimization on how to manage which values to keep/spill
  * Addressing modes
    * Benefits?
    * Types?
    * Different uses of addressing modes?
  * Various ISA-level tradeoffs
  * Virtual memory
  * Unaligned vs. aligned memory access
    * Cost vs. benefit of unaligned access
  * ISA specification
    * Things you have to obey/specify in the ISA specification
  * Architectural state
  * The microarchitecture implements how architectural state A is transformed into the next architectural state A'
  * Single-cycle machines
    * Critical path in a single-cycle machine
  * Multi-cycle machines
  * Functional units
  * Performance metrics
    * CPI/IPC (see the worked example after this list)
      * CPI of a single-cycle microarchitecture
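
For the CPI/IPC item above, here is a small worked example (the numbers are made up, not from lecture) using the basic performance equation, execution time = instruction count x CPI x clock period. A single-cycle machine has CPI = 1 but pays for it with a long clock period set by the slowest instruction.

<code python>
# Basic performance equation with made-up numbers.
insts = 1_000_000

# Single-cycle design: CPI = 1, but the cycle time is set by the slowest instruction.
cpi_single, clock_ns_single = 1.0, 5.0
# Multi-cycle design: shorter cycle, but several cycles per instruction on average.
cpi_multi, clock_ns_multi = 4.2, 1.0

time_single = insts * cpi_single * clock_ns_single  # in nanoseconds
time_multi  = insts * cpi_multi  * clock_ns_multi

print(time_single, time_multi)   # 5,000,000 ns vs. 4,200,000 ns
print("IPC =", 1 / cpi_multi)    # IPC is simply 1/CPI
</code>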

===== Lecture 5 (1/24 Fri.) =====

  * Instruction processing
    * Fetch
    * Decode
    * Execute
    * Memory fetch
    * Writeback
  * Datapath and control logic in microprocessors
  * Different types of instructions (I-type, R-type, etc.)
  * Control flow instructions
  * Non-control-flow instructions
  * Delay slot/delayed branch
  * Single-cycle control logic
  * Lockstep
  * Critical path analysis
    * Critical path of a single-cycle processor
  * Combinational logic and sequential logic
  * Control store
  * Tradeoffs of a single-cycle uArch
  * Dynamic power/static power
  * Speedup calculation (see the worked example after this list)
    * Parallelism
    * Serial bottleneck
    * Amdahl's bottleneck
  * Design principles
    * Common case design
    * Critical path design
    * Balanced designs
  * Multi-cycle design
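
The speedup and Amdahl's bottleneck items above boil down to one formula; the numbers below are arbitrary examples, not from lecture. If a fraction p of the execution is sped up by a factor s, overall speedup = 1 / ((1 - p) + p / s), so the serial fraction quickly becomes the bottleneck.

<code python>
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of execution is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Example: 90% of the work is parallelizable.
for s in (2, 8, 1024):
    print(s, round(amdahl_speedup(0.9, s), 2))
# 2 -> 1.82, 8 -> 4.71, 1024 -> 9.91: bounded by 1/(1-p) = 10
</code>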

===== Lecture 6 (1/27 Mon.) =====

  * Microcoded/microprogrammed machines
    * States
    * Microinstructions
    * Microsequencing
    * Control store - produces the control signals (see the sketch after this list)
    * Microsequencer
    * Control signals
      * What do they have to control?
  * Instruction processing cycle
  * Latch signals
  * State machine
  * State variables
  * Condition codes
  * Steering bits
  * Branch enable logic
  * Difference between gating and loading (write enable vs. driving the bus)
  * Memory-mapped I/O
  * Hardwired logic
    * What control signals come from hardwired logic?
  * Variable-latency memory
  * Handling interrupts
  * Difference between interrupts and exceptions
  * Emulator (i.e., uCode allows a minimal datapath to emulate the ISA)
  * Updating machine behavior
  * Horizontal microcode
  * Vertical microcode
  * Primitives
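
To make the control store and microsequencer items concrete, here is a minimal sketch. The control-word fields and the tiny fetch sequence are invented for illustration, not taken from the lecture datapath: each control-store entry is a set of control signals plus a next-address field, and the microsequencer simply follows those next addresses.

<code python>
# Minimal microprogrammed-control sketch. Signal names and addresses are made up.
# Each control-store entry: (control signals asserted this cycle, next uPC).
control_store = {
    0: ({"MAR<-PC", "read_mem"}, 1),
    1: ({"IR<-MDR", "PC<-PC+4"}, 2),
    2: ({"decode"},              0),  # a real machine would branch on the opcode here
}

upc = 0
for cycle in range(6):
    signals, next_upc = control_store[upc]
    print(f"cycle {cycle}: uPC={upc} asserts {sorted(signals)}")
    upc = next_upc
</code>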

===== Lecture 7 (1/29 Wed.) =====

  * Pipelining
  * Limitations of the multi-cycle design
    * Idle resources
  * Throughput of a pipelined design (see the worked example after this list)
    * What dictates the throughput of a pipelined design?
  * Latency of a pipelined design
  * Dependencies
  * Overhead of pipelining
    * Latch cost?
  * Data forwarding/bypassing
  * What is an ideal pipeline?
  * External fragmentation
  * Issues in pipeline designs
    * Stalling
      * Dependencies (hazards)
        * Flow dependence
        * Output dependence
        * Anti dependence
        * How to handle them?
    * Resource contention
    * Keeping the pipeline full
    * Handling exceptions/interrupts
    * Pipeline flush
    * Speculation
  * Interlocking
  * Multipath execution
  * Fine-grained multithreading
  * No-ops (bubbles in the pipeline)
  * Valid bits in the instructions
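
For the pipeline throughput and latency items above, a quick worked example (the stage delays and latch overhead are made-up numbers): the cycle time is set by the slowest stage plus latch overhead, ideal throughput is one instruction per cycle, and the latency of a single instruction actually grows relative to the unpipelined design.

<code python>
# Made-up example: 5 pipeline stages with unbalanced delays (ns) plus latch overhead.
stage_delays = [200, 350, 250, 300, 200]
latch_overhead = 20

unpipelined_latency = sum(stage_delays)             # 1300 ns per instruction
cycle_time = max(stage_delays) + latch_overhead     # 370 ns, set by the slowest stage
pipelined_latency = cycle_time * len(stage_delays)  # 1850 ns for one instruction

print("ideal throughput:", 1 / cycle_time, "inst/ns")  # one instruction per cycle
print("latency:", unpipelined_latency, "->", pipelined_latency, "ns")
</code>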

===== Lecture 8 (1/31 Fri.) =====

  * Branch prediction
  * Different types of data dependence
  * Pipeline stalls
    * Bubbles
    * How to handle stalls (see the sketch after this list)
    * Stall conditions
    * Stall signals
    * Dependences
      * Distance between dependences
    * Data forwarding/bypassing
    * Maintaining the correct data flow
  * Different ways to design the data forwarding path/logic
  * Different techniques to handle interlocking
    * SW based
    * HW based
  * Profiling
    * Static profiling
    * Help from the software (compiler)
      * Superblock optimization
      * Analyzing basic blocks
  * How to deal with branches?
    * Branch prediction
    * Delayed branching (branch delay slot)
    * Forward control flow/backward control flow
    * Branch prediction accuracy
  * Profile-guided code positioning
    * Position the code based on the profile information
    * Try to make the next sequential instruction be the next instruction to be executed
  * Trace cache
  * Predicate combining (combine predicates for a branch instruction)
  * Predicated execution (control dependence becomes data dependence)
  * Definition of basic blocks
  * Control flow graph
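
As an illustration of the stall-condition items above, here is a minimal sketch of the classic load-use interlock check in a 5-stage pipeline. This is the common textbook formulation, not necessarily the exact logic from lecture: stall when the instruction in EX is a load whose destination matches a source of the instruction in decode, because forwarding cannot hide a one-cycle load latency.

<code python>
def must_stall(ex_is_load, ex_dest_reg, id_src_regs):
    """Load-use hazard check for a 5-stage pipeline (textbook formulation).

    ex_is_load   -- the instruction currently in EX is a load
    ex_dest_reg  -- its destination register number
    id_src_regs  -- source register numbers of the instruction in decode
    """
    return ex_is_load and ex_dest_reg in id_src_regs

# lw r2, 0(r1) followed immediately by add r3, r2, r4 -> one bubble is needed
print(must_stall(True, 2, (2, 4)))   # True
# If the producer is not a load (or the registers don't match), forwarding suffices.
print(must_stall(False, 2, (2, 4)))  # False
</code>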

===== Lecture 9 (2/3 Mon.) =====

  * Delayed branching
    * Benefits?
    * What does it eliminate?
    * Downsides?
    * Delayed branching in SPARC (with squashing)
    * Backward compatibility with the delay slot
    * What should be filled into the delay slot?
    * How to ensure correctness
  * Fine-grained multithreading
    * Fetch from different threads
    * What are the issues? (what if the program doesn't have many threads?)
    * CDC 6000
    * Denelcor HEP
    * No dependency checking
    * Instructions from different threads can fill in the bubbles
    * Cost?
  * Simultaneous multithreading
  * Branch prediction
    * Guess what to fetch next
    * Misprediction penalty
    * Need to guess both the direction and the target
    * How to perform the performance analysis?
      * Given the branch prediction accuracy and the penalty, how to compute the cost of a branch misprediction
      * Given the number of instructions, the fraction of branches, the prediction accuracy, and the penalty, how to compute the cost of branch mispredictions (see the worked example after this list)
        * How many extra instructions are fetched?
        * What is the performance degradation?
    * How to reduce the miss penalty?
    * Predicting the next fetch address (non PC+4 address)
    * Branch target buffer (BTB)
      * Predicting the target address of the branch
    * Global branch history - for directions
    * Can use the compiler to profile and get more information
      * The input set dictates the accuracy
      * Adds time to compilation
    * Heuristics that are common and don't require profiling
      * Might be inaccurate
      * Does not require profiling
    * The programmer can tell the hardware (via pragmas/hints)
      * For example, x86 has a hint bit
    * Dynamic branch prediction
      * Last-time predictor
      * Two-bit counter based prediction
        * One more bit for hysteresis
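
The performance-analysis items above reduce to a simple accounting exercise. The numbers here are arbitrary (20% branches, 90% accuracy, 20-cycle flush penalty), chosen only to show the mechanics: every mispredicted branch adds the full penalty to the cycle count.

<code python>
# Made-up example numbers for a branch misprediction cost calculation.
insts          = 1_000_000
branch_frac    = 0.20      # 20% of instructions are branches
accuracy       = 0.90      # 90% of branches predicted correctly
penalty_cycles = 20        # pipeline flush penalty per misprediction
base_cpi       = 1.0       # ideal CPI with perfect prediction

mispredictions = insts * branch_frac * (1 - accuracy)
extra_cycles   = mispredictions * penalty_cycles
effective_cpi  = base_cpi + branch_frac * (1 - accuracy) * penalty_cycles

print(int(mispredictions))   # 20,000 mispredicted branches
print(int(extra_cycles))     # 400,000 extra cycles
print(effective_cpi)         # 1.4 instead of 1.0
</code>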

===== Lecture 10 (2/5 Wed.) =====

  * Branch prediction accuracy
    * Why is it so important?
      * Difference between 99% accuracy and 98% accuracy
      * Cost of a misprediction when the pipeline is very deep
  * Value prediction
  * Global branch correlation
    * Some branches are correlated with each other
  * Local branch correlation
    * Some branches depend on the results of their own past executions
  * Pattern history table
    * Records global taken/not-taken results
    * Cost vs. accuracy (what do you record - the PC, or just taken/not-taken information?)
  * One-level branch predictor
    * What information is used?
  * Two-level branch prediction
    * What entries do you keep in the global history?
    * What entries do you keep in the local history?
    * How many tables?
    * Cost of training a table
    * What is the purpose of each table?
    * Potential problems of a two-level history
  * GShare predictor (see the sketch after this list)
    * The global history register is hashed with the PC
    * Stores both GHR and PC information in one combined index
    * How do you use the information? Why is the XOR result still usable?
  * Slides 16-18 give a good overview of one- and two-level predictors
  * Warmup cost of the branch predictor
    * Hybrid solution? A fast-warmup predictor is used first, then switch to the slower one.
  * Tournament predictor (Alpha 21264)
  * Other types of branch predictors
    * Using machine learning?
    * Geometric history length
      * Look at branches far in the past (using geometrically increasing history lengths)
  * Predicated execution - eliminates branches
    * What are the tradeoffs?
    * What if the block is big? (can lead to executing a lot of useless work)
    * Allows easier code optimization
      * From the compiler's point of view, predicated execution combines multiple basic blocks into one bigger basic block
      * Reduces control dependences
    * Needs ISA support
  * Wish branches
    * The compiler generates both predicated and non-predicated code
    * The hardware decides which one to use
      * Use branch prediction on easy-to-predict code
      * Use predicated execution on hard-to-predict code
      * The compiler can be more aggressive in optimizing the code
    * What are the tradeoffs? (slide 47)
  * Multi-path execution
    * Execute both paths
    * Can lead to wasted work
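
Here is a minimal gshare sketch to go with the items above. The table size and the 2-bit saturating-counter update rule follow the common textbook formulation and are not claimed to be the exact configuration from lecture: the global history register is XORed with the branch PC to index a table of 2-bit counters.

<code python>
class GsharePredictor:
    """Minimal gshare sketch: PC XOR global history indexes 2-bit counters."""

    def __init__(self, index_bits=12, history_bits=12):
        self.mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.ghr = 0                                # global history register
        self.counters = [1] * (1 << index_bits)     # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # predict taken if the counter is 2 or 3

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask

bp = GsharePredictor()
for outcome in [True, True, True, False, True]:     # a mostly-taken branch at PC 0x400
    print(bp.predict(outcome and 0x400 or 0x400), outcome)  # (prediction, actual outcome)
    bp.update(0x400, outcome)
</code>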

===== Lecture 11 (2/12 Wed.) =====

  * Call and return prediction
    * Direct calls are easy to predict
    * Returns are harder (indirect branches)
      * Nested calls make the return address easier to predict
        * Can use a stack to predict the return address (see the sketch after this list)
    * Indirect branch prediction
      * These branches have multiple targets
      * Used for switch-case statements, virtual function calls, jump tables, interface calls
      * BTB to predict the target address - low accuracy
      * History based: BTB + GHR
      * Virtual program counter prediction
  * Complications in superscalar processors
    * Fetch? What if multiple branches are fetched at the same time?
    * Logic required to ensure correctness?
  * Multi-cycle execution (different functional units take different numbers of cycles)
    * Instructions can retire out of order
      * How to deal with this case? Stall? Throw exceptions if there are problems?
  * Exceptions and interrupts
    * When are they handled?
      * Why should some interrupts be handled right away?
  * Precise exceptions
    * The architectural state should be consistent before handling the exception/interrupt
      * Easier to debug (you see the sequential flow up to the point where the interrupt occurs)
        * Deterministic
      * Easier to recover from the exception
      * Easier to restart processes
    * How to ensure precise exceptions?
      * Tradeoffs between the methods
  * Reorder buffer
    * Reorders results before they become visible in the architectural state
      * Need to preserve the sequential semantics and data
    * What information is in a ROB entry?
    * Where to get the value from (forwarding path? reorder buffer?)
      * Extra logic to check where the youngest instruction/value is
      * Content-addressable search
        * A lot of comparators
    * Different ways to simplify the reorder buffer
    * Register renaming
      * The same register can refer to independent values (due to the lack of registers)
    * Where does the exception happen? (after retirement)
  * History buffer
    * Update the register file when the instruction completes; unroll if there is an exception.
  * Future file (commonly used, along with a reorder buffer)
    * Keep two sets of register files
      * An updated (speculative) value, called the future file
      * A backup value (to restore the state quickly)
    * Doubles the cost of the register file, but reduces area since you don't need a content-addressable memory (compared to the ROB alone)
  * Branch misprediction resembles an exception
    * The difference is that a branch misprediction is not visible to the software
      * It is also much more common (say, divide by zero vs. a mispredicted branch)
    * Recovery is similar to exception handling
  * Latency of the state recovery
  * What to do during the state recovery
  * Checkpointing
    * Advantages?
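
The return address stack mentioned above is simple enough to sketch directly. This is a minimal illustration only; real designs bound the stack depth and must repair it after mispredictions, which is ignored here: push the return address on a call, pop it to predict the target of the matching return.

<code python>
class ReturnAddressStack:
    """Minimal return address stack (RAS) sketch; overflow and repair are ignored."""

    def __init__(self):
        self.stack = []

    def on_call(self, call_pc, inst_size=4):
        self.stack.append(call_pc + inst_size)  # address of the instruction after the call

    def predict_return(self):
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x1000)               # outer call
ras.on_call(0x2000)               # nested call
print(hex(ras.predict_return()))  # 0x2004: inner return predicted first
print(hex(ras.predict_return()))  # 0x1004: then the outer return
</code>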

===== Lecture 14 (2/19 Wed.) =====

  * Predictors (branch predictor, cache line predictor, ...)
  * Power budget (and its importance)
  * Architectural state, precise state
  * Memory dependences are only known dynamically
  * Register state is not shared across threads/processors
  * Memory state is shared across threads/processors
  * How to maintain speculative memory state
  * Write buffers (help simplify the process of checking the reorder buffer)
  * Overall OoO mechanism
    * What are other ways of eliminating dispatch stalls?
    * Dispatch when the sources are ready
    * Retired instructions make the sources available
    * Register renaming (see the sketch after this list)
    * Reservation stations
      * What goes into a reservation station?
      * Tags required in a reservation station
    * Tomasulo's algorithm
    * Without precise exceptions, OoO is hard to debug
    * Architectural register ID
    * Examples in the slides
      * Slide 28 --> register renaming
      * Slides 30-35 --> exercise (also on the board)
        * This will be useful for the midterm
    * Register alias table
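
To go with the register renaming and register alias table items, here is a minimal sketch (the tag names and the two-instruction example are invented): each architectural destination register is mapped to a fresh tag, and later instructions read their sources through the current mapping, which removes false (WAR/WAW) dependences.

<code python>
import itertools

# Minimal register-renaming sketch using a register alias table (RAT).
tag_counter = itertools.count()
rat = {}   # architectural register -> tag of the latest in-flight producer

def rename(dest, srcs):
    renamed_srcs = [rat.get(s, s) for s in srcs]  # read the current mapping (or the committed register)
    new_tag = f"t{next(tag_counter)}"
    rat[dest] = new_tag                           # later readers of dest now wait on this tag
    return new_tag, renamed_srcs

# add r1, r2, r3 ; then mul r1, r1, r4  (the WAW hazard on r1 disappears after renaming)
print(rename("r1", ["r2", "r3"]))   # ('t0', ['r2', 'r3'])
print(rename("r1", ["r1", "r4"]))   # ('t1', ['t0', 'r4'])
</code>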

===== Lecture 15 (2/21 Fri.) =====

  * OoO --> restricted dataflow
    * Extracting parallelism
    * What are the bottlenecks?
      * Issue width
      * Dispatch width
      * Parallelism in the program
    * More examples on slide 10
    * What does it mean to be restricted dataflow?
      * Still visible as a Von Neumann model
    * Where does the efficiency come from?
    * Size of the scheduling window/reorder buffer. Tradeoffs? What makes sense?
  * Load/store handling (see the sketch after this list)
    * Would like to schedule them out of order, but make them visible in order
    * When do you schedule the load/store instructions?
    * Can we predict whether loads/stores are dependent?
    * This is one of the most complex structures in load/store handling
    * What information can be used to predict these load/store dependences?
  * Note: IPC = 1/CPI
  * Centralized vs. distributed? What are the tradeoffs?
  * How to handle a misprediction/recovery
  * Token dataflow architectures
    * What are tokens?
    * How to match tokens
    * Tagged-token dataflow architectures
    * What are the tradeoffs?
    * Difficulties?
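
For the load/store handling items above, here is a minimal sketch of the basic check an out-of-order machine must make before letting a load execute. This is a deliberately conservative, simplified policy for illustration; real memory-dependence predictors are far more involved: compare the load address against all older stores, forward from a matching store, and wait if any older store address is still unknown.

<code python>
def resolve_load(load_addr, older_stores):
    """older_stores: list of (address or None if not yet computed, value), oldest first.

    Returns ('forward', value), 'wait', or 'to_memory' -- a simplified,
    conservative policy used only for illustration.
    """
    for addr, value in reversed(older_stores):  # youngest older store first
        if addr is None:
            return "wait"                       # unknown address: could alias, so wait
        if addr == load_addr:
            return ("forward", value)           # store-to-load forwarding
    return "to_memory"

print(resolve_load(0x100, [(0x200, 7), (0x100, 42)]))  # ('forward', 42)
print(resolve_load(0x100, [(None, 9), (0x300, 5)]))    # 'wait' (an older store address is unknown)
print(resolve_load(0x100, [(0x200, 7), (0x300, 5)]))   # 'to_memory'
</code>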

===== Lecture 16 (2/24 Mon.) =====

  * SISD/SIMD/MISD/MIMD
  * Array processor
  * Vector processor
  * Data parallelism
    * Where does the concurrency arise?
  * Differences between an array processor and a vector processor
  * VLIW
  * Compactness of an array processor
  * A vector instruction operates on a vector of data (rather than a single datum (scalar))
    * Vector length (also applies to array processors)
    * No dependences within a vector --> can have a deep pipeline
    * Highly parallel (both instruction level (ILP) and memory level (MLP))
    * But the program needs to be very parallel
    * Memory can be the bottleneck (due to the very high MLP)
    * What do the functional units look like? Deep pipelines and simpler control.
    * The CRAY-1 is one example of a vector processor
    * Memory access patterns in a vector processor
      * How do the memory access patterns make good use of the memory bandwidth?
      * Please refer to slides 73-74 in http://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture25-mainmemory-afterlecture.pdf for a brief explanation of memory-level parallelism
      * Stride length vs. the number of banks
        * The stride length should be relatively prime to the number of banks (see the sketch after this list)
      * Tradeoffs between row-major and column-major layout --> how can the vector processor deal with the two?
    * How to calculate the efficiency and performance of vector processors
    * What if there are multiple memory ports?
    * Gather/scatter allows the vector processor to be a lot more programmable (i.e., gather data for parallelism)
      * Helps in handling sparse matrices
    * Conditional operation
    * Structure of vector units
    * How to automatically parallelize code through the compiler?
      * This is a hard problem. The compiler does not know the memory addresses.
  * What do we need to ensure for both vector and array processors?
  * Sequential bottleneck
    * Amdahl's law
  * Intel MMX --> an example of Intel's approach to SIMD
    * No VLEN; the opcode defines the length
    * The stride is one in MMX
    * Intel SSE --> the modern version of MMX
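
The stride-vs.-banks item above can be checked with a few lines of arithmetic (the bank count and strides are arbitrary example values): with B banks and stride S, a strided stream touches B / gcd(S, B) distinct banks, so a stride that shares a factor with the bank count concentrates accesses on a few banks and loses parallelism.

<code python>
from math import gcd

def banks_touched(stride, num_banks, num_accesses):
    """Distinct banks hit by a strided stream (element i -> bank (i*stride) % num_banks)."""
    return len({(i * stride) % num_banks for i in range(num_accesses)})

B = 8  # example bank count
for stride in (1, 2, 4, 8, 7):
    print(stride, banks_touched(stride, B, 64), B // gcd(stride, B))
# Strides 1 and 7 (relatively prime to 8) spread accesses over all 8 banks;
# stride 8 hits a single bank and serializes every access.
</code>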
  