====== Buzzwords ======

Buzzwords are terms mentioned during lecture that are particularly important to understand thoroughly. This page tracks the buzzwords for each lecture and can be used as a reference for finding gaps in your understanding of the course material.

===== Lecture 1 (1/12 Mon.) =====

  * Level of transformation
    * Algorithm
    * Compiler
  * Cross abstraction layers
    * Exposing an interface
  * Tradeoffs
  * Caches
  * Multi-core
  * Unfairness
  * DRAM controller / memory controller
  * Memory hog
  * Row buffer hit/miss
  * Row buffer locality
  * Streaming access vs. random access
  * DRAM refresh
  * Retention time
  * Profiling DRAM retention time
  * Power consumption
  * Wimpy cores
  * Bloom filter (see the sketch after this list)
    * Pros/cons
    * False positives
  * Simulation
  * Memory performance attacks
  * RTL design
  * Hamming code
  * Hamming distance
  * Abstraction layers
  * Memory performance hog
  * Shared DRAM memory system
  * DRAM banks
  * Memory scheduling policies
  * Scheduling priority
  * Retention time of DRAM
  * Process variation
  * Retention time profile
  * DRAM row hammer
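
The Bloom filter entry above is worth pinning down with code. The following is a minimal sketch, not the exact structure from lecture: the array size and the two hash functions are arbitrary assumptions. It shows why membership tests can return false positives but never false negatives.

<code python>
# Minimal Bloom filter sketch: 2 hash functions over an M-bit array.
# M and the hash construction are illustrative assumptions only.
M = 64

def hashes(item, m=M):
    # Two cheap, loosely independent indices derived from Python's hash().
    h = hash(item)
    return [h % m, (h // m) % m]

class BloomFilter:
    def __init__(self, m=M):
        self.bits = [0] * m

    def insert(self, item):
        for i in hashes(item):
            self.bits[i] = 1

    def maybe_contains(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[i] for i in hashes(item))

bf = BloomFilter()
bf.insert("row 0x1A")
print(bf.maybe_contains("row 0x1A"))   # True
print(bf.maybe_contains("row 0xFF"))   # usually False; occasionally a false positive
</code>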

===== Lecture 2 (1/15 Wed.) =====

  * Optimizing for energy vs. optimizing for performance
    * Generally you should optimize for the users
  * State of the art
  * RTL simulation
    * Long, slow, and can be costly
  * High-level simulation
    * What should be employed?
    * Important to get an idea of how techniques would be implemented in RTL
    * Allows the designer to filter out techniques that do not work well
  * Design points
    * Design processors to meet the design points
  * Software stack
  * Design decisions
  * Datacenters
  * MIPS R2000
    * What architectural techniques improve the performance of a processor over the MIPS R2000?
  * Moore's Law
  * In-order execution
  * Out-of-order execution
  * Technologies that are available on cellphones
  * New applications that are made possible by new computer architecture techniques
    * More data mining (genomics/medical areas)
    * Lower power (cellphones)
    * Smaller cores (cellphones/computers)
    * etc.
  * Performance bottlenecks in single-thread/single-core processors
    * Multi-core as an alternative
  * Memory wall (a part of the scaling issue)
  * Scaling issue
    * Transistors are getting smaller
  * Reliability problems that cause errors
  * Analogies from Kuhn's "The Structure of Scientific Revolutions" (recommended book)
    * Pre-paradigm science
    * Normal science
    * Revolutionary science
  * Components of a computer
    * Computation
    * Communication
    * Storage
      * DRAM
      * NVRAM (non-volatile memory): PCM, STT-MRAM
      * Storage (flash/hard drive)
  * Von Neumann model (control flow model)
    * Stored program computer
      * Properties of the Von Neumann model: stored program, sequential instruction processing
      * Unified memory
        * When is an instruction interpreted as an instruction (as opposed to a datum)?
      * Program counter
      * Examples: x86, ARM, Alpha, IBM Power series, SPARC, MIPS
  * Data flow model (see the sketch after this list)
    * Data flow machine
      * Data flow graph
    * Operands
    * Live-outs/live-ins
      * Different types of data flow nodes (conditional/relational/barrier)
    * How to perform transactions in data flow?
      * Example: bank transactions
  * Tradeoffs between control-driven and data-driven execution
    * Which is easier to program?
    * Which is easier to compile?
    * Which is more parallel (and does that mean it is faster)?
    * Which machines are more complex to design?
    * In control flow, when a program stops, there is a pointer to the current state (precise state).
  * ISA vs. microarchitecture
    * Semantics in the ISA
    * The uArch should obey the ISA
    * Changing the ISA is costly and can affect compatibility
  * Instruction pointers
  * uArch techniques: common and powerful techniques break the Von Neumann model if done at the ISA level
    * Conceptual techniques
      * Pipelining
      * Multiple instructions at a time
      * Out-of-order execution
      * etc.
    * Design techniques
      * Adder implementation (bit serial, ripple carry, carry lookahead)
      * Connection Machine (an example of a machine that uses bit-serial arithmetic to trade off latency for more parallelism)
  * Microprocessor: ISA + uArch + circuits
  * What is part of the ISA? Instructions, memory, etc.
    * Things that are visible to the programmer/software
  * What is not part of the ISA? (what goes inside: uArch techniques)
    * Things that are not supposed to be visible to the programmer/software but typically make the processor faster and/or consume less power and/or less complex
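
To make the data flow model concrete, here is a minimal sketch (my own illustration, not code from lecture; the node names and the example graph are invented): each node fires as soon as all of its input operands are available, with no program counter involved.

<code python>
# Minimal data flow firing sketch: a node fires when all its inputs are ready.
graph = {
    # node: (operation, [input nodes])
    "a":   ("const", []),          # live-in
    "b":   ("const", []),          # live-in
    "sum": ("add",   ["a", "b"]),
    "sq":  ("mul",   ["sum", "sum"]),
}
values = {"a": 3, "b": 4}          # live-in values

ops = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

# Repeatedly fire any node whose operands are all available.
changed = True
while changed:
    changed = False
    for node, (op, ins) in graph.items():
        if node not in values and all(i in values for i in ins):
            values[node] = ops[op](*(values[i] for i in ins))
            changed = True

print(values["sq"])  # 49: "sq" fires only after "sum" has produced its token
</code>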

===== Lecture 3 (1/17 Fri.) =====

  * Design tradeoffs
  * Macro-architecture
  * Reconfigurability vs. specialized designs
  * Parallelism (instruction level, data parallel)
  * Uniform decode (example: Alpha)
  * Steering bits (sub-opcode)
  * 0-, 1-, 2-, and 3-address machines
    * Stack machine
    * Accumulator machine
    * 2-operand machine
    * 3-operand machine
    * Tradeoffs between 0-, 1-, 2-, and 3-address machines
  * Instructions/opcodes/operand specifiers (i.e., addressing modes)
  * Simple vs. complex data types (and their tradeoffs)
  * Semantic gap
  * Translation layer
  * Addressability
  * Byte/bit addressable machines
  * Virtual memory
  * Big/little endian (see the example after this list)
  * Benefits of having registers (data locality)
  * Programmer-visible (architectural) state
    * Programmers can access this directly
    * What are the benefits?
  * Microarchitectural state
    * Programmers cannot access this directly
  * Evolution of registers (from accumulators to registers)
  * Different types of instructions
    * Control instructions
    * Data instructions
    * Operation instructions
  * Addressing modes
    * Tradeoffs (complexity, flexibility, etc.)
  * Orthogonal ISA
    * Addressing modes that are orthogonal to instruction types
  * Vectored vs. non-vectored interrupts
  * Complex vs. simple instructions
    * Tradeoffs
  * RISC vs. CISC
    * Tradeoffs
    * Backward compatibility
    * Performance
    * Optimization opportunity
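
As a quick illustration of the big/little endian item above, the sketch below uses Python's struct module (the 32-bit value is an arbitrary example) to show how the same word is laid out in memory under each byte order.

<code python>
import struct

value = 0x11223344  # arbitrary 32-bit example value

little = struct.pack("<I", value)  # least significant byte at the lowest address
big    = struct.pack(">I", value)  # most significant byte at the lowest address

print(little.hex())  # 44332211
print(big.hex())     # 11223344
</code>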

===== Lecture 4 (1/22 Wed.) =====

  * Semantic gap
    * Small vs. large semantic gap (CISC vs. RISC)
    * Benefits of RISC vs. CISC
  * Micro-operations/microcode
    * Translate complex instructions into smaller instructions
  * Parallelism (motivation for RISC)
  * Compiler optimization
  * Code optimization through translation
  * VLIW
  * Fixed vs. variable length instructions
    * Tradeoffs
      * Alignment issues? (fetch/decode)
      * Decoding issues?
      * Code size?
      * Adding additional instructions?
      * Memory bandwidth and cache utilization?
      * Energy?
    * Encoding in variable length instructions
  * Structure of Alpha instructions and other uniform decode instructions
    * Different types of instructions
    * Benefit of knowing the type of an instruction early
      * Speculatively operate on future instructions
  * x86 and other non-uniform decode instructions
    * Tradeoffs vs. uniform decode
  * Tradeoffs for different numbers of registers
    * Spilling into memory if the number of registers is small
    * Compiler optimization on how to manage which values to keep/spill
  * Addressing modes
    * Benefits?
    * Types?
    * Different uses of addressing modes?
  * Various ISA-level tradeoffs
  * Virtual memory
  * Unaligned vs. aligned memory access
    * Cost vs. benefit of unaligned access
  * ISA specification
    * Things you have to obey/specify in the ISA specification
  * Architectural state
  * The microarchitecture implements how architectural state A is transformed into the next architectural state A'
  * Single-cycle machines
    * Critical path in a single-cycle machine
  * Multi-cycle machines
  * Functional units
  * Performance metrics
    * CPI/IPC (see the worked example after this list)
      * CPI of a single-cycle microarchitecture
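
For the CPI/IPC item above, here is a small worked example (the numbers are made up, not from lecture) using the basic performance equation, execution time = instruction count x CPI x clock period. A single-cycle machine has CPI = 1 but pays for it with a long clock period set by the slowest instruction.

<code python>
# Basic performance equation with made-up numbers.
insts = 1_000_000

# Single-cycle design: CPI = 1, but the cycle time is set by the slowest instruction.
cpi_single, clock_ns_single = 1.0, 5.0
# Multi-cycle design: shorter cycle, but several cycles per instruction on average.
cpi_multi, clock_ns_multi = 4.2, 1.0

time_single = insts * cpi_single * clock_ns_single  # in nanoseconds
time_multi  = insts * cpi_multi  * clock_ns_multi

print(time_single, time_multi)   # 5,000,000 ns vs. 4,200,000 ns
print("IPC =", 1 / cpi_multi)    # IPC is simply 1/CPI
</code>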

===== Lecture 5 (1/24 Fri.) =====

  * Instruction processing
    * Fetch
    * Decode
    * Execute
    * Memory fetch
    * Writeback
  * Datapath and control logic in microprocessors
  * Different types of instructions (I-type, R-type, etc.)
  * Control flow instructions
  * Non-control-flow instructions
  * Delay slot/delayed branch
  * Single-cycle control logic
  * Lockstep
  * Critical path analysis
    * Critical path of a single-cycle processor
  * Combinational logic and sequential logic
  * Control store
  * Tradeoffs of a single-cycle uArch
  * Dynamic power/static power
  * Speedup calculation (see the worked example after this list)
    * Parallelism
    * Serial bottleneck
    * Amdahl's bottleneck
  * Design principles
    * Common case design
    * Critical path design
    * Balanced designs
  * Multi-cycle design
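
The speedup and Amdahl's bottleneck items above boil down to one formula; the numbers below are arbitrary examples, not from lecture. If a fraction p of the execution is sped up by a factor s, overall speedup = 1 / ((1 - p) + p / s), so the serial fraction quickly becomes the bottleneck.

<code python>
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of execution is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Example: 90% of the work is parallelizable.
for s in (2, 8, 1024):
    print(s, round(amdahl_speedup(0.9, s), 2))
# 2 -> 1.82, 8 -> 4.71, 1024 -> 9.91: bounded by 1/(1-p) = 10
</code>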

===== Lecture 6 (1/27 Mon.) =====

  * Microcoded/microprogrammed machines
    * States
    * Microinstructions
    * Microsequencing
    * Control store - produces the control signals (see the sketch after this list)
    * Microsequencer
    * Control signals
      * What do they have to control?
  * Instruction processing cycle
  * Latch signals
  * State machine
  * State variables
  * Condition codes
  * Steering bits
  * Branch enable logic
  * Difference between gating and loading (write enable vs. driving the bus)
  * Memory-mapped I/O
  * Hardwired logic
    * What control signals come from hardwired logic?
  * Variable-latency memory
  * Handling interrupts
  * Difference between interrupts and exceptions
  * Emulator (i.e., uCode allows a minimal datapath to emulate the ISA)
  * Updating machine behavior
  * Horizontal microcode
  * Vertical microcode
  * Primitives
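
To make the control store and microsequencer items concrete, here is a minimal sketch. The control-word fields and the tiny fetch sequence are invented for illustration, not taken from the lecture datapath: each control-store entry is a set of control signals plus a next-address field, and the microsequencer simply follows those next addresses.

<code python>
# Minimal microprogrammed-control sketch. Signal names and addresses are made up.
# Each control-store entry: (control signals asserted this cycle, next uPC).
control_store = {
    0: ({"MAR<-PC", "read_mem"}, 1),
    1: ({"IR<-MDR", "PC<-PC+4"}, 2),
    2: ({"decode"},              0),  # a real machine would branch on the opcode here
}

upc = 0
for cycle in range(6):
    signals, next_upc = control_store[upc]
    print(f"cycle {cycle}: uPC={upc} asserts {sorted(signals)}")
    upc = next_upc
</code>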

===== Lecture 7 (1/29 Wed.) =====

  * Pipelining
  * Limitations of the multi-cycle design
    * Idle resources
  * Throughput of a pipelined design (see the worked example after this list)
    * What dictates the throughput of a pipelined design?
  * Latency of a pipelined design
  * Dependencies
  * Overhead of pipelining
    * Latch cost?
  * Data forwarding/bypassing
  * What is an ideal pipeline?
  * External fragmentation
  * Issues in pipeline designs
    * Stalling
      * Dependencies (hazards)
        * Flow dependence
        * Output dependence
        * Anti dependence
        * How to handle them?
    * Resource contention
    * Keeping the pipeline full
    * Handling exceptions/interrupts
    * Pipeline flush
    * Speculation
  * Interlocking
  * Multipath execution
  * Fine-grained multithreading
  * No-ops (bubbles in the pipeline)
  * Valid bits in the instructions
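
For the pipeline throughput and latency items above, a quick worked example (the stage delays and latch overhead are made-up numbers): the cycle time is set by the slowest stage plus latch overhead, ideal throughput is one instruction per cycle, and the latency of a single instruction actually grows relative to the unpipelined design.

<code python>
# Made-up example: 5 pipeline stages with unbalanced delays (ns) plus latch overhead.
stage_delays = [200, 350, 250, 300, 200]
latch_overhead = 20

unpipelined_latency = sum(stage_delays)             # 1300 ns per instruction
cycle_time = max(stage_delays) + latch_overhead     # 370 ns, set by the slowest stage
pipelined_latency = cycle_time * len(stage_delays)  # 1850 ns for one instruction

print("ideal throughput:", 1 / cycle_time, "inst/ns")  # one instruction per cycle
print("latency:", unpipelined_latency, "->", pipelined_latency, "ns")
</code>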

===== Lecture 8 (1/31 Fri.) =====

  * Branch prediction
  * Different types of data dependence
  * Pipeline stalls
    * Bubbles
    * How to handle stalls (see the sketch after this list)
    * Stall conditions
    * Stall signals
    * Dependences
      * Distance between dependences
    * Data forwarding/bypassing
    * Maintaining the correct data flow
  * Different ways to design the data forwarding path/logic
  * Different techniques to handle interlocking
    * SW based
    * HW based
  * Profiling
    * Static profiling
    * Help from the software (compiler)
      * Superblock optimization
      * Analyzing basic blocks
  * How to deal with branches?
    * Branch prediction
    * Delayed branching (branch delay slot)
    * Forward control flow/backward control flow
    * Branch prediction accuracy
  * Profile-guided code positioning
    * Position the code based on the profile information
    * Try to make the next sequential instruction be the next instruction to be executed
  * Trace cache
  * Predicate combining (combine predicates for a branch instruction)
  * Predicated execution (control dependence becomes data dependence)
  * Definition of basic blocks
  * Control flow graph
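
As an illustration of the stall-condition items above, here is a minimal sketch of the classic load-use interlock check in a 5-stage pipeline. This is the common textbook formulation, not necessarily the exact logic from lecture: stall when the instruction in EX is a load whose destination matches a source of the instruction in decode, because forwarding cannot hide a one-cycle load latency.

<code python>
def must_stall(ex_is_load, ex_dest_reg, id_src_regs):
    """Load-use hazard check for a 5-stage pipeline (textbook formulation).

    ex_is_load   -- the instruction currently in EX is a load
    ex_dest_reg  -- its destination register number
    id_src_regs  -- source register numbers of the instruction in decode
    """
    return ex_is_load and ex_dest_reg in id_src_regs

# lw r2, 0(r1) followed immediately by add r3, r2, r4 -> one bubble is needed
print(must_stall(True, 2, (2, 4)))   # True
# If the producer is not a load (or the registers don't match), forwarding suffices.
print(must_stall(False, 2, (2, 4)))  # False
</code>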

===== Lecture 9 (2/3 Mon.) =====

  * Delayed branching
    * Benefits?
    * What does it eliminate?
    * Downsides?
    * Delayed branching in SPARC (with squashing)
    * Backward compatibility with the delay slot
    * What should be filled into the delay slot?
    * How to ensure correctness
  * Fine-grained multithreading
    * Fetch from different threads
    * What are the issues? (what if the program doesn't have many threads?)
    * CDC 6000
    * Denelcor HEP
    * No dependency checking
    * Instructions from different threads can fill in the bubbles
    * Cost?
  * Simultaneous multithreading
  * Branch prediction
    * Guess what to fetch next
    * Misprediction penalty
    * Need to guess both the direction and the target
    * How to perform the performance analysis?
      * Given the branch prediction accuracy and the penalty, how to compute the cost of a branch misprediction
      * Given the number of instructions, the fraction of branches, the prediction accuracy, and the penalty, how to compute the cost of branch mispredictions (see the worked example after this list)
        * How many extra instructions are fetched?
        * What is the performance degradation?
    * How to reduce the miss penalty?
    * Predicting the next fetch address (non PC+4 address)
    * Branch target buffer (BTB)
      * Predicting the target address of the branch
    * Global branch history - for directions
    * Can use the compiler to profile and get more information
      * The input set dictates the accuracy
      * Adds time to compilation
    * Heuristics that are common and don't require profiling
      * Might be inaccurate
      * Does not require profiling
    * The programmer can tell the hardware (via pragmas/hints)
      * For example, x86 has a hint bit
    * Dynamic branch prediction
      * Last-time predictor
      * Two-bit counter based prediction
        * One more bit for hysteresis
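
The performance-analysis items above reduce to a simple accounting exercise. The numbers here are arbitrary (20% branches, 90% accuracy, 20-cycle flush penalty), chosen only to show the mechanics: every mispredicted branch adds the full penalty to the cycle count.

<code python>
# Made-up example numbers for a branch misprediction cost calculation.
insts          = 1_000_000
branch_frac    = 0.20      # 20% of instructions are branches
accuracy       = 0.90      # 90% of branches predicted correctly
penalty_cycles = 20        # pipeline flush penalty per misprediction
base_cpi       = 1.0       # ideal CPI with perfect prediction

mispredictions = insts * branch_frac * (1 - accuracy)
extra_cycles   = mispredictions * penalty_cycles
effective_cpi  = base_cpi + branch_frac * (1 - accuracy) * penalty_cycles

print(int(mispredictions))   # 20,000 mispredicted branches
print(int(extra_cycles))     # 400,000 extra cycles
print(effective_cpi)         # 1.4 instead of 1.0
</code>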

===== Lecture 10 (2/5 Wed.) =====

  * Branch prediction accuracy
    * Why is it so important?
      * Difference between 99% accuracy and 98% accuracy
      * Cost of a misprediction when the pipeline is very deep
  * Value prediction
  * Global branch correlation
    * Some branches are correlated with each other
  * Local branch correlation
    * Some branches depend on the results of their own past executions
  * Pattern history table
    * Records global taken/not-taken results
    * Cost vs. accuracy (what do you record - the PC, or just taken/not-taken information?)
  * One-level branch predictor
    * What information is used?
  * Two-level branch prediction
    * What entries do you keep in the global history?
    * What entries do you keep in the local history?
    * How many tables?
    * Cost of training a table
    * What is the purpose of each table?
    * Potential problems of a two-level history
  * GShare predictor (see the sketch after this list)
    * The global history register is hashed with the PC
    * Stores both GHR and PC information in one combined index
    * How do you use the information? Why is the XOR result still usable?
  * Slides 16-18 give a good overview of one- and two-level predictors
  * Warmup cost of the branch predictor
    * Hybrid solution? A fast-warmup predictor is used first, then switch to the slower one.
  * Tournament predictor (Alpha 21264)
  * Other types of branch predictors
    * Using machine learning?
    * Geometric history length
      * Look at branches far in the past (using geometrically increasing history lengths)
  * Predicated execution - eliminates branches
    * What are the tradeoffs?
    * What if the block is big? (can lead to executing a lot of useless work)
    * Allows easier code optimization
      * From the compiler's point of view, predicated execution combines multiple basic blocks into one bigger basic block
      * Reduces control dependences
    * Needs ISA support
  * Wish branches
    * The compiler generates both predicated and non-predicated code
    * The hardware decides which one to use
      * Use branch prediction on easy-to-predict code
      * Use predicated execution on hard-to-predict code
      * The compiler can be more aggressive in optimizing the code
    * What are the tradeoffs? (slide 47)
  * Multi-path execution
    * Execute both paths
    * Can lead to wasted work
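
Here is a minimal gshare sketch to go with the items above. The table size and the 2-bit saturating-counter update rule follow the common textbook formulation and are not claimed to be the exact configuration from lecture: the global history register is XORed with the branch PC to index a table of 2-bit counters.

<code python>
class GsharePredictor:
    """Minimal gshare sketch: PC XOR global history indexes 2-bit counters."""

    def __init__(self, index_bits=12, history_bits=12):
        self.mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.ghr = 0                                # global history register
        self.counters = [1] * (1 << index_bits)     # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # predict taken if the counter is 2 or 3

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask

bp = GsharePredictor()
for outcome in [True, True, True, False, True]:     # a mostly-taken branch at PC 0x400
    print(bp.predict(outcome and 0x400 or 0x400), outcome)  # (prediction, actual outcome)
    bp.update(0x400, outcome)
</code>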

===== Lecture 11 (2/12 Wed.) =====

  * Call and return prediction
    * Direct calls are easy to predict
    * Returns are harder (indirect branches)
      * Nested calls make the return address easier to predict
        * Can use a stack to predict the return address (see the sketch after this list)
    * Indirect branch prediction
      * These branches have multiple targets
      * Used for switch-case statements, virtual function calls, jump tables, interface calls
      * BTB to predict the target address - low accuracy
      * History based: BTB + GHR
      * Virtual program counter prediction
  * Complications in superscalar processors
    * Fetch? What if multiple branches are fetched at the same time?
    * Logic required to ensure correctness?
  * Multi-cycle execution (different functional units take different numbers of cycles)
    * Instructions can retire out of order
      * How to deal with this case? Stall? Throw exceptions if there are problems?
  * Exceptions and interrupts
    * When are they handled?
      * Why should some interrupts be handled right away?
  * Precise exceptions
    * The architectural state should be consistent before handling the exception/interrupt
      * Easier to debug (you see the sequential flow up to the point where the interrupt occurs)
        * Deterministic
      * Easier to recover from the exception
      * Easier to restart processes
    * How to ensure precise exceptions?
      * Tradeoffs between the methods
  * Reorder buffer
    * Reorders results before they become visible in the architectural state
      * Need to preserve the sequential semantics and data
    * What information is in a ROB entry?
    * Where to get the value from (forwarding path? reorder buffer?)
      * Extra logic to check where the youngest instruction/value is
      * Content-addressable search
        * A lot of comparators
    * Different ways to simplify the reorder buffer
    * Register renaming
      * The same register can refer to independent values (due to the lack of registers)
    * Where does the exception happen? (after retirement)
  * History buffer
    * Update the register file when the instruction completes; unroll if there is an exception.
  * Future file (commonly used, along with a reorder buffer)
    * Keep two sets of register files
      * An updated (speculative) value, called the future file
      * A backup value (to restore the state quickly)
    * Doubles the cost of the register file, but reduces area since you don't need a content-addressable memory (compared to the ROB alone)
  * Branch misprediction resembles an exception
    * The difference is that a branch misprediction is not visible to the software
      * It is also much more common (say, divide by zero vs. a mispredicted branch)
    * Recovery is similar to exception handling
  * Latency of the state recovery
  * What to do during the state recovery
  * Checkpointing
    * Advantages?
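
The return address stack mentioned above is simple enough to sketch directly. This is a minimal illustration only; real designs bound the stack depth and must repair it after mispredictions, which is ignored here: push the return address on a call, pop it to predict the target of the matching return.

<code python>
class ReturnAddressStack:
    """Minimal return address stack (RAS) sketch; overflow and repair are ignored."""

    def __init__(self):
        self.stack = []

    def on_call(self, call_pc, inst_size=4):
        self.stack.append(call_pc + inst_size)  # address of the instruction after the call

    def predict_return(self):
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x1000)               # outer call
ras.on_call(0x2000)               # nested call
print(hex(ras.predict_return()))  # 0x2004: inner return predicted first
print(hex(ras.predict_return()))  # 0x1004: then the outer return
</code>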

===== Lecture 14 (2/19 Wed.) =====

  * Predictors (branch predictor, cache line predictor, ...)
  * Power budget (and its importance)
  * Architectural state, precise state
  * Memory dependences are only known dynamically
  * Register state is not shared across threads/processors
  * Memory state is shared across threads/processors
  * How to maintain speculative memory state
  * Write buffers (help simplify the process of checking the reorder buffer)
  * Overall OoO mechanism
    * What are other ways of eliminating dispatch stalls?
    * Dispatch when the sources are ready
    * Retired instructions make the sources available
    * Register renaming (see the sketch after this list)
    * Reservation stations
      * What goes into a reservation station?
      * Tags required in a reservation station
    * Tomasulo's algorithm
    * Without precise exceptions, OoO is hard to debug
    * Architectural register ID
    * Examples in the slides
      * Slide 28 --> register renaming
      * Slides 30-35 --> exercise (also on the board)
        * This will be useful for the midterm
    * Register alias table
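
To go with the register renaming and register alias table items, here is a minimal sketch (the tag names and the two-instruction example are invented): each architectural destination register is mapped to a fresh tag, and later instructions read their sources through the current mapping, which removes false (WAR/WAW) dependences.

<code python>
import itertools

# Minimal register-renaming sketch using a register alias table (RAT).
tag_counter = itertools.count()
rat = {}   # architectural register -> tag of the latest in-flight producer

def rename(dest, srcs):
    renamed_srcs = [rat.get(s, s) for s in srcs]  # read the current mapping (or the committed register)
    new_tag = f"t{next(tag_counter)}"
    rat[dest] = new_tag                           # later readers of dest now wait on this tag
    return new_tag, renamed_srcs

# add r1, r2, r3 ; then mul r1, r1, r4  (the WAW hazard on r1 disappears after renaming)
print(rename("r1", ["r2", "r3"]))   # ('t0', ['r2', 'r3'])
print(rename("r1", ["r1", "r4"]))   # ('t1', ['t0', 'r4'])
</code>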

===== Lecture 15 (2/21 Fri.) =====

  * OoO --> restricted dataflow
    * Extracting parallelism
    * What are the bottlenecks?
      * Issue width
      * Dispatch width
      * Parallelism in the program
    * More examples on slide 10
    * What does it mean to be restricted dataflow?
      * Still visible as a Von Neumann model
    * Where does the efficiency come from?
    * Size of the scheduling window/reorder buffer. Tradeoffs? What makes sense?
  * Load/store handling (see the sketch after this list)
    * Would like to schedule them out of order, but make them visible in order
    * When do you schedule the load/store instructions?
    * Can we predict whether loads/stores are dependent?
    * This is one of the most complex structures in load/store handling
    * What information can be used to predict these load/store dependences?
  * Note: IPC = 1/CPI
  * Centralized vs. distributed? What are the tradeoffs?
  * How to handle a misprediction/recovery
  * Token dataflow architectures
    * What are tokens?
    * How to match tokens
    * Tagged-token dataflow architectures
    * What are the tradeoffs?
    * Difficulties?
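
For the load/store handling items above, here is a minimal sketch of the basic check an out-of-order machine must make before letting a load execute. This is a deliberately conservative, simplified policy for illustration; real memory-dependence predictors are far more involved: compare the load address against all older stores, forward from a matching store, and wait if any older store address is still unknown.

<code python>
def resolve_load(load_addr, older_stores):
    """older_stores: list of (address or None if not yet computed, value), oldest first.

    Returns ('forward', value), 'wait', or 'to_memory' -- a simplified,
    conservative policy used only for illustration.
    """
    for addr, value in reversed(older_stores):  # youngest older store first
        if addr is None:
            return "wait"                       # unknown address: could alias, so wait
        if addr == load_addr:
            return ("forward", value)           # store-to-load forwarding
    return "to_memory"

print(resolve_load(0x100, [(0x200, 7), (0x100, 42)]))  # ('forward', 42)
print(resolve_load(0x100, [(None, 9), (0x300, 5)]))    # 'wait' (an older store address is unknown)
print(resolve_load(0x100, [(0x200, 7), (0x300, 5)]))   # 'to_memory'
</code>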

===== Lecture 16 (2/24 Mon.) =====

  * SISD/SIMD/MISD/MIMD
  * Array processor
  * Vector processor
  * Data parallelism
    * Where does the concurrency arise?
  * Differences between an array processor and a vector processor
  * VLIW
  * Compactness of an array processor
  * A vector instruction operates on a vector of data (rather than a single datum (scalar))
    * Vector length (also applies to array processors)
    * No dependences within a vector --> can have a deep pipeline
    * Highly parallel (both instruction level (ILP) and memory level (MLP))
    * But the program needs to be very parallel
    * Memory can be the bottleneck (due to the very high MLP)
    * What do the functional units look like? Deep pipelines and simpler control.
    * The CRAY-1 is one example of a vector processor
    * Memory access patterns in a vector processor
      * How do the memory access patterns make good use of the memory bandwidth?
      * Please refer to slides 73-74 in http://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture25-mainmemory-afterlecture.pdf for a brief explanation of memory-level parallelism
      * Stride length vs. the number of banks
        * The stride length should be relatively prime to the number of banks (see the sketch after this list)
      * Tradeoffs between row-major and column-major layout --> how can the vector processor deal with the two?
    * How to calculate the efficiency and performance of vector processors
    * What if there are multiple memory ports?
    * Gather/scatter allows the vector processor to be a lot more programmable (i.e., gather data for parallelism)
      * Helps in handling sparse matrices
    * Conditional operation
    * Structure of vector units
    * How to automatically parallelize code through the compiler?
      * This is a hard problem. The compiler does not know the memory addresses.
  * What do we need to ensure for both vector and array processors?
  * Sequential bottleneck
    * Amdahl's law
  * Intel MMX --> an example of Intel's approach to SIMD
    * No VLEN; the opcode defines the length
    * The stride is one in MMX
    * Intel SSE --> the modern version of MMX
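
The stride-vs.-banks item above can be checked with a few lines of arithmetic (the bank count and strides are arbitrary example values): with B banks and stride S, a strided stream touches B / gcd(S, B) distinct banks, so a stride that shares a factor with the bank count concentrates accesses on a few banks and loses parallelism.

<code python>
from math import gcd

def banks_touched(stride, num_banks, num_accesses):
    """Distinct banks hit by a strided stream (element i -> bank (i*stride) % num_banks)."""
    return len({(i * stride) % num_banks for i in range(num_accesses)})

B = 8  # example bank count
for stride in (1, 2, 4, 8, 7):
    print(stride, banks_touched(stride, B, 64), B // gcd(stride, B))
# Strides 1 and 7 (relatively prime to 8) spread accesses over all 8 banks;
# stride 8 hits a single bank and serializes every access.
</code>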
  