User Tools

Site Tools


buzzword

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
buzzword [2015/01/26 23:13]
kevincha [Lecture 4 (1/21 Wed.)]
buzzword [2015/02/12 19:18]
kevincha [Lecture 11 (2/11 Mon.)]
Line 206: Line 206:
     * Dynamic power/​Static power     * Dynamic power/​Static power
       * Increases in power due to frequency       * Increases in power due to frequency
 +
 +===== Lecture 6 (1/28 Mon.) =====
 +
 +  * Design principles
 +    * Common case design
 +    * Critical path design
 +    * Balanced designs
 +  * Multi cycle design
 +  * Microcoded/​Microprogrammed machines
 +    * States
 +    * Translation from one state to another
 +    * Microinstructions
 +    * Microsequencing
 +    * Control store - Product control signals
 +    * Microsequencer ​  
 +    * Control signal
 +      * What do they have to control? ​   ​
 +  * Instruction processing cycle
 +  * Latch signals
 +  * State machine
 +  * State variables
 +  * Condition code
 +  * Steering bits
 +  * Branch enable logic
 +  * Difference between gating and loading? (write enable vs. driving the bus)
 +  * Memory mapped I/O
 +  * Hardwired logic
 +    * What control signals come from hardwired logic?
 +  * Variable latency memory
 +  * Handling interrupts
 +  * Difference betwen interrupts and exceptions
 +  * Emulator (i.e. uCode allots minimal datapath to emulate the ISA)
 +  * Updating machine behavior
 +  * Horizontal microcode
 +  * Vertical microcode
 +  * Primitives
 +  ​
 +===== Lecture 7 (1/30 Fri.) =====
 +
 +  * Emulator (i.e. uCode allots minimal datapath to emulate the ISA)
 +  * Updating machine behavior
 +  * Horizontal microcode
 +  * Vertical microcode
 +  * Primitives
 +  * nanocode and millicode
 +    * what are the differences between nano/​milli/​microcode
 +  * microprogrammed vs. hardwire control
 +  * Pipelining
 +  * Limitations of the multi-programmed design
 +    * Idle resources
 +  * Throughput of a pipelined design
 +    * What dictacts the throughput of a pipelined design?
 +  * Latency of the pipelined design
 +  * Dependency
 +  * Overhead of pipelining
 +    * Latch cost?
 +  * Data forwarding/​bypassing
 +  * What are the ideal pipeline?
 +  * External fragmentation
 +  * Issues in pipeline designs
 +    * Stalling
 +      * Dependency (Hazard)
 +        * Flow dependence
 +        * Output dependence
 +        * Anti dependence
 +        * How to handle them?
 +    * Resource contention
 +    * Keeping the pipeline full
 +    * Handling exception/​interrupts
 +    * Pipeline flush
 +    * Speculation
 +  ​
 +===== Lecture 8 (2/2 Mon.) =====  ​
 +
 +  * Interlocking
 +  * Multipath execution
 +  * Fine grain multithreading
 +  * No-op (Bubbles in the pipeline)
 +  * Valid bits in the instructions
 +  * Branch prediction
 +  * Different types of data dependence
 +  * Pipeline stalls
 +    * bubbles
 +    * How to handle stalls
 +    * Stall conditions
 +    * Stall signals
 +    * Dependences
 +      * Distant between dependences
 +    * Data forwarding/​bypassing
 +    * Maintaining the correct dataflow
 +  * Different ways to design data forwarding path/logic
 +  * Different techniques to handle interlockings
 +    * SW based
 +    * HW based
 +  * Profiling
 +    * Static profiling
 +    * Helps from the software (compiler)
 +      * Superblock optimization
 +      * Analyzing basic blocks
 +  * How to deal with branches?
 +    * Branch prediction
 +    * Delayed branching (branch delay slot)
 +    * Forward control flow/​backward control flow
 +    * Branch prediction accuracy
 +  * Profile guided code positioning
 +    * Based on the profile info. position the code based on it
 +    * Try to make the next sequential instruction be the next inst. to be executed
 +  * Predicate combining (combine predicate for a branch instruction)
 +  * Predicated execution (control dependence becomes data dependence)
 +  ​
     ​     ​
 +===== Lecture 9 (2/4 Wed.) =====  ​
 +  * Definition of basic blocks
 +  * Control flow graph
 +  * Delayed branching
 +    * benefit?
 +    * What does it eliminates?
 +    * downside?
 +    * Delayed branching in SPARC (with squashing)
 +    * Backward compatibility with the delayed slot
 +    * What should be filled in the delayed slot
 +    * How to ensure correctness
 +  * Fine-grained multithreading
 +    * fetch from different threads ​
 +    * What are the issues (what if the program doesn'​t have many threads)
 +    * CDC 6000
 +    * Denelcor HEP
 +    * No dependency checking
 +    * Inst. from different thread can fill-in the bubbles
 +    * Cost?
 +  * Simulteneuos multithreading
 +  * Branch prediction
 +    * Guess what to fetch next.
 +    * Misprediction penalty
 +    * Need to guess the direction and target
 +    * How to perform the performance analysis?
 +      * Given the branch prediction accuracy and penalty cost, how to compute a cost of a branch misprediction.
 +      * Given the program/​number of instructions,​ percent of branches, branch prediction accuracy and penalty cost, how to compute a cost coming from branch mispredictions.
 +        * How many extra instructions are being fetched?
 +        * What is the performance degredation?​
 +    * How to reduce the miss penalty?
 +    * Predicting the next address (non PC+4 address)
 +    * Branch target buffer (BTB)
 +      * Predicting the address of the branch
 +    * Global branch history - for directions
 +    * Can use compiler to profile and get more info
 +      * Input set dictacts the accuracy
 +      * Add time to compilation
 +    * Heuristics that are common and doesn'​t require profiling.
 +      * Might be inaccurate
 +      * Does not require profiling
 +    * Static branch prediction
 +      * Pregrammer provides pragmas, hinting the likelihood of taken/not taken branch
 +      * For example, x86 has the hint bit
 +    * Dynamic branch prediction
 +      * Last time predictor
 +      * Two bits counter based prediction
 +        * One more bit for hysteresis
 +
 +===== Lecture 10 (2/6 Fri.) =====
 +  * Branch prediction accuracy
 +    * Why are they very important?
 +      * Differences between 99% accuracy and 98% accuracy
 +      * Cost of a misprediction when the pipeline is veryd eep
 +  * Global branch correlation
 +    * Some branches are correlated
 +  * Local branch correlation
 +    * Some branches can depend on the result of past branches
 +  * Pattern history table
 +    * Record global taken/not taken results. ​
 +    * Cost vs. accuracy (What to record, do you record PC? Just taken/not taken info.?)
 +  * One-level branch predictor
 +    * What information are used
 +  * Two-level branch prediction
 +    * What entries do you keep in the global history?
 +    * What entries do you keep in the local history?
 +    * How many table?
 +    * Cost when training a table
 +    * What are the purposes of each table?
 +    * Potential problems of a two-level history
 +  * GShare predictor
 +    * Global history predictor is hashed with the PC
 +    * Store both GHP and PC in one combined information
 +    * How do you use the information?​ Why does the XOR result still usable?
 +  * Warmup cost of the branch predictor
 +    * Hybrid solution? Fast warmup is used first, then switch to the slower one.
 +  * Tournament predictor (Alpha 21264)
 +  * Predicated execution - eliminate branches
 +    * What are the tradeoffs
 +    * What if the block is big (can lead to execution a lot of useless work)
 +    * Allows easier code optimization
 +      * From the compiler PoV, predicated execution combine multiple basic blocks into one bigger basic block
 +      * Reduce control dependences
 +    * Need ISA support
 +  * Wish branches
 +    * Compiler generate both predicated and non-predicated codes
 +    * HW design which one to use
 +      * Use branch prediction on an easy to predict code
 +      * Use predicated execution on a hard to predict code
 +      * Compiler can be more aggressive in optimimzing the code
 +    * What are the tradeoffs (slide# 47)
 +  * Multi-path execution
 +    * Execute both paths
 +    * Can lead to wasted work
 +    * VLIW
 +    * SuperScalar
 +
 +
 +===== Lecture 11 (2/11 Wed.) =====
 +
 +  * Geometric GHR length for branch prediction
 +  * Perceptron branch predictor
 +  * Multi-cycle executions (Different functional units take different number of cycles)
 +    * Instructions can retire out-of-order
 +      * How to deal with this case? Stall? Throw exceptions if there are problems?
 +  * Exceptions and Interrupts
 +    * When they are handled? ​
 +    * Why are some interrupts should be handled right away?
 +  * Precise exception
 +    * arch. state should be consistent before handling the exception/​interrupts
 +      * Easier to debug (you see the sequential flow when the interrupt occurs)
 +        * Deterministic
 +      * Easier to recover from the exception
 +      * Easier to restart the processes
 +    * How to ensure precise exception?
 +    * Tradeoffs between each method
 +  * Reorder buffer
 +    * Reorder results before they are visible to the arch. state
 +      * Need to presearve the sequential sematic and data
 +    * What are the informatinos in the ROB entry
 +    * Where to get the value from (forwarding path? reorder buffer?)
 +      * Extra logic to check where the youngest instructions/​value is
 +      * Content addressible search (CAM)
 +        * A lot of comparators
 +    * Different ways to simplify the reorder buffer
 +    * Register renaming
 +      * Same register refers to independent values (lacks of registers)
 +    * Where does the exception happen (after retire)
 +  * History buffer
 +    * Update the register file when the instruction complete. Unroll if there is an exception.
 +  * Future file (commonly used, along with reorder buffer)
 +    * Keep two set of register files
 +      * An updated value (Speculative),​ called future file
 +      * A backup value (to restore the state quickly
 +    * Double the cost of the regfile, but reduce the area as you don't have to use a content addressible memory (compared to ROB alone)
 +  * Branch misprediction resembles Exception
 +    * The difference is that branch misprediction is not visible to the software
 +      * Also much more common (say, divide by zero vs. a mispredicted branch)
 +    * Recovery is similar to exception handling
 +  * Latency of the state recovery
 +  * What to do during the state recovery
 +  * Checkpointing
 +    * Advantages?
 +
 +
buzzword.txt ยท Last modified: 2015/04/27 18:20 by rachata