This shows you the differences between two versions of the page.
buzzword [2015/02/04 19:23] rachata |
buzzword [2015/02/11 19:20] rachata |
||
---|---|---|---|
Line 318: | Line 318: | ||
| | ||
===== Lecture 9 (2/4 Wed.) ===== | ===== Lecture 9 (2/4 Wed.) ===== | ||
- | | ||
- | * Predicate combining (combine predicate for a branch instruction) | ||
- | * Predicated execution (control dependence becomes data dependence) | ||
* Definition of basic blocks | * Definition of basic blocks | ||
* Control flow graph | * Control flow graph | ||
Line 367: | Line 364: | ||
* Two-bit counter based prediction | * Two-bit counter based prediction | ||
* One more bit for hysteresis | * One more bit for hysteresis | ||
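
The two-bit counter with hysteresis listed above can be illustrated with a minimal Python sketch. The names `TwoBitCounter` and `count_mispredictions`, the initial state, and the loop-branch pattern are assumptions made for illustration; they are not taken from the lecture slides.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0..3, predict taken when state >= 2."""

    def __init__(self, state=1):
        # 0 = strongly not taken, 1 = weakly not taken,
        # 2 = weakly taken,       3 = strongly taken
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, counter):
    """Predict each outcome, then train the counter; return the miss count."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken once, repeated 3 times.
loop_outcomes = ([True] * 9 + [False]) * 3
misses = count_mispredictions(loop_outcomes, TwoBitCounter(state=3))
```

In this run the only mispredictions are the three loop exits: a single not-taken outcome moves the counter from "strongly taken" to "weakly taken", so the next loop entry is still predicted taken. That is the effect of the extra hysteresis bit.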
+ | |||
+ | ===== Lecture 10 (2/6 Fri.) ===== | ||
+ | * Branch prediction accuracy | ||
+ | * Why are they very important? | ||
+ | * Differences between 99% accuracy and 98% accuracy | ||
+ | * Cost of a misprediction when the pipeline is very deep | ||
+ | * Global branch correlation | ||
+ | * The outcome of a branch can be correlated with the outcomes of other recent branches | ||
+ | * Local branch correlation | ||
+ | * The outcome of a branch can depend on its own past outcomes | ||
+ | * Pattern history table | ||
+ | * Record global taken/not taken results. | ||
+ | * Cost vs. accuracy (What to record, do you record PC? Just taken/not taken info.?) | ||
+ | * One-level branch predictor | ||
+ | * What information is used? | ||
+ | * Two-level branch prediction | ||
+ | * What entries do you keep in the global history? | ||
+ | * What entries do you keep in the local history? | ||
+ | * How many tables? | ||
+ | * Cost when training a table | ||
+ | * What are the purposes of each table? | ||
+ | * Potential problems of a two-level history | ||
+ | * GShare predictor | ||
+ | * Global history is hashed (XORed) with the PC | ||
+ | * Store both GHP and PC as one combined piece of information | ||
+ | * How do you use the information? Why is the XOR result still usable? | ||
+ | * Warmup cost of the branch predictor | ||
+ | * Hybrid solution? Use the fast-warmup predictor first, then switch to the slower one. | ||
+ | * Tournament predictor (Alpha 21264) | ||
+ | * Predicated execution - eliminate branches | ||
+ | * What are the tradeoffs? | ||
+ | * What if the block is big (can lead to executing a lot of useless work) | ||
+ | * Allows easier code optimization | ||
+ | * From the compiler PoV, predicated execution combines multiple basic blocks into one bigger basic block | ||
+ | * Reduce control dependences | ||
+ | * Need ISA support | ||
+ | * Wish branches | ||
+ | * Compiler generates both predicated and non-predicated code | ||
+ | * HW decides which one to use | ||
+ | * Use branch prediction on easy-to-predict code | ||
+ | * Use predicated execution on hard-to-predict code | ||
+ | * Compiler can be more aggressive in optimizing the code | ||
+ | * What are the tradeoffs (slide #47) | ||
+ | * Multi-path execution | ||
+ | * Execute both paths | ||
+ | * Can lead to wasted work | ||
+ | * VLIW | ||
+ | * SuperScalar | ||
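
The GShare idea from the Lecture 10 list (global history XORed with the PC to index a table of two-bit counters) can be sketched in Python as follows. This is a minimal model under stated assumptions: the class name `GShare`, the 10-bit index, the initial counter value, and the branch address `0x400` are all hypothetical, and real designs usually drop the low PC bits before hashing, which is omitted here.

```python
class GShare:
    """Toy gshare predictor: index = (PC XOR global history) into 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                          # global history register
        self.pht = [1] * (1 << index_bits)    # counters start weakly not taken

    def _index(self, pc):
        # One table index carries information from both the PC and the history.
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the newest outcome into the history (1 = taken).
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# Hypothetical branch that strictly alternates taken / not taken.
predictor = GShare(index_bits=10)
pc = 0x400
late_misses = 0
for i in range(200):
    taken = (i % 2 == 0)
    if i >= 100 and predictor.predict(pc) != taken:
        late_misses += 1      # only count misses after the warmup period
    predictor.update(pc, taken)
```

A single two-bit counter cannot track an alternating branch well, but here the two steady-state histories map to different table entries, so after warmup the pattern is predicted without misses. The early mispredictions are the warmup cost mentioned in the list.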
+ | |||
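
The "99% vs. 98% accuracy" point from Lecture 10 can be made concrete with a simple stall-cycle model. The formula and all numbers below (20% branch fraction, 20-cycle misprediction penalty, base CPI of 1) are illustrative assumptions, not figures from the lecture.

```python
def cpi_with_branches(accuracy, branch_fraction=0.2, penalty_cycles=20, base_cpi=1.0):
    """CPI = base CPI + (branches per instruction) * (miss rate) * (penalty)."""
    return base_cpi + branch_fraction * (1.0 - accuracy) * penalty_cycles


cpi_99 = cpi_with_branches(0.99)
cpi_98 = cpi_with_branches(0.98)

# Stall cycles per instruction caused by mispredictions alone.
stall_99 = cpi_99 - 1.0
stall_98 = cpi_98 - 1.0
```

Under these assumptions the CPI is about 1.04 at 99% accuracy and about 1.08 at 98%: one percentage point of accuracy doubles the misprediction stall cycles, and the gap widens as the pipeline gets deeper (larger `penalty_cycles`).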
+ | |||
+ | ===== Lecture 11 (2/11 Mon.) ===== | ||
+ | |||
+ | * Geometric GHR length for branch prediction | ||
+ | * Perceptron branch predictor | ||
+ | * Multi-cycle execution (different functional units take different numbers of cycles) | ||
+ | * Instructions can retire out-of-order | ||
+ | * How to deal with this case? Stall? Throw exceptions if there are problems? | ||
+ | * Exceptions and Interrupts | ||
+ | * When are they handled? | ||
+ | * Why should some interrupts be handled right away? | ||
+ | * Precise exception | ||
+ | * Arch. state should be consistent before handling the exception/interrupt | ||
+ | * Easier to debug (you see the sequential flow when the interrupt occurs) | ||
+ | * Deterministic | ||
+ | * Easier to recover from the exception | ||
+ | * Easier to restart the processes | ||
+ | * How to ensure precise exception? | ||
+ | * Tradeoffs between each method | ||
+ | * Reorder buffer | ||
+ | * Reorder results before they are visible to the arch. state | ||
+ | * Need to preserve the sequential semantics and data | ||
+ | * What information is in a ROB entry? | ||
+ | * Where to get the value from (forwarding path? reorder buffer?) | ||
+ | * Extra logic to check where the youngest instruction/value is | ||
+ | * Content-addressable search | ||
+ | * A lot of comparators | ||
+ | * Different ways to simplify the reorder buffer | ||
+ | * Register renaming | ||
+ | * Same register refers to independent values (lack of registers) | ||
+ | * Where does the exception happen (after retire) | ||
+ | * History buffer | ||
+ | * Update the register file when the instruction completes. Roll back if there is an exception. | ||
+ | * Future file (commonly used, along with reorder buffer) | ||
+ | * Keep two sets of register files | ||
+ | * An updated value (speculative), called the future file | ||
+ | * A backup value (to restore the state quickly) | ||
+ | * Doubles the cost of the regfile, but reduces area because you don't have to use a content-addressable memory (compared to ROB alone) | ||
+ | * Branch misprediction resembles an exception | ||
+ | * The difference is that branch misprediction is not visible to the software | ||
+ | * Also much more common (say, divide by zero vs. a mispredicted branch) | ||
+ | * Recovery is similar to exception handling | ||
+ | * Latency of the state recovery | ||
+ | * What to do during the state recovery | ||
+ | * Checkpointing | ||
+ | * Advantages? | ||
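
The reorder buffer bullets in Lecture 11 (results complete out of order, but become visible to the architectural state in program order) can be sketched as below. This is a simplified model under stated assumptions: the class `ReorderBuffer`, its method names, the dictionary-based entries, and the register names are hypothetical, and forwarding, renaming, and the content-addressable lookup are left out.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: out-of-order completion, in-order retirement."""

    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.arch_regs = {}      # architectural register file

    def dispatch(self, dest):
        # Allocate an entry in program order.
        entry = {"dest": dest, "value": None, "done": False, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        # Functional units may call this in any order.
        entry["value"] = value
        entry["exception"] = exception
        entry["done"] = True

    def retire(self):
        """Retire finished instructions from the head, oldest first.

        Returns True if the oldest instruction raised an exception; in that
        case all younger entries are squashed and never update arch_regs.
        """
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exception"]:
                self.entries.clear()
                return True
            self.arch_regs[head["dest"]] = head["value"]
        return False


rob = ReorderBuffer()
i1 = rob.dispatch("r1")
i2 = rob.dispatch("r2")
i3 = rob.dispatch("r3")

rob.complete(i3, value=30)            # youngest finishes first
first = rob.retire()                  # nothing retires: i1 is not done
rob.complete(i1, value=10)
second = rob.retire()                 # only i1 retires; i2 blocks i3
rob.complete(i2, exception=True)
third = rob.retire()                  # exception at the head, i3 is squashed
```

After the last call the architectural state contains only the result of `i1`, which is the precise-exception property: everything older than the faulting instruction has updated the state and nothing younger has.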
+ | |||