This shows you the differences between two versions of the page.
buzzword [2015/02/02 19:23] rachata |
buzzword [2015/02/06 19:21] albert lecture 10 words added |
||
---|---|---|---|
Line 277: | Line 277: | ||
* Pipeline flush | * Pipeline flush | ||
* Speculation | * Speculation | ||
- | |||
- | | ||
| | ||
===== Lecture 8 (2/2 Mon.) ===== | ===== Lecture 8 (2/2 Mon.) ===== | ||
Line 319: | Line 317: | ||
| | ||
| | ||
+ | ===== Lecture 9 (2/4 Wed.) ===== | ||
+ | * Definition of basic blocks | ||
+ | * Control flow graph | ||
+ | * Delayed branching | ||
+ | * benefit? | ||
+ | * What does it eliminate? | ||
+ | * downside? | ||
+ | * Delayed branching in SPARC (with squashing) | ||
+ | * Backward compatibility with the delay slot | ||
+ | * What should be filled in the delay slot | ||
+ | * How to ensure correctness | ||
+ | * Fine-grained multithreading | ||
+ | * fetch from different threads | ||
+ | * What are the issues (what if the program doesn't have many threads) | ||
+ | * CDC 6000 | ||
+ | * Denelcor HEP | ||
+ | * No dependency checking | ||
+ | * Instructions from different threads can fill in the bubbles | ||
+ | * Cost? | ||
+ | * Simultaneous multithreading | ||
+ | * Branch prediction | ||
+ | * Guess what to fetch next. | ||
+ | * Misprediction penalty | ||
+ | * Need to guess the direction and target | ||
+ | * How to perform the performance analysis? | ||
+ | * Given the branch prediction accuracy and penalty cost, how to compute the cost of a branch misprediction. | ||
+ | * Given the program/number of instructions, percent of branches, branch prediction accuracy and penalty cost, how to compute the total cost of branch mispredictions. | ||
+ | * How many extra instructions are being fetched? | ||
+ | * What is the performance degradation? | ||
+ | * How to reduce the miss penalty? | ||
+ | * Predicting the next address (non PC+4 address) | ||
+ | * Branch target buffer (BTB) | ||
+ | * Predicting the target address of the branch | ||
+ | * Global branch history - for directions | ||
+ | * Can use compiler to profile and get more info | ||
+ | * Input set dictates the accuracy | ||
+ | * Adds time to compilation | ||
+ | * Heuristics that are common and don't require profiling. | ||
+ | * Might be inaccurate | ||
+ | * Does not require profiling | ||
+ | * Static branch prediction | ||
+ | * Programmer provides pragmas, hinting the likelihood of a taken/not taken branch | ||
+ | * For example, x86 has the hint bit | ||
+ | * Dynamic branch prediction | ||
+ | * Last time predictor | ||
+ | * Two-bit counter based prediction | ||
+ | * One more bit for hysteresis | ||
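The last time predictor and the two-bit counter listed above can be compared with a minimal sketch. The class names, the initial states, and the example outcome pattern are assumptions made for illustration, not taken from the lecture.

```python
class LastTimePredictor:
    """Predicts that a branch repeats its most recent outcome."""

    def __init__(self, last=True):
        self.last = last  # assumed initial guess: taken

    def predict(self):
        return self.last

    def update(self, taken):
        self.last = taken


class TwoBitCounter:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=2):
        self.state = state  # assumed initial state: weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate at 0 and 3 so one odd outcome does not flip a strong prediction.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical loop branch: taken three times, then falls through, repeated.
loop_pattern = [True, True, True, False] * 3
print(count_mispredictions(LastTimePredictor(), loop_pattern))
print(count_mispredictions(TwoBitCounter(), loop_pattern))
```

On this pattern the last time predictor misses twice per loop exit after the first (once on the exit, once on re-entry), while the extra hysteresis bit lets the counter miss only on the exit itself.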
+ | |||
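The performance analysis questions in Lecture 9 can be worked through with a small calculation. This is a sketch under a simple assumed model: every misprediction costs a fixed number of cycles and all other stalls are ignored. The function names and the numbers in the example are hypothetical.

```python
def misprediction_stall_cycles(num_instructions, branch_fraction, accuracy, penalty_cycles):
    """Total cycles lost to mispredictions under a fixed-penalty model."""
    mispredicted_branches = num_instructions * branch_fraction * (1.0 - accuracy)
    return mispredicted_branches * penalty_cycles


def effective_cpi(base_cpi, branch_fraction, accuracy, penalty_cycles):
    """Base CPI plus the average misprediction penalty paid per instruction."""
    return base_cpi + branch_fraction * (1.0 - accuracy) * penalty_cycles


# Assumed example: 1M instructions, 20% branches, 90% accuracy, 20-cycle penalty.
print(misprediction_stall_cycles(1_000_000, 0.20, 0.90, 20))
print(effective_cpi(1.0, 0.20, 0.90, 20))
```

With these assumed inputs the model gives about 400,000 stall cycles and an effective CPI of about 1.4. Under the same model, going from 98% to 99% accuracy halves the misprediction term.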
+ | ===== Lecture 10 (2/6 Fri.) ===== | ||
+ | * Branch prediction accuracy | ||
+ | * Why is it very important? | ||
+ | * Differences between 99% accuracy and 98% accuracy | ||
+ | * Cost of a misprediction when the pipeline is very deep | ||
+ | * Global branch correlation | ||
+ | * Some branches are correlated | ||
+ | * Local branch correlation | ||
+ | * Some branches can depend on the result of past branches | ||
+ | * Pattern history table | ||
+ | * Record global taken/not taken results. | ||
+ | * Cost vs. accuracy (What to record, do you record PC? Just taken/not taken info.?) | ||
+ | * One-level branch predictor | ||
+ | * What information is used? | ||
+ | * Two-level branch prediction | ||
+ | * What entries do you keep in the global history? | ||
+ | * What entries do you keep in the local history? | ||
+ | * How many tables? | ||
+ | * Cost when training a table | ||
+ | * What are the purposes of each table? | ||
+ | * Potential problems of a two-level history | ||
+ | * GShare predictor | ||
+ | * Global history predictor is hashed with the PC | ||
+ | * Store both GHP and PC in one combined information | ||
+ | * How do you use the information? Why is the XOR result still usable? | ||
+ | * Warmup cost of the branch predictor | ||
+ | * Hybrid solution? Fast warmup is used first, then switch to the slower one. | ||
+ | * Tournament predictor (Alpha 21264) | ||
+ | * Predicated execution - eliminate branches | ||
+ | * What are the tradeoffs | ||
+ | * What if the block is big (can lead to executing a lot of useless work) | ||
+ | * Allows easier code optimization | ||
+ | * From the compiler PoV, predicated execution combines multiple basic blocks into one bigger basic block | ||
+ | * Reduce control dependences | ||
+ | * Need ISA support | ||
+ | * Wish branches | ||
+ | * Compiler generates both predicated and non-predicated code | ||
+ | * HW decides which one to use | ||
+ | * Use branch prediction on easy-to-predict code | ||
+ | * Use predicated execution on hard-to-predict code | ||
+ | * Compiler can be more aggressive in optimizing the code | ||
+ | * What are the tradeoffs (slide# 47) | ||
+ | * Multi-path execution | ||
+ | * Execute both paths | ||
+ | * Can lead to wasted work | ||
+ | * VLIW | ||
+ | * SuperScalar | ||
+ |
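The GShare idea from Lecture 10 (global history hashed with the PC to index a table of two-bit counters) can be sketched as follows. The table size, the initial counter value, the PC shift by two bits, and the example PC are assumptions for illustration; real designs differ.

```python
class GShare:
    """Minimal gshare sketch: (PC XOR global history) indexes two-bit counters."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.history = 0                      # global history, newest outcome in bit 0
        self.table = [2] * (1 << index_bits)  # assumed start: weakly taken

    def _index(self, pc):
        # Drop the low PC bits (assumes 4-byte instructions), then XOR with history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the new outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def mispredictions(predictor, pc, outcomes):
    """Return a list of booleans, True where the prediction was wrong."""
    wrong = []
    for taken in outcomes:
        wrong.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return wrong


# Hypothetical branch at PC 0x400 that alternates taken / not taken.
alternating = [True, False] * 50
result = mispredictions(GShare(), 0x400, alternating)
print(sum(result[:50]), sum(result[50:]))
```

An alternating branch defeats a single two-bit counter, but here each history pattern selects its own counter, so after a short warmup the predictor tracks the pattern. The XOR stays usable because the same (PC, history) pair always maps to the same entry, at the cost of possible aliasing between different pairs.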