This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
buzzword [2014/01/27 19:20] rachata |
buzzword [2014/02/03 19:20] rachata |
||
---|---|---|---|
Line 273: | Line 273: | ||
* Vertical microcode | * Vertical microcode | ||
* Primitives | * Primitives | ||
+ | |||
+ | ===== Lecture 7 (1/29 Wed.) ===== | ||
+ | |||
+ | * Pipelining | ||
+ | * Limitations of the multi-programmed design | ||
+ | * Idle resources | ||
+ | * Throughput of a pipelined design | ||
+ | * What dictacts the throughput of a pipelined design? | ||
+ | * Latency of the pipelined design | ||
+ | * Dependency | ||
+ | * Overhead of pipelining | ||
+ | * Latch cost? | ||
+ | * Data forwarding/bypassing | ||
+ | * What are the ideal pipeline? | ||
+ | * External fragmentation | ||
+ | * Issues in pipeline designs | ||
+ | * Stalling | ||
+ | * Dependency (Hazard) | ||
+ | * Flow dependence | ||
+ | * Output dependence | ||
+ | * Anti dependence | ||
+ | * How to handle them? | ||
+ | * Resource contention | ||
+ | * Keeping the pipeline full | ||
+ | * Handling exception/interrupts | ||
+ | * Pipeline flush | ||
+ | * Speculation | ||
+ | * Interlocking | ||
+ | * Multipath execution | ||
+ | * Fine grain multithreading | ||
+ | * No-op (Bubbles in the pipeline) | ||
+ | * Valid bits in the instructions | ||
+ | |||
+ | ===== Lecture 8 (1/31 Fri.) ===== | ||
+ | * Branch prediction | ||
+ | * Different types of data dependence | ||
+ | * Pipeline stalls | ||
+ | * bubbles | ||
+ | * How to handle stalls | ||
+ | * Stall conditions | ||
+ | * Stall signals | ||
+ | * Dependences | ||
+ | * Distant between dependences | ||
+ | * Data forwarding/bypassing | ||
+ | * Maintaining the correct dataflow | ||
+ | * Different ways to design data forwarding path/logic | ||
+ | * Different techniques to handle interlockings | ||
+ | * SW based | ||
+ | * HW based | ||
+ | * Profiling | ||
+ | * Static profiling | ||
+ | * Helps from the software (compiler) | ||
+ | * Superblock optimization | ||
+ | * Analyzing basic blocks | ||
+ | * How to deal with branches? | ||
+ | * Branch prediction | ||
+ | * Delayed branching (branch delay slot) | ||
+ | * Forward control flow/backward control flow | ||
+ | * Branch prediction accuracy | ||
+ | * Profile guided code positioning | ||
+ | * Based on the profile info. position the code based on it | ||
+ | * Try to make the next sequential instruction be the next inst. to be executed | ||
+ | * Trace cache | ||
+ | * Predicate combining (combine predicate for a branch instruction) | ||
+ | * Predicated execution (control dependence becomes data dependence) | ||
+ | * Definition of basic blocks | ||
+ | * Control flow graph | ||
+ | |||
+ | ===== Lecture 9 (2/3 Mon.) ===== | ||
+ | * Delayed branching | ||
+ | * benefit? | ||
+ | * What does it eliminates? | ||
+ | * downside? | ||
+ | * Delayed branching in SPARC (with squashing) | ||
+ | * Backward compatibility with the delayed slot | ||
+ | * What should be filled in the delayed slot | ||
+ | * How to ensure correctness | ||
+ | * Fine-grained multithreading | ||
+ | * fetch from different threads | ||
+ | * What are the issues (what if the program doesn't have many threads) | ||
+ | * CDC 6000 | ||
+ | * Denelcor HEP | ||
+ | * No dependency checking | ||
+ | * Inst. from different thread can fill-in the bubbles | ||
+ | * Cost? | ||
+ | * Simulteneuos multithreading | ||
+ | * Branch prediction | ||
+ | * Guess what to fetch next. | ||
+ | * Misprediction penalty | ||
+ | * Need to guess the direction and target | ||
+ | * How to perform the performance analysis? | ||
+ | * Given the branch prediction accuracy and penalty cost, how to compute a cost of a branch misprediction. | ||
+ | * Given the program/number of instructions, percent of branches, branch prediction accuracy and penalty cost, how to compute a cost coming from branch mispredictions. | ||
+ | * How many extra instructions are being fetched? | ||
+ | * What is the performance degredation? | ||
+ | * How to reduce the miss penalty? | ||
+ | * Predicting the next address (non PC+4 address) | ||
+ | * Branch target buffer (BTB) | ||
+ | * Predicting the address of the branch | ||
+ | * Global branch history - for directions | ||
+ | * Can use compiler to profile and get more info | ||
+ | * Input set dictacts the accuracy | ||
+ | * Add time to compilation | ||
+ | * Heuristics that are common and doesn't require profiling. | ||
+ | * Might be inaccurate | ||
+ | * Does not require profiling | ||
+ | * Programmer can tell the hardware (via pragmas (hints)) | ||
+ | * For example, x86 has the hint bit | ||
+ | * Dynamic branch prediction | ||
+ | * Last time predictor | ||
+ | * Two bits counter based prediction | ||
+ | * One more bit for hysteresis | ||
+ | |||