Differences

This shows you the differences between two versions of the page.

--- buzzword [2015/02/02 19:23]
rachata
+++ buzzword [2015/02/16 19:16]
rachata
@@ Line 277: / Line 277: @@
     * Pipeline flush
     * Speculation
 ===== Lecture 8 (2/2 Mon.) =====
@@ Line 319: / Line 317: @@
+===== Lecture 9 (2/4 Wed.) =====
+  * Definition of basic blocks
+  * Control flow graph
+  * Delayed branching
+    * benefit?
+    * What does it eliminate?
+    * downside?
+    * Delayed branching in SPARC (with squashing)
+    * Backward compatibility with the delay slot
+    * What should be filled in the delay slot
+    * How to ensure correctness
+  * Fine-grained multithreading
+    * fetch from different threads
+    * What are the issues (what if the program doesn't have many threads)
+    * CDC 6000
+    * Denelcor HEP
+    * No dependency checking
+    * Instructions from different threads can fill in the bubbles
+    * Cost?
+  * Simultaneous multithreading
+  * Branch prediction
+    * Guess what to fetch next.
+    * Misprediction penalty
+    * Need to guess the direction and target
+    * How to perform the performance analysis?
+      * Given the branch prediction accuracy and penalty cost, how to compute the average cost of a branch misprediction.
+      * Given the program/number of instructions, percent of branches, branch prediction accuracy and penalty cost, how to compute the total cost coming from branch mispredictions.
+        * How many extra instructions are being fetched?
+        * What is the performance degradation?
+    * How to reduce the miss penalty?
+    * Predicting the next address (non PC+4 address)
+    * Branch target buffer (BTB)
+      * Predicting the target address of the branch
+    * Global branch history - for directions
+    * Can use compiler to profile and get more info
+      * Input set dictates the accuracy
+      * Adds time to compilation
+    * Heuristics that are common and don't require profiling.
+      * Might be inaccurate
+      * Does not require profiling
+    * Static branch prediction
+      * Programmer provides pragmas, hinting the likelihood of taken/not taken branch
+      * For example, x86 has the hint bit
+    * Dynamic branch prediction
+      * Last time predictor
+      * Two-bit counter based prediction
+        * One more bit for hysteresis
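The performance-analysis questions under branch prediction above can be worked through with a small calculation. This is a minimal sketch under a simple additive model: each mispredicted branch costs a fixed number of cycles on top of a base CPI. The function name, the `base_cpi` parameter, and the example numbers are illustrative assumptions, not taken from the lecture.

```python
def misprediction_overhead(num_instructions, branch_fraction, accuracy,
                           penalty_cycles, base_cpi=1.0):
    """Return (extra_cycles, effective_cpi) for a simple additive model.

    Assumption: every misprediction costs exactly `penalty_cycles`, and
    there are no other stalls beyond what `base_cpi` already covers.
    """
    branches = num_instructions * branch_fraction
    mispredictions = branches * (1.0 - accuracy)
    extra_cycles = mispredictions * penalty_cycles
    effective_cpi = base_cpi + extra_cycles / num_instructions
    return extra_cycles, effective_cpi


# Hypothetical workload: 1M instructions, 20% branches,
# 95% prediction accuracy, 20-cycle misprediction penalty.
extra, cpi = misprediction_overhead(1_000_000, 0.20, 0.95, 20)
print(f"extra cycles: {extra:.0f}, effective CPI: {cpi:.2f}")
```

In a machine that fetches `W` instructions per cycle, the wrongly fetched instruction count can be approximated as `extra_cycles * W`, again only under this simplified model.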
+===== Lecture 10 (2/6 Fri.) =====
+  * Branch prediction accuracy
+    * Why are they very important?
+      * Differences between 99% accuracy and 98% accuracy
+      * Cost of a misprediction when the pipeline is very deep
+  * Global branch correlation
+    * Some branches are correlated
+  * Local branch correlation
+    * Some branches can depend on the result of past branches
+  * Pattern history table
+    * Record global taken/not taken results.
+    * Cost vs. accuracy (What to record, do you record PC? Just taken/not taken info.?)
+  * One-level branch predictor
+    * What information is used
+  * Two-level branch prediction
+    * What entries do you keep in the global history?
+    * What entries do you keep in the local history?
+    * How many tables?
+    * Cost when training a table
+    * What are the purposes of each table?
+    * Potential problems of a two-level history
+  * GShare predictor
+    * Global history register (GHR) is hashed with the PC
+    * Store both GHR and PC as one combined piece of information
+    * How do you use the information? Why is the XOR result still usable?
+  * Warmup cost of the branch predictor
+    * Hybrid solution? Fast warmup is used first, then switch to the slower one.
+  * Tournament predictor (Alpha 21264)
+  * Predicated execution - eliminate branches
+    * What are the tradeoffs
+    * What if the block is big (can lead to executing a lot of useless work)
+    * Allows easier code optimization
+      * From the compiler PoV, predicated execution combines multiple basic blocks into one bigger basic block
+      * Reduces control dependences
+    * Need ISA support
+  * Wish branches
+    * Compiler generates both predicated and non-predicated code
+    * HW decides which one to use
+      * Use branch prediction on easy-to-predict code
+      * Use predicated execution on hard-to-predict code
+      * Compiler can be more aggressive in optimizing the code
+    * What are the tradeoffs (slide# 47)
+  * Multi-path execution
+    * Execute both paths
+    * Can lead to wasted work
+    * VLIW
+    * SuperScalar
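The GShare item above (global history hashed with the PC) combined with two-bit saturating counters can be sketched as follows. This is an illustrative model only: the table size, the `pc >> 2` alignment shift, the initial counter value, and all names are assumptions made for the example, not details from the lecture.

```python
class GsharePredictor:
    """Gshare sketch: PC XOR global history indexes a table of 2-bit counters."""

    def __init__(self, history_bits=8):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                              # global history register
        self.pht = [1] * (1 << history_bits)      # counters start weakly not-taken

    def _index(self, pc):
        # Assumes 4-byte aligned instructions, so the low 2 PC bits are dropped.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)   # saturate at 3
        else:
            self.pht[i] = max(0, self.pht[i] - 1)   # saturate at 0
        # Shift the actual outcome into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def trailing_correct(predictor, pc, outcomes, tail):
    """Count correct predictions among the last `tail` outcomes of one branch."""
    correct = 0
    start = len(outcomes) - tail
    for i, taken in enumerate(outcomes):
        if i >= start and predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A strictly alternating branch (T, N, T, N, ...) at a made-up PC.
alternating = [i % 2 == 0 for i in range(100)]
print(trailing_correct(GsharePredictor(history_bits=4), 0x400100, alternating, 50))
```

Because the history differs before a taken and a not-taken outcome, the two cases map to different counters, so the predictor should settle on the alternating pattern after a short warmup. A single two-bit counter with no history would keep flipping on the same trace.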
+===== Lecture 11 (2/11 Wed.) =====
+  * Geometric GHR length for branch prediction
+  * Perceptron branch predictor
+  * Multi-cycle executions (Different functional units take different numbers of cycles)
+    * Instructions can retire out-of-order
+      * How to deal with this case? Stall? Throw exceptions if there are problems?
+  * Exceptions and Interrupts
+    * When are they handled?
+    * Why should some interrupts be handled right away?
+  * Precise exception
+    * arch. state should be consistent before handling the exception/interrupts
+      * Easier to debug (you see the sequential flow when the interrupt occurs)
+        * Deterministic
+      * Easier to recover from the exception
+      * Easier to restart the processes
+    * How to ensure precise exception?
+    * Tradeoffs between each method
+  * Reorder buffer
+    * Reorder results before they are visible to the arch. state
+      * Need to preserve the sequential semantics and data
+    * What information is in a ROB entry
+    * Where to get the value from (forwarding path? reorder buffer?)
+      * Extra logic to check where the youngest instruction/value is
+      * Content addressable search (CAM)
+        * A lot of comparators
+    * Different ways to simplify the reorder buffer
+    * Register renaming
+      * Same register refers to independent values (lack of registers)
+    * Where does the exception happen (after retire)
+  * History buffer
+    * Update the register file when the instruction completes. Unroll if there is an exception.
+  * Future file (commonly used, along with reorder buffer)
+    * Keep two sets of register files
+      * An updated value (Speculative), called future file
+      * A backup value (to restore the state quickly)
+    * Double the cost of the regfile, but reduce the area as you don't have to use a content addressable memory (compared to ROB alone)
+  * Branch misprediction resembles an exception
+    * The difference is that branch misprediction is not visible to the software
+      * Also much more common (say, divide by zero vs. a mispredicted branch)
+    * Recovery is similar to exception handling
+  * Latency of the state recovery
+  * What to do during the state recovery
+  * Checkpointing
+    * Advantages?
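The reorder buffer items above (results complete out of order but become visible to the architectural state in order, with precise exceptions) can be illustrated with a toy model. This is a hedged sketch: the entry fields, method names, and the flush-on-exception policy are assumptions chosen for the example, not the design presented in the lecture.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: out-of-order completion, in-order retirement."""

    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.regfile = {}        # architectural register state

    def allocate(self, dest):
        # Entries are allocated in program order.
        entry = {"dest": dest, "value": None, "done": False, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        # Functional units may call this in any order.
        entry["value"] = value
        entry["done"] = True
        entry["exception"] = exception

    def retire(self):
        """Retire finished instructions from the head, oldest first."""
        events = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exception"]:
                # Precise state: older results are committed, younger ones dropped.
                self.entries.clear()
                events.append(("exception", head["dest"]))
                break
            self.regfile[head["dest"]] = head["value"]
            events.append(("commit", head["dest"]))
        return events


rob = ReorderBuffer()
a, b, c = rob.allocate("r1"), rob.allocate("r2"), rob.allocate("r3")
rob.complete(c, 30)              # youngest finishes first
print(rob.retire())              # nothing retires: the head is not done
rob.complete(a, 10)
print(rob.retire())              # r1 commits; r3 still waits behind r2
```

A real ROB would also supply values to dependent instructions before retirement, which is where the forwarding and content-addressable search questions above come in.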
+===== Lecture 12 (2/13 Fri.) =====
+  * Renaming
+  * Register renaming table
+  * Predictor (branch predictor, cache line predictor ...)
+  * Power budget (and its importance)
+  * Architectural state, precise state
+  * Memory dependence is known dynamically
+  * Register state is not shared across threads/processors
+  * Memory state is shared across threads/processors
+  * How to maintain speculative memory states
+  * Write buffers (helps simplify the process of checking the reorder buffer)
+  * Overall OoO mechanism
+    * What are other ways of eliminating dispatch stalls
+    * Dispatch when the sources are ready
+    * Retired instructions make the source available
+    * Register renaming
+    * Reservation station
+      * What goes into the reservation station
+      * Tags required in the reservation station
+    * Tomasulo's algorithm
+    * Without precise exception, OoO is hard to debug
+    * Arch. register ID
+    * Examples in the slides
+      * Slides 28 --> register renaming
+      * Slides 30-35 --> Exercise (also on the board)
+        * This will be useful for the midterm
+    * Register aliasing table
+    * Broadcasting tags
+    * Using dataflow
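The register renaming and register aliasing table items above can be illustrated with a short sketch that maps architectural registers to fresh tags so that only true (read-after-write) dependences remain. It assumes an unlimited supply of physical registers and never frees them; the instruction tuple format and the `p<N>` tag naming are hypothetical.

```python
def rename(instructions, num_arch_regs=8):
    """Rename (op, dest, src1, src2) tuples using a register alias table.

    Assumption: physical registers are unlimited and never reclaimed.
    """
    # Initially each architectural register rN maps to physical register pN.
    rat = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for op, dest, src1, src2 in instructions:
        # Read source mappings first, so an instruction that reads and
        # writes the same register sees the old value.
        s1, s2 = rat[src1], rat[src2]
        new_dest = f"p{next_phys}"
        next_phys += 1
        rat[dest] = new_dest     # later readers of dest see the new tag
        renamed.append((op, new_dest, s1, s2))
    return renamed


program = [
    ("mul", "r1", "r2", "r3"),
    ("add", "r4", "r1", "r5"),   # true dependence on mul through r1
    ("sub", "r1", "r6", "r7"),   # reuses r1: output/anti dependence only
]
for inst in rename(program):
    print(inst)
```

After renaming, `sub` writes a different tag than `mul`, so it no longer has to wait for `mul` or `add`, while `add` still reads the tag that `mul` produces.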
+===== Lecture 13 (2/16 Mon.) =====
+  * OoO --> Restricted Dataflow
+    * Extracting parallelism
+    * What are the bottlenecks?
+      * Issue width
+      * Dispatch width
+      * Parallelism in the program
+    * What does it mean to be restricted dataflow?
+      * Still visible as a Von Neumann model
+    * Where does the efficiency come from?
+    * Size of the scheduling window/reorder buffer. Tradeoffs? What makes sense?
+  * Load/store handling
+    * Would like to schedule them out of order, but make them visible in-order
+    * When do you schedule the load/store instructions?
+    * Can we predict if load/store are dependent?
+    * This is one of the most complex structures in load/store handling
+    * What information can be used to predict these load/store optimizations?
+  * Centralized vs. distributed? What are the tradeoffs?
+  * How to handle when there is a misprediction/recovery
+    * OoO + branch prediction?
+    * Speculatively update the history register
+      * When do you update the GHR?
+  * Token dataflow arch.
+    * What are tokens?
+    * How to match tokens
+    * Tagged token dataflow arch.
+    * What are the tradeoffs?
+    * Difficulties?

18-447 Introduction to Computer Architecture – Spring 2015
