Differences

This shows you the differences between two versions of the page.

--- buzzword [2015/01/10 21:03]
kevincha [Lecture 1 (1/13 Mon.)]
+++ buzzword [2015/02/12 19:01]
kevincha [Lecture 11 (2/11 Mon.)]
@@ Line 4: / Line 4: @@
 ===== Lecture 1 (1/12 Mon.) =====
+  * Level of transformation
+    * Algorithm
+    * System software
+    * Compiler
+  * Cross abstraction layers
+  * Tradeoffs
+  * Caches
+  * DRAM/memory controller
+  * DRAM banks
+  * Row buffer hit/miss
+  * Row buffer locality
+  * Unfairness
+  * Memory performance hog
+  * Shared DRAM memory system
+  * Streaming access vs. random access
+  * Memory scheduling policies
+  * Scheduling priority
+  * Retention time of DRAM
+  * Process variation
+  * Retention time profile
+  * Power consumption
+  * Bloom filter
+  * Hamming code
+  * Hamming distance
+  * DRAM row hammer
+===== Lecture 2 (1/14 Wed.) =====
+  * Moore's Law
+  * Algorithm --> step-by-step procedure to solve a problem
+  * in-order execution
+  * out-of-order execution
+  * technologies that are available on cellphones
+  * new applications that are made available through new computer architecture techniques
+    * more data mining (genomics/medical areas)
+	* lower power (cellphones)
+	* smaller cores (cellphones/computers)
+	* etc.
+  * Performance bottlenecks in a single thread/core processors
+    * multi-core as an alternative
+  * Memory wall (a part of scaling issue)
+  * Scaling issue
+    * Transister are getting smaller
+  * Key components of a computer
+  * Design points
+    * Design processors to meet the design points
+  * Software stack
+  * Design decisions
+  * Datacenters
+  * Reliability problems that cause errors
+  * Analogies from Kuhn's "The Structure of Scientific Revolutions" (Recommended book)
+    * Pre paradigm science
+    * Normal science
+    * Revolutionalry science
+  * Components of a computer
+    * Computation
+      * Communication
+      * Storage
+        * DRAM
+        * NVRAM (Non-volatile memory): PCM, STT-MRAM
+        * Storage (Flash/Harddrive)
+  * Von Neumann Model (Control flow model)
+    * Stored program computer
+      * Properties of Von Neumann Model: Stored program, sequential instruction processing
+      * Unified memory
+        * When does an instruction is being interpreted as an instruction (as oppose to a datum)?
+      * Program counter
+      * Examples: x86, ARM, Alpha, IBM Power series, SPARC, MIPS
+  * Data flow model
+    * Data flow machine
+      * Data flow graph
+    * Operands
+    * Live-outs/Live-ins
+      * DIfferent types of data flow nodes (conditional/relational/barrier)
+    * How to do transactional transaction in dataflow?
+      * Example: bank transactions
+  * Tradeoffs between control-driven and data-driven
+    * What are easier to program?
+      * Which are easy to compile?
+      * What are more parallel (does that mean it is faster?)
+      * Which machines are more complex to design?
+    * In control flow, when a program is stop, there is a pointer to the current state (precise state).
+  * ISA vs. Microarchitecture
+    * Semantics in the ISA
+      * uArch should obey the ISA
+      * Changing ISA is costly, can affect compatibility.
+  * Instruction pointers
+  * uArch techniques: common and powerful techniques break Vonn Neumann model if done at the ISA level
+    * Conceptual techniques
+      * Pipelining
+      * Multiple instructions at a time
+      * Out-of-order executions
+      * etc.
+        * Design techniques
+          * Adder implementation (Bit serial, ripple carry, carry lookahead)
+          * Connection machine (an example of a machine that use bit serial to tradeoff latency for more parallelism)
+  * Microprocessor: ISA + uArch + circuits
+  * What are a part of the ISA? Instructions, memory, etc.
+    * Things that are visible to the programmer/software
+  * What are not a part of the ISA? (what goes inside: uArch techniques)
+    * Things that are not suppose to be visible to the programmer/software but typically make the processor faster and/or consumes less power and/or less complex
+===== Lecture 3 (1/17 Fri.) =====
+   * Microarchitecture
+   * Three major tradeoffs of computer architecture
+   * Macro-architecture
+   * LC-3b ISA
+   * Unused instructions
+   * Bit steering
+   * Instruction processing style
+   * 0,1,2,3 address machines
+   * Stack machine
+   * Accumulator machine
+   * 2-operand machine
+   * 3-operand machine
+   * Tradeoffs between 0,1,2,3 address machines
+   * Postfix notation
+   * Instructions/Opcode/Operade specifiers (i.e. addressing modes)
+   * Simply vs. complex data type (and their tradeoffs)
+   * Semantic gap and level
+   * Translation layer
+   * Addressability
+   * Byte/bit addressable machines
+   * Virtual memory
+   * Big/little endian
+   * Benefits of having registers (data locality)
+   * Programmer visible (Architectural) state
+   * Programmers can access this directly
+   * What are the benefits?
+   * Microarchitectural state
+   * Programmers cannot access this directly
+   * Evolution of registers (from accumulators to registers)
+   * Different types of instructions
+   * Control instructions
+   * Data instructions
+   * Operation instructions
+   * Addressing modes
+   * Tradeoffs (complexity, flexibility, etc.)
+   * Orthogonal ISA
+   * Addressing modes that are orthogonal to instruction types
+   * I/O devices
+   * Vectored vs. non-vectored interrupts
+   * Complex vs. simple instructions
+   * Tradeoffs
+   * RISC vs. CISC
+   * Tradeoff
+   * Backward compatibility
+   * Performance
+   * Optimization opportunity
+   * Translation
+===== Lecture 4 (1/21 Wed.) =====
+  * Fixed vs. variable length instruction
+  * Huffman encoding
+  * Uniform vs. non-uniform decode
+  * Registers
+    * Tradeoffs between number of registers
+  * Alignments
+    * How does MIPS load words across alignment the boundary
+===== Lecture 5 (1/26 Mon.) =====
+  * Tradeoffs in ISA: Instruction length
+    * Uniform vs. non-uniform
+  * Design point/Use cases
+    * What dictates the design point?
+  * Architectural states
+  * uArch
+    * How to implement the ISA in the uArch
+  * Different stages in the uArch
+  * Clock cycles
+  * Multi-cycle machine
+  * Datapath and control logic
+    * Control signals
+  * Execution time of instructions/program
+    * Metrics and what do they means
+  * Instruction processing
+    * Fetch
+    * Decode
+    * Execute
+    * Memory fetch
+    * Writeback
+  * Encoding and semantics
+  * Different types of instructions (I-type, R-type, etc.)
+  * Control flow instructions
+  * Non-control flow instructions
+  * Delayed slot/Delayed branch
+  * Single cycle control logic
+  * Lockstep
+  * Critical path analysis
+    * Critical path of a single cycle processor
+  * What is in the control signals?
+    * Combinational logic & Sequential logic
+  * Control store
+  * Tradeoffs of a single cycle uarch
+  * Design principles
+    * Common case design
+    * Critical path design
+    * Balanced designs
+    * Dynamic power/Static power
+      * Increases in power due to frequency
+===== Lecture 6 (1/28 Mon.) =====
+  * Design principles
+    * Common case design
+    * Critical path design
+    * Balanced designs
+  * Multi cycle design
+  * Microcoded/Microprogrammed machines
+    * States
+    * Translation from one state to another
+    * Microinstructions
+    * Microsequencing
+    * Control store - Product control signals
+    * Microsequencer
+    * Control signal
+      * What do they have to control?
+  * Instruction processing cycle
+  * Latch signals
+  * State machine
+  * State variables
+  * Condition code
+  * Steering bits
+  * Branch enable logic
+  * Difference between gating and loading? (write enable vs. driving the bus)
+  * Memory mapped I/O
+  * Hardwired logic
+    * What control signals come from hardwired logic?
+  * Variable latency memory
+  * Handling interrupts
+  * Difference betwen interrupts and exceptions
+  * Emulator (i.e. uCode allots minimal datapath to emulate the ISA)
+  * Updating machine behavior
+  * Horizontal microcode
+  * Vertical microcode
+  * Primitives
+===== Lecture 7 (1/30 Fri.) =====
+  * Emulator (i.e. uCode allots minimal datapath to emulate the ISA)
+  * Updating machine behavior
+  * Horizontal microcode
+  * Vertical microcode
+  * Primitives
+  * nanocode and millicode
+    * what are the differences between nano/milli/microcode
+  * microprogrammed vs. hardwire control
+  * Pipelining
+  * Limitations of the multi-programmed design
+    * Idle resources
+  * Throughput of a pipelined design
+    * What dictacts the throughput of a pipelined design?
+  * Latency of the pipelined design
+  * Dependency
+  * Overhead of pipelining
+    * Latch cost?
+  * Data forwarding/bypassing
+  * What are the ideal pipeline?
+  * External fragmentation
+  * Issues in pipeline designs
+    * Stalling
+      * Dependency (Hazard)
+        * Flow dependence
+        * Output dependence
+        * Anti dependence
+        * How to handle them?
+    * Resource contention
+    * Keeping the pipeline full
+    * Handling exception/interrupts
+    * Pipeline flush
+    * Speculation
+===== Lecture 8 (2/2 Mon.) =====
+  * Interlocking
+  * Multipath execution
+  * Fine grain multithreading
+  * No-op (Bubbles in the pipeline)
+  * Valid bits in the instructions
+  * Branch prediction
+  * Different types of data dependence
+  * Pipeline stalls
+    * bubbles
+    * How to handle stalls
+    * Stall conditions
+    * Stall signals
+    * Dependences
+      * Distant between dependences
+    * Data forwarding/bypassing
+    * Maintaining the correct dataflow
+  * Different ways to design data forwarding path/logic
+  * Different techniques to handle interlockings
+    * SW based
+    * HW based
+  * Profiling
+    * Static profiling
+    * Helps from the software (compiler)
+      * Superblock optimization
+      * Analyzing basic blocks
+  * How to deal with branches?
+    * Branch prediction
+    * Delayed branching (branch delay slot)
+    * Forward control flow/backward control flow
+    * Branch prediction accuracy
+  * Profile guided code positioning
+    * Based on the profile info. position the code based on it
+    * Try to make the next sequential instruction be the next inst. to be executed
+  * Predicate combining (combine predicate for a branch instruction)
+  * Predicated execution (control dependence becomes data dependence)
+===== Lecture 9 (2/4 Wed.) =====
+  * Definition of basic blocks
+  * Control flow graph
+  * Delayed branching
+    * benefit?
+    * What does it eliminates?
+    * downside?
+    * Delayed branching in SPARC (with squashing)
+    * Backward compatibility with the delayed slot
+    * What should be filled in the delayed slot
+    * How to ensure correctness
+  * Fine-grained multithreading
+    * fetch from different threads
+    * What are the issues (what if the program doesn't have many threads)
+    * CDC 6000
+    * Denelcor HEP
+    * No dependency checking
+    * Inst. from different thread can fill-in the bubbles
+    * Cost?
+  * Simulteneuos multithreading
+  * Branch prediction
+    * Guess what to fetch next.
+    * Misprediction penalty
+    * Need to guess the direction and target
+    * How to perform the performance analysis?
+      * Given the branch prediction accuracy and penalty cost, how to compute a cost of a branch misprediction.
+      * Given the program/number of instructions, percent of branches, branch prediction accuracy and penalty cost, how to compute a cost coming from branch mispredictions.
+        * How many extra instructions are being fetched?
+        * What is the performance degredation?
+    * How to reduce the miss penalty?
+    * Predicting the next address (non PC+4 address)
+    * Branch target buffer (BTB)
+      * Predicting the address of the branch
+    * Global branch history - for directions
+    * Can use compiler to profile and get more info
+      * Input set dictacts the accuracy
+      * Add time to compilation
+    * Heuristics that are common and doesn't require profiling.
+      * Might be inaccurate
+      * Does not require profiling
+    * Static branch prediction
+      * Pregrammer provides pragmas, hinting the likelihood of taken/not taken branch
+      * For example, x86 has the hint bit
+    * Dynamic branch prediction
+      * Last time predictor
+      * Two bits counter based prediction
+        * One more bit for hysteresis
+===== Lecture 10 (2/6 Fri.) =====
+  * Branch prediction accuracy
+    * Why are they very important?
+      * Differences between 99% accuracy and 98% accuracy
+      * Cost of a misprediction when the pipeline is veryd eep
+  * Global branch correlation
+    * Some branches are correlated
+  * Local branch correlation
+    * Some branches can depend on the result of past branches
+  * Pattern history table
+    * Record global taken/not taken results.
+    * Cost vs. accuracy (What to record, do you record PC? Just taken/not taken info.?)
+  * One-level branch predictor
+    * What information are used
+  * Two-level branch prediction
+    * What entries do you keep in the global history?
+    * What entries do you keep in the local history?
+    * How many table?
+    * Cost when training a table
+    * What are the purposes of each table?
+    * Potential problems of a two-level history
+  * GShare predictor
+    * Global history predictor is hashed with the PC
+    * Store both GHP and PC in one combined information
+    * How do you use the information? Why does the XOR result still usable?
+  * Warmup cost of the branch predictor
+    * Hybrid solution? Fast warmup is used first, then switch to the slower one.
+  * Tournament predictor (Alpha 21264)
+  * Predicated execution - eliminate branches
+    * What are the tradeoffs
+    * What if the block is big (can lead to execution a lot of useless work)
+    * Allows easier code optimization
+      * From the compiler PoV, predicated execution combine multiple basic blocks into one bigger basic block
+      * Reduce control dependences
+    * Need ISA support
+  * Wish branches
+    * Compiler generate both predicated and non-predicated codes
+    * HW design which one to use
+      * Use branch prediction on an easy to predict code
+      * Use predicated execution on a hard to predict code
+      * Compiler can be more aggressive in optimimzing the code
+    * What are the tradeoffs (slide# 47)
+  * Multi-path execution
+    * Execute both paths
+    * Can lead to wasted work
+    * VLIW
+    * SuperScalar
+===== Lecture 11 (2/11 Mon.) =====
+  * Geometric GHR length for branch prediction
+  * Perceptron branch predictor
+  * Multi-cycle executions (Different functional units take different number of cycles)
+    * Instructions can retire out-of-order
+      * How to deal with this case? Stall? Throw exceptions if there are problems?
+  * Exceptions and Interrupts
+    * When they are handled?
+    * Why are some interrupts should be handled right away?
+  * Precise exception
+    * arch. state should be consistent before handling the exception/interrupts
+      * Easier to debug (you see the sequential flow when the interrupt occurs)
+        * Deterministic
+      * Easier to recover from the exception
+      * Easier to restart the processes
+    * How to ensure precise exception?
+    * Tradeoffs between each method
+  * Reorder buffer
+    * Reorder results before they are visible to the arch. state
+      * Need to presearve the sequential sematic and data
+    * What are the informatinos in the ROB entry
+    * Where to get the value from (forwarding path? reorder buffer?)
+      * Extra logic to check where the youngest instructions/value is
+      * Content addressible search (CAM)
+        * A lot of comparators
+    * Different ways to simplify the reorder buffer
+    * Register renaming
+      * Same register refers to independent values (lacks of registers)
+    * Where does the exception happen (after retire)
+  * History buffer
+    * Update the register file when the instruction complete. Unroll if there is an exception.
+  * Future file (commonly used, along with reorder buffer)
+    * Keep two set of register files
+      * An updated value (Speculative), called future file
+      * A backup value (to restore the state quickly
+    * Double the cost of the regfile, but reduce the area as you don't have to use a content addressible memory (compared to ROB alone)
+  * Branch misprediction resembles Exception
+    * The difference is that branch misprediction is not visible to the software
+      * Also much more common (say, divide by zero vs. a mispredicted branch)
+    * Recovery is similar to exception handling
+  * Latency of the state recovery
+  * What to do during the state recovery
+  * Checkpointing
+    * Advantages?

18-447 Introduction to Computer Architecture – Spring 2015

User Tools

Site Tools

Differences

Page Tools