buzzword [2014/02/05 19:20] rachata
buzzword [2015/01/10 21:03] kevincha
====== Buzzwords ======

Buzzwords are terms that are mentioned during lecture which are particularly important to understand thoroughly. This page tracks the buzzwords for each of the lectures and can be used as a reference for finding gaps in your understanding of course material.
===== Lecture 1 (1/13 Mon.) =====

  * Level of transformation
  * Algorithm
  * System software
  * Compiler
  * Cross abstraction layers
  * Expose an interface
  * Tradeoffs
  * Caches
  * Multi-thread
  * Multi-core
  * Unfairness
  * DRAM controller/Memory controller
  * Memory hog
  * Row buffer hit/miss
  * Row buffer locality
  * Streaming access/Random access
  * DRAM refresh
  * Retention time
  * Profiling DRAM retention time
  * Power consumption
  * Wimpy cores
  * Bloom filter
  * Pros/Cons
  * False positive
  * Simulation
  * Memory performance attacks
  * RTL design

===== Lecture 2 (1/15 Wed.) =====

  * Optimizing for energy/Optimizing for performance
  * Generally you should optimize for the users
  * State-of-the-art
  * RTL simulation
  * Long, slow and can be costly
  * High-level simulation
  * What should be employed?
  * Important to get the idea of how they are implemented in RTL
  * Allows the designer to filter out techniques that do not work well
  * Design points
  * Design processors to meet the design points
  * Software stack
  * Design decisions
  * Datacenters
  * MIPS R2000
  * What are the architectural techniques that improve the performance of a processor over the MIPS R2000?
  * Moore's Law
  * In-order execution
  * Out-of-order execution
  * Technologies that are available on cellphones
  * New applications that are made available through new computer architecture techniques
  * More data mining (genomics/medical areas)
  * Lower power (cellphones)
  * Smaller cores (cellphones/computers)
  * etc.
  * Performance bottlenecks in single-thread/single-core processors
  * Multi-core as an alternative
  * Memory wall (a part of the scaling issue)
  * Scaling issue
  * Transistors are getting smaller
  * Reliability problems that cause errors
  * Analogies from Kuhn's "The Structure of Scientific Revolutions" (recommended book)
  * Pre-paradigm science
  * Normal science
  * Revolutionary science
  * Components of a computer
  * Computation
  * Communication
  * Storage
  * DRAM
  * NVRAM (non-volatile memory): PCM, STT-MRAM
  * Storage (Flash/Hard drive)
  * Von Neumann model (control flow model)
  * Stored program computer
  * Properties of the Von Neumann model: stored program, sequential instruction processing
  * Unified memory
  * When is an instruction interpreted as an instruction (as opposed to a datum)?
  * Program counter
  * Examples: x86, ARM, Alpha, IBM Power series, SPARC, MIPS
  * Data flow model
  * Data flow machine
  * Data flow graph
  * Operands
  * Live-outs/Live-ins
  * Different types of data flow nodes (conditional/relational/barrier)
  * How to perform transactions in data flow?
  * Example: bank transactions
  * Tradeoffs between control-driven and data-driven
  * Which is easier to program?
  * Which is easier to compile?
  * Which is more parallel (does that mean it is faster?)
  * Which machines are more complex to design?
  * In control flow, when a program is stopped, there is a pointer to the current state (precise state).
  * ISA vs. microarchitecture
  * Semantics in the ISA
  * uArch should obey the ISA
  * Changing the ISA is costly and can affect compatibility.
  * Instruction pointers
  * uArch techniques: common and powerful techniques break the Von Neumann model if done at the ISA level
  * Conceptual techniques
  * Pipelining
  * Multiple instructions at a time
  * Out-of-order execution
  * etc.
  * Design techniques
  * Adder implementation (bit-serial, ripple carry, carry lookahead)
  * Connection Machine (an example of a machine that uses bit-serial designs to trade off latency for more parallelism)
  * Microprocessor: ISA + uArch + circuits
  * What is a part of the ISA? Instructions, memory, etc.
  * Things that are visible to the programmer/software
  * What is not a part of the ISA? (what goes inside: uArch techniques)
  * Things that are not supposed to be visible to the programmer/software but typically make the processor faster and/or consume less power and/or less complex

===== Lecture 3 (1/17 Fri.) =====

  * Design tradeoffs
  * Macro-architectures
  * Reconfigurability vs. specialized designs
  * Parallelism (instruction, data parallelism)
  * Uniform decode (example: Alpha)
  * Steering bits (sub-opcode)
  * 0, 1, 2, 3-address machines
  * Stack machine
  * Accumulator machine
  * 2-operand machine
  * 3-operand machine
  * Tradeoffs between 0, 1, 2, 3-address machines
  * Instructions/Opcodes/Operand specifiers (i.e., addressing modes)
  * Simple vs. complex data types (and their tradeoffs)
  * Semantic gap
  * Translation layer
  * Addressability
  * Byte/bit addressable machines
  * Virtual memory
  * Big/little endian
  * Benefits of having registers (data locality)
  * Programmer-visible (architectural) state
  * Programmers can access this directly
  * What are the benefits?
  * Microarchitectural state
  * Programmers cannot access this directly
  * Evolution of registers (from accumulators to registers)
  * Different types of instructions
  * Control instructions
  * Data instructions
  * Operation instructions
  * Addressing modes
  * Tradeoffs (complexity, flexibility, etc.)
  * Orthogonal ISA
  * Addressing modes that are orthogonal to instruction types
  * Vectored vs. non-vectored interrupts
  * Complex vs. simple instructions
  * Tradeoffs
  * RISC vs. CISC
  * Tradeoffs
  * Backward compatibility
  * Performance
  * Optimization opportunity

===== Lecture 4 (1/22 Wed.) =====

  * Semantic gap
  * Small vs. large semantic gap (CISC vs. RISC)
  * Benefits of RISC vs. CISC
  * Micro-operations/microcode
  * Translate complex instructions into smaller instructions
  * Parallelism (motivation for RISC)
  * Compiler optimization
  * Code optimization through translation
  * VLIW
  * Fixed vs. variable length instructions
  * Tradeoffs
  * Alignment issues? (fetch/decode)
  * Decoding issues?
  * Code size?
  * Adding additional instructions?
  * Memory bandwidth and cache utilization?
  * Energy?
  * Encoding in variable length instructions
  * Structure of Alpha instructions and other uniform decode instructions
  * Different types of instructions
  * Benefit of knowing the type of an instruction
  * Speculatively operate on future instructions
  * x86 and other non-uniform decode instructions
  * Tradeoffs vs. uniform decode
  * Tradeoffs for different numbers of registers
  * Spilling into memory if the number of registers is small
  * Compiler optimization on how to manage which values to keep/spill
  * Addressing modes
  * Benefits?
  * Types?
  * Different uses of addressing modes?
  * Various ISA-level tradeoffs
  * Virtual memory
  * Unaligned memory access/aligned memory access
  * Cost vs. benefit of unaligned access
  * ISA specification
  * Things you have to obey/specify in the ISA specification
  * Architectural states
  * Microarchitecture implements how arch. state A is transformed into the next arch. state A'
  * Single-cycle machines
  * Critical path in the single-cycle machine
  * Multi-cycle machines
  * Functional units
  * Performance metrics
  * CPI/IPC
  * CPI of a single-cycle microarchitecture

===== Lecture 5 (1/24 Fri.) =====

  * Instruction processing
  * Fetch
  * Decode
  * Execute
  * Memory fetch
  * Writeback
  * Datapath & control logic in microprocessors
  * Different types of instructions (I-type, R-type, etc.)
  * Control flow instructions
  * Non-control flow instructions
  * Delay slot/Delayed branch
  * Single-cycle control logic
  * Lockstep
  * Critical path analysis
  * Critical path of a single-cycle processor
  * Combinational logic & sequential logic
  * Control store
  * Tradeoffs of a single-cycle uarch
  * Dynamic power/Static power
  * Speedup calculation
  * Parallelism
  * Serial bottleneck
  * Amdahl's bottleneck
  * Design principles
  * Common case design
  * Critical path design
  * Balanced designs
  * Multi-cycle design

===== Lecture 6 (1/27 Mon.) =====

  * Microcoded/Microprogrammed machines
  * States
  * Microinstructions
  * Microsequencing
  * Control store - produces control signals
  * Microsequencer
  * Control signals
  * What do they have to control?
  * Instruction processing cycle
  * Latch signals
  * State machine
  * State variables
  * Condition codes
  * Steering bits
  * Branch enable logic
  * Difference between gating and loading? (write enable vs. driving the bus)
  * Memory-mapped I/O
  * Hardwired logic
  * What control signals come from hardwired logic?
  * Variable-latency memory
  * Handling interrupts
  * Difference between interrupts and exceptions
  * Emulator (i.e., uCode allows a minimal datapath to emulate the ISA)
  * Updating machine behavior
  * Horizontal microcode
  * Vertical microcode
  * Primitives

===== Lecture 7 (1/29 Wed.) =====

  * Pipelining
  * Limitations of the multi-cycle design
  * Idle resources
  * Throughput of a pipelined design
  * What dictates the throughput of a pipelined design?
  * Latency of the pipelined design
  * Dependency
  * Overhead of pipelining
  * Latch cost?
  * Data forwarding/bypassing
  * What is an ideal pipeline?
  * External fragmentation
  * Issues in pipeline designs
  * Stalling
  * Dependency (hazard)
  * Flow dependence
  * Output dependence
  * Anti dependence
  * How to handle them?
  * Resource contention
  * Keeping the pipeline full
  * Handling exceptions/interrupts
  * Pipeline flush
  * Speculation
  * Interlocking
  * Multipath execution
  * Fine-grained multithreading
  * No-op (bubbles in the pipeline)
  * Valid bits in the instructions

===== Lecture 8 (1/31 Fri.) =====

  * Branch prediction
  * Different types of data dependence
  * Pipeline stalls
  * Bubbles
  * How to handle stalls
  * Stall conditions
  * Stall signals
  * Dependences
  * Distance between dependences
  * Data forwarding/bypassing
  * Maintaining the correct dataflow
  * Different ways to design the data forwarding path/logic
  * Different techniques to handle interlocking
  * SW-based
  * HW-based
  * Profiling
  * Static profiling
  * Help from the software (compiler)
  * Superblock optimization
  * Analyzing basic blocks
  * How to deal with branches?
  * Branch prediction
  * Delayed branching (branch delay slot)
  * Forward control flow/backward control flow
  * Branch prediction accuracy
  * Profile-guided code positioning
  * Based on the profile info, position the code accordingly
  * Try to make the next sequential instruction be the next instruction to be executed
  * Trace cache
  * Predicate combining (combine predicates for a branch instruction)
  * Predicated execution (control dependence becomes data dependence)
  * Definition of basic blocks
  * Control flow graph

===== Lecture 9 (2/3 Mon.) =====

  * Delayed branching
  * Benefit?
  * What does it eliminate?
  * Downside?
  * Delayed branching in SPARC (with squashing)
  * Backward compatibility with the delay slot
  * What should be filled in the delay slot
  * How to ensure correctness
  * Fine-grained multithreading
  * Fetch from different threads
  * What are the issues? (what if the program doesn't have many threads?)
  * CDC 6000
  * Denelcor HEP
  * No dependency checking
  * Instructions from different threads can fill in the bubbles
  * Cost?
  * Simultaneous multithreading
  * Branch prediction
  * Guess what to fetch next.
  * Misprediction penalty
  * Need to guess the direction and target
  * How to perform the performance analysis?
  * Given the branch prediction accuracy and penalty cost, how to compute the cost of a branch misprediction.
  * Given the program/number of instructions, percentage of branches, branch prediction accuracy and penalty cost, how to compute the cost coming from branch mispredictions.
  * How many extra instructions are being fetched?
  * What is the performance degradation?
  * How to reduce the miss penalty?
  * Predicting the next address (non-PC+4 address)
  * Branch target buffer (BTB)
  * Predicting the address of the branch
  * Global branch history - for directions
  * Can use the compiler to profile and get more info
  * Input set dictates the accuracy
  * Adds time to compilation
  * Heuristics that are common and don't require profiling
  * Might be inaccurate
  * Does not require profiling
  * Programmer can tell the hardware (via pragmas (hints))
  * For example, x86 has the hint bit
  * Dynamic branch prediction
  * Last-time predictor
  * Two-bit counter based prediction
  * One more bit for hysteresis

===== Lecture 10 (2/5 Wed.) =====

  * Branch prediction accuracy
  * Why is it very important?
  * Differences between 99% accuracy and 98% accuracy
  * Cost of a misprediction when the pipeline is very deep
  * Value prediction
  * Global branch correlation
  * Some branches are correlated
  * Local branch correlation
  * Some branches can depend on the results of past branches
  * Pattern history table
  * Records global taken/not-taken results
  * Cost vs. accuracy (What to record? Do you record the PC? Just taken/not-taken info?)
  * One-level branch predictor
  * What information is used
  * Two-level branch prediction
  * What entries do you keep in the global history?
  * What entries do you keep in the local history?
  * How many tables?
  * Cost when training a table
  * What are the purposes of each table?
  * Potential problems of a two-level history
  * Gshare predictor
  * Global history register is hashed with the PC
  * Store both GHR and PC in one combined piece of information
  * How do you use the information? Why is the XOR result still usable?
  * Slides (pages 16-18) for a good overview of one- and two-level predictors
  * Warmup cost of the branch predictor
  * Hybrid solution? The fast-warmup predictor is used first, then switch to the slower one.
  * Tournament predictor (Alpha 21264)
  * Other types of branch predictors
  * Using machine learning?
  * Geometric history length
  * Look at branches far behind (but using geometric steps)
  * Predicated execution - eliminates branches
  * What are the tradeoffs?
  * What if the block is big? (can lead to executing a lot of useless work)
  * Allows easier code optimization
  * From the compiler's PoV, predicated execution combines multiple basic blocks into one bigger basic block
  * Reduces control dependences
  * Needs ISA support
  * Wish branches
  * Compiler generates both predicated and non-predicated code
  * HW decides which one to use
  * Use branch prediction on easy-to-predict code
  * Use predicated execution on hard-to-predict code
  * Compiler can be more aggressive in optimizing the code
  * What are the tradeoffs? (slide 47)
  * Multipath execution
  * Execute both paths
  * Can lead to wasted work
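The gshare idea above (XOR the global history register with PC bits, then index a table of two-bit saturating counters with one hysteresis bit) can be sketched in a few lines. This is a toy model for intuition, assuming a 10-bit index and invented class/method names, not a faithful model of any real predictor:

```python
class GsharePredictor:
    """Gshare sketch: index a table of 2-bit saturating counters with
    (PC XOR global history register), so one table captures both the
    branch's identity and the recent global taken/not-taken pattern."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                              # global history register
        self.counters = [1] * (1 << index_bits)   # start weakly not-taken

    def _index(self, pc):
        return (pc ^ self.ghr) & self.mask        # the gshare hash

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True means taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask

# Training on an always-taken branch: once the history register saturates,
# the counter at the saturated index climbs to strongly-taken.
p = GsharePredictor()
for _ in range(12):
    p.update(0x400, True)
```

The XOR result stays usable because it is consulted the same way at predict and update time: as long as the (PC, history) pair maps to a stable index, the counter there learns that context, at the cost of occasional destructive aliasing between different pairs.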