Buzzwords

Buzzwords are terms mentioned in lecture that are particularly important to understand thoroughly. This page tracks the buzzwords for each lecture and can be used as a reference for finding gaps in your understanding of the course material.

Lecture 1 (1/12 Mon.)

Levels of transformation
- Algorithm
- System software
- Compiler
Cross abstraction layers
Tradeoffs
Caches
DRAM/memory controller
DRAM banks
Row buffer hit/miss
Row buffer locality
Unfairness
Memory performance hog
Shared DRAM memory system
Streaming access vs. random access
Memory scheduling policies
Scheduling priority
Retention time of DRAM
Process variation
Retention time profile
Power consumption
Bloom filter
Hamming code
Hamming distance
DRAM row hammer
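
As a small illustration of the Hamming distance buzzword above, the sketch below counts the bit positions in which two words differ. The function name `hamming_distance` is hypothetical and chosen only for this example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    if a < 0 or b < 0:
        raise ValueError("expects non-negative integers")
    # XOR leaves a 1 exactly where the two words disagree.
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(hamming_distance(0b1011101, 0b1001001))
```

A code whose valid codewords are all at least distance 3 apart can correct any single-bit error, which is the property Hamming codes rely on.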

Lecture 2 (1/14 Wed.)

Moore's Law
Algorithm –> step-by-step procedure to solve a problem
in-order execution
out-of-order execution
technologies that are available on cellphones
new applications that are made available through new computer architecture techniques
- more data mining (genomics/medical areas)
lower power (cellphones)
smaller cores (cellphones/computers)
etc.
Performance bottlenecks in single-thread/single-core processors
- multi-core as an alternative
Memory wall (a part of the scaling issue)
Scaling issue
- Transistors are getting smaller
Key components of a computer
Design points
- Design processors to meet the design points
Software stack
Design decisions
Datacenters
Reliability problems that cause errors
Analogies from Kuhn's “The Structure of Scientific Revolutions” (Recommended book)
- Pre-paradigm science
- Normal science
- Revolutionary science
Components of a computer
- Computation
- Communication
- Storage
  - DRAM
  - NVRAM (Non-volatile memory): PCM, STT-MRAM
  - Storage (Flash/Hard drive)
Von Neumann Model (Control flow model)
- Stored program computer
  - Properties of Von Neumann Model: Stored program, sequential instruction processing
  - Unified memory
    - When is a stored value interpreted as an instruction (as opposed to a datum)?
  - Program counter
  - Examples: x86, ARM, Alpha, IBM Power series, SPARC, MIPS
Data flow model
- Data flow machine
  - Data flow graph
- Operands
- Live-outs/Live-ins
  - Different types of data flow nodes (conditional/relational/barrier)
- How to do transactions in dataflow?
  - Example: bank transactions
Tradeoffs between control-driven and data-driven
- Which is easier to program?
- Which is easier to compile?
- Which is more parallel (does that mean it is faster?)
- Which machines are more complex to design?
- In control flow, when a program is stopped, there is a pointer to the current state (precise state).
ISA vs. Microarchitecture
- Semantics in the ISA
  - uArch should obey the ISA
  - Changing ISA is costly, can affect compatibility.
Instruction pointers
uArch techniques: common and powerful techniques break the Von Neumann model if done at the ISA level
- Conceptual techniques
  - Pipelining
  - Multiple instructions at a time
  - Out-of-order execution
  - etc.
- Design techniques
  - Adder implementation (bit serial, ripple carry, carry lookahead)
  - Connection Machine (an example of a machine that uses bit-serial arithmetic to trade off latency for more parallelism)
Microprocessor: ISA + uArch + circuits
What is part of the ISA? Instructions, memory, etc.
- Things that are visible to the programmer/software
What is not part of the ISA? (what goes inside: uArch techniques)
- Things that are not supposed to be visible to the programmer/software but typically make the processor faster, lower power, and/or less complex

Lecture 3 (1/16 Fri.)

Microarchitecture
Three major tradeoffs of computer architecture
Macro-architecture
LC-3b ISA
Unused instructions
Bit steering
Instruction processing style
0,1,2,3 address machines
Stack machine
Accumulator machine
2-operand machine
3-operand machine
Tradeoffs between 0,1,2,3 address machines
Postfix notation
Instructions/Opcode/Operand specifiers (i.e. addressing modes)
Simple vs. complex data types (and their tradeoffs)
Semantic gap and level
Translation layer
Addressability
Byte/bit addressable machines
Virtual memory
Big/little endian
Benefits of having registers (data locality)
Programmer visible (Architectural) state
- Programmers can access this directly
- What are the benefits?
Microarchitectural state
- Programmers cannot access this directly
Evolution of registers (from accumulators to registers)
Different types of instructions
- Control instructions
- Data instructions
- Operation instructions
Addressing modes
- Tradeoffs (complexity, flexibility, etc.)
Orthogonal ISA
- Addressing modes that are orthogonal to instruction types
I/O devices
Vectored vs. non-vectored interrupts
Complex vs. simple instructions
Tradeoffs
RISC vs. CISC
Tradeoff
Backward compatibility
Performance
Optimization opportunity
Translation
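
To make the link between postfix notation and a 0-address (stack) machine concrete, here is a minimal sketch of an evaluator in which operators name no operands and always work on the top of the stack. The function name `eval_postfix` and the three supported operators are assumptions made for this example, not part of any ISA discussed in lecture.

```python
def eval_postfix(tokens):
    """Evaluate a postfix expression the way a 0-address (stack) machine would."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            # Operators carry no operand specifiers: both inputs are popped
            # from the stack and the result is pushed back.
            y = stack.pop()
            x = stack.pop()
            stack.append(ops[tok](x, y))
        else:
            # Anything else is treated as a PUSH of an integer literal.
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


if __name__ == "__main__":
    # (3 + 4) * 2 written in postfix
    print(eval_postfix("3 4 + 2 *".split()))
```

The same expression on a 3-operand machine would need explicit register names in every instruction, which is one side of the tradeoff between 0,1,2,3 address machines.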

Lecture 4 (1/21 Wed.)

Fixed vs. variable length instruction
Huffman encoding
Uniform vs. non-uniform decode
Registers
- Tradeoffs in the number of registers
Alignments
- How does MIPS load words across the alignment boundary?

Lecture 5 (1/26 Mon.)

Tradeoffs in ISA: Instruction length
- Uniform vs. non-uniform
Design point/Use cases
- What dictates the design point?
Architectural states
uArch
- How to implement the ISA in the uArch
Different stages in the uArch
Clock cycles
Multi-cycle machine
Datapath and control logic
- Control signals
Execution time of instructions/program
- Metrics and what they mean
Instruction processing
- Fetch
- Decode
- Execute
- Memory fetch
- Writeback
Encoding and semantics
Different types of instructions (I-type, R-type, etc.)
Control flow instructions
Non-control flow instructions
Delay slot/Delayed branch
Single cycle control logic
Lockstep
Critical path analysis
- Critical path of a single cycle processor
What is in the control signals?
- Combinational logic & Sequential logic
Control store
Tradeoffs of a single cycle uArch
Design principles
- Common case design
- Critical path design
- Balanced designs
- Dynamic power/Static power
  - Increases in power due to frequency

Lecture 6 (1/28 Wed.)

Design principles
- Common case design
- Critical path design
- Balanced designs
Multi cycle design
Microcoded/Microprogrammed machines
- States
- Transition from one state to another
- Microinstructions
- Microsequencing
- Control store - Produces control signals
- Microsequencer
- Control signal
  - What do they have to control?
Instruction processing cycle
Latch signals
State machine
State variables
Condition code
Steering bits
Branch enable logic
Difference between gating and loading? (write enable vs. driving the bus)
Memory mapped I/O
Hardwired logic
- What control signals come from hardwired logic?
Variable latency memory
Handling interrupts
Difference between interrupts and exceptions
Emulator (i.e. uCode allows a minimal datapath to emulate the ISA)
Updating machine behavior
Horizontal microcode
Vertical microcode
Primitives

Lecture 7 (1/30 Fri.)

Emulator (i.e. uCode allows a minimal datapath to emulate the ISA)
Updating machine behavior
Horizontal microcode
Vertical microcode
Primitives
Nanocode and millicode
- What are the differences between nano/milli/microcode?
Microprogrammed vs. hardwired control
Pipelining
Limitations of the multi-programmed design
- Idle resources
Throughput of a pipelined design
- What dictates the throughput of a pipelined design?
Latency of the pipelined design
Dependency
Overhead of pipelining
- Latch cost?
Data forwarding/bypassing
What is the ideal pipeline?
External fragmentation
Issues in pipeline designs
- Stalling
  - Dependency (Hazard)
    - Flow dependence
    - Output dependence
    - Anti dependence
    - How to handle them?
- Resource contention
- Keeping the pipeline full
- Handling exceptions/interrupts
- Pipeline flush
- Speculation
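
The throughput, latency, and latch-overhead items above can be tied together with a simple idealized model. The sketch below assumes the combinational logic (total delay `t_logic`) divides evenly into `k_stages`, every stage adds a fixed `latch_overhead`, and the pipeline never stalls. The function and parameter names are hypothetical.

```python
def pipeline_metrics(t_logic, k_stages, latch_overhead, n_instructions):
    """Idealized k-stage pipeline: evenly divided logic, no stalls."""
    # The clock must cover one stage of logic plus the latch delay.
    cycle = t_logic / k_stages + latch_overhead
    return {
        "cycle_time": cycle,
        # One instruction completes per cycle once the pipeline is full.
        "throughput": 1.0 / cycle,
        # A single instruction still passes through all k stages.
        "latency": k_stages * cycle,
        # k cycles to fill the pipeline, then one instruction per cycle.
        "total_time": (k_stages + n_instructions - 1) * cycle,
    }


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, pipeline_metrics(100.0, k, 5.0, 1000))
```

In this model, deeper pipelines raise throughput but also raise per-instruction latency, because the latch overhead is paid once per stage. Stalls, flushes, and unbalanced stages move a real design further from this ideal.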

Lecture 8 (2/2 Mon.)

Interlocking
Multipath execution
Fine grain multithreading
No-op (Bubbles in the pipeline)
Valid bits in the instructions
Branch prediction
Different types of data dependence
Pipeline stalls
- bubbles
- How to handle stalls
- Stall conditions
- Stall signals
- Dependences
  - Distance between dependences
- Data forwarding/bypassing
- Maintaining the correct dataflow
Different ways to design data forwarding path/logic
Different techniques to handle interlocking
- SW based
- HW based
Profiling
- Static profiling
- Help from the software (compiler)
  - Superblock optimization
  - Analyzing basic blocks
How to deal with branches?
- Branch prediction
- Delayed branching (branch delay slot)
- Forward control flow/backward control flow
- Branch prediction accuracy
Profile guided code positioning
- Position the code based on the profile info.
- Try to make the next sequential instruction be the next instruction to be executed
Predicate combining (combine predicate for a branch instruction)
Predicated execution (control dependence becomes data dependence)

Lecture 9 (2/4 Wed.)

Definition of basic blocks
Control flow graph
Delayed branching
- benefit?
- What does it eliminate?
- downside?
- Delayed branching in SPARC (with squashing)
- Backward compatibility with the delay slot
- What should be filled in the delay slot
- How to ensure correctness
Fine-grained multithreading
- fetch from different threads
- What are the issues (what if the program doesn't have many threads)
- CDC 6000
- Denelcor HEP
- No dependency checking
- Instructions from different threads can fill in the bubbles
- Cost?
Simultaneous multithreading
Branch prediction
- Guess what to fetch next.
- Misprediction penalty
- Need to guess the direction and target
- How to perform the performance analysis?
  - Given the branch prediction accuracy and penalty cost, how to compute the cost of a branch misprediction.
  - Given the program/number of instructions, percent of branches, branch prediction accuracy and penalty cost, how to compute the cost coming from branch mispredictions.
    - How many extra instructions are being fetched?
    - What is the performance degradation?
- How to reduce the miss penalty?
- Predicting the next address (non PC+4 address)
- Branch target buffer (BTB)
  - Predicting the target address of the branch
- Global branch history - for directions
- Can use compiler to profile and get more info
  - Input set dictates the accuracy
  - Adds time to compilation
- Heuristics that are common and don't require profiling.
  - Might be inaccurate
  - Does not require profiling
- Static branch prediction
  - Programmer provides pragmas, hinting the likelihood of taken/not taken branch
  - For example, x86 has the hint bit
- Dynamic branch prediction
  - Last time predictor
  - Two-bit counter based prediction
    - One more bit for hysteresis
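
The last-time predictor and the two-bit counter listed above can be compared with a short sketch. Both classes model a single branch only; the class names, the initial states, and the helper `count_mispredictions` are assumptions made for this example.

```python
class LastTimePredictor:
    """Predicts that the branch does whatever it did last time (1 bit)."""

    def __init__(self, initial=False):
        self.last = initial

    def predict(self) -> bool:
        return self.last

    def update(self, taken: bool) -> None:
        self.last = taken


class TwoBitPredictor:
    """2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, initial=1):
        self.counter = initial

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # The extra bit adds hysteresis: one odd outcome in a strong state
        # does not flip the prediction.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(predictor, outcomes):
    """Predict, compare with the real outcome, then train the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch: taken 4 times, then not taken at loop exit, run 3 times.
    outcomes = ([True] * 4 + [False]) * 3
    print("last-time:", count_mispredictions(LastTimePredictor(), outcomes))
    print("two-bit:  ", count_mispredictions(TwoBitPredictor(), outcomes))
```

On this loop pattern the last-time predictor mispredicts twice per loop execution (at exit and on re-entry), while the two-bit counter mispredicts only at the exit once it has warmed up.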

Lecture 10 (2/6 Fri.)

Branch prediction accuracy
- Why are they very important?
  - Differences between 99% accuracy and 98% accuracy
  - Cost of a misprediction when the pipeline is very deep
Global branch correlation
- Some branches are correlated
Local branch correlation
- Some branches can depend on the result of past branches
Pattern history table
- Record global taken/not taken results.
- Cost vs. accuracy (What to record, do you record PC? Just taken/not taken info.?)
One-level branch predictor
- What information is used?
Two-level branch prediction
- What entries do you keep in the global history?
- What entries do you keep in the local history?
- How many tables?
- Cost when training a table
- What are the purposes of each table?
- Potential problems of a two-level history
GShare predictor
- Global history register (GHR) is hashed with the PC
- Store both GHR and PC in one combined piece of information
- How do you use the information? Why is the XOR result still usable?
Warmup cost of the branch predictor
- Hybrid solution? Fast warmup is used first, then switch to the slower one.
Tournament predictor (Alpha 21264)
Predicated execution - eliminate branches
- What are the tradeoffs
- What if the block is big (can lead to executing a lot of useless work)
- Allows easier code optimization
  - From the compiler PoV, predicated execution combines multiple basic blocks into one bigger basic block
  - Reduce control dependences
- Need ISA support
Wish branches
- Compiler generates both predicated and non-predicated code
- HW decides which one to use
  - Use branch prediction on easy-to-predict code
  - Use predicated execution on hard-to-predict code
  - Compiler can be more aggressive in optimizing the code
- What are the tradeoffs (slide# 47)
Multi-path execution
- Execute both paths
- Can lead to wasted work
- VLIW
- SuperScalar
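
For the GShare predictor above, a minimal sketch of the indexing scheme may help: the global history is XORed with PC bits to pick a two-bit counter in the pattern history table. The table size, the choice to drop the low two PC bits (assuming 4-byte instructions), the initial counter value, and all names are assumptions for illustration, not the design of any particular processor.

```python
class GsharePredictor:
    """gshare sketch: PHT of 2-bit counters indexed by (PC XOR global history)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                         # global history of recent outcomes
        self.pht = [1] * (1 << index_bits)   # counters start weakly not taken

    def _index(self, pc):
        # Assumes 4-byte instructions, so the low 2 PC bits carry no information.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc) -> bool:
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the newest outcome into the history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, trace):
    """trace is a list of (pc, taken) pairs in program order."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # One branch that alternates taken / not taken.
    trace = [(0x40, i % 2 == 0) for i in range(40)]
    print(count_mispredictions(GsharePredictor(), trace))
```

A single two-bit counter cannot learn an alternating branch, but here the history selects a different counter for each phase of the pattern, so the predictor becomes accurate after a short warmup. The XOR result stays usable because the same (PC, history) pair always maps to the same entry; the cost is that unrelated pairs can alias to one counter.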

Lecture 11 (2/11 Wed.)

Geometric GHR length for branch prediction
Perceptron branch predictor
Multi-cycle executions (Different functional units take different number of cycles)
- Instructions can retire out-of-order
  - How to deal with this case? Stall? Throw exceptions if there are problems?
Exceptions and Interrupts
- When are they handled?
- Why should some interrupts be handled right away?
Precise exception
- arch. state should be consistent before handling the exception/interrupts
  - Easier to debug (you see the sequential flow when the interrupt occurs)
    - Deterministic
  - Easier to recover from the exception
  - Easier to restart the processes
- How to ensure precise exception?
- Tradeoffs between each method
Reorder buffer
- Reorder results before they are visible to the arch. state
  - Need to preserve the sequential semantics and data
- What information is in a ROB entry?
- Where to get the value from (forwarding path? reorder buffer?)
  - Extra logic to check where the youngest instruction/value is
  - Content addressable search (CAM)
    - A lot of comparators
- Different ways to simplify the reorder buffer
- Register renaming
  - Same register refers to independent values (lack of registers)
- Where does the exception happen (after retire)
History buffer
- Update the register file when the instruction completes. Roll back if there is an exception.
Future file (commonly used, along with reorder buffer)
- Keep two register files
  - An updated value (speculative), called the future file
  - A backup value (to restore the state quickly)
- Doubles the cost of the regfile, but reduces the area as you don't have to use a content addressable memory (compared to ROB alone)
Branch misprediction resembles an exception
- The difference is that branch misprediction is not visible to the software
  - Also much more common (say, divide by zero vs. a mispredicted branch)
- Recovery is similar to exception handling
Latency of the state recovery
What to do during the state recovery
Checkpointing
- Advantages?
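
The reorder buffer idea above (complete out of order, update architectural state in order) can be sketched in a few lines. This is a simplified model under stated assumptions: entries are looked up by a hypothetical instruction tag, every instruction writes one register, and an exception simply flushes the buffer. The class, method, and field names are all invented for this example.

```python
from collections import deque


class ReorderBuffer:
    """Sketch of in-order retirement with out-of-order completion."""

    def __init__(self):
        self.entries = deque()      # oldest instruction at the left
        self.arch_regs = {}         # architectural (programmer-visible) state
        self.exception_tag = None   # instruction that raised, if any

    def allocate(self, tag, dest_reg):
        # Entries are allocated in program order.
        self.entries.append(
            {"tag": tag, "dest": dest_reg, "value": None,
             "done": False, "exception": False}
        )

    def complete(self, tag, value=None, exception=False):
        # Execution may finish in any order; the result waits in the ROB.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry.update(value=value, done=True, exception=exception)
                return
        raise KeyError(tag)

    def retire(self):
        """Retire finished instructions from the head; return their tags."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["exception"]:
                # Precise state: everything older has retired and nothing
                # younger has touched the architectural registers.
                self.exception_tag = head["tag"]
                self.entries.clear()
                break
            self.entries.popleft()
            self.arch_regs[head["dest"]] = head["value"]
            retired.append(head["tag"])
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    for tag, reg in (("i1", "r1"), ("i2", "r2")):
        rob.allocate(tag, reg)
    rob.complete("i2", 20)      # younger instruction finishes first
    print(rob.retire())         # nothing retires until i1 is done
    rob.complete("i1", 10)
    print(rob.retire(), rob.arch_regs)
```

The linear search in `complete` stands in for the tag match that hardware would do with comparators. A real design would also need to supply in-flight values to dependent instructions, which is where the CAM cost mentioned above comes from.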

Lecture 12 (2/13 Fri.)

Renaming
Register renaming table
Predictor (branch predictor, cache line predictor …)
Power budget (and its importance)
Architectural state, precise state
Memory dependence is known dynamically
Register state is not shared across threads/processors
Memory state is shared across threads/processors
How to maintain speculative memory states
Write buffers (helps simplify the process of checking the reorder buffer)
Overall OoO mechanism
- What are other ways of eliminating dispatch stalls
- Dispatch when the sources are ready
- Retired instructions make the source available
- Register renaming
- Reservation station
  - What goes into the reservation station
  - Tags required in the reservation station
- Tomasulo's algorithm
- Without precise exception, OoO is hard to debug
- Arch. register ID
- Examples in the slides
  - Slides 28 –> register renaming
  - Slides 30-35 –> Exercise (also on the board)
    - This will be useful for the midterm
- Register alias table
- Broadcasting tags
- Using dataflow
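
To illustrate register renaming with an alias table, the sketch below maps each architectural destination to a fresh physical register, so only flow (true) dependences remain. It is a simplification under stated assumptions: physical registers are unlimited and never freed, every instruction has one destination and two sources, and the function name `rename` and the `r`/`p` naming are hypothetical. It shows the renaming step only, not the full Tomasulo mechanism with reservation stations and tag broadcast.

```python
def rename(instructions, num_arch_regs=4):
    """Rename (dest, src1, src2) tuples of architectural registers.

    Returns the same instructions expressed with physical register names.
    """
    # Alias table: architectural register -> most recent physical register.
    rat = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Read the sources first so r1 = r1 + r2 sees the old mapping of r1.
        p_src1, p_src2 = rat[src1], rat[src2]
        # Every write gets a fresh physical register, which removes
        # output (WAW) and anti (WAR) dependences on the old name.
        p_dest = f"p{next_phys}"
        next_phys += 1
        rat[dest] = p_dest
        renamed.append((p_dest, p_src1, p_src2))
    return renamed


if __name__ == "__main__":
    program = [
        ("r1", "r2", "r3"),  # r1 = r2 op r3
        ("r2", "r1", "r3"),  # reads r1 (flow dep.), overwrites r2 (anti dep.)
        ("r1", "r3", "r3"),  # overwrites r1 again (output dep.)
    ]
    for inst in rename(program):
        print(inst)
```

After renaming, the second instruction still reads the physical register written by the first, while the two writes to `r1` target different physical registers and no longer need to be ordered.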

18-447 Introduction to Computer Architecture – Spring 2015
