This is an old revision of the document!

Buzzwords

Buzzwords are terms that are mentioned during lecture which are particularly important to understand thoroughly. This page tracks the buzzwords for each of the lectures and can be used as a reference for finding gaps in your understanding of course material.

Lecture 1 (1/13 Mon.)

Level of transformation
- Algorithm
- System software
- Compiler
Cross abstraction layers
- Expose an interface
Tradeoffs
Caches
Multi-thread
Multi-core
Unfairness
DRAM controller/Memory controller
Memory hog
Row buffer hit/miss
Row buffer locality
Streaming access/ Random access
DRAM refresh
Retention time
Profiling DRAM retention time
Power consumption
Wimpy cores
Bloom filter
- Pros/Cons
- False Positive
Simulation
Memory performance attacks
RTL design

Lecture 2 (1/15 Wed.)

Optimizing for energy/ Optimizing for the performance
- Generally you should optimize for the users
state-of-the-art
RTL Simulation
- Long, slow and can be costly
High level simulation
- What should be employed?
Important to get the idea of how they are implemented in RTL
Allows designer to filter out techniques that do not work well
Design points
- Design processors to meet the design points
Software stack
Design decisions
Datacenters
MIPS R2000
- What are architectural techniques that improve the performance of a processor over MIPS 2000
Moore's Law
in-order execution
out-of-order execution
technologies that are available on cellphones
new applications that are made available through new computer architecture techniques
- more data mining (genomics/medical areas)
lower power (cellphones)
smaller cores (cellphones/computers)
etc.
Performance bottlenecks in a single thread/core processors
- multi-core as an alternative
Memory wall (a part of scaling issue)
Scaling issue
- Transister are getting smaller
Reliability problems that cause errors
Analogies from Kuhn's “The Structure of Scientific Revolutions” (Recommended book)
- Pre paradigm science
- Normal science
- Revolutionalry science
Components of a computer
- Computation
  - Communication
    - Storage
      - DRAM
      - NVRAM (Non-volatile memory): PCM, STT-MRAM
      - Storage (Flash/Harddrive)
Von Neumann Model (Control flow model)
- Stored program computer
  - Properties of Von Neumann Model: Stored program, sequential instruction processing
  - Unified memory
    - When does an instruction is being interpreted as an instruction (as oppose to a datum)?
  - Program counter
  - Examples: x86, ARM, Alpha, IBM Power series, SPARC, MIPS
Data flow model
- Data flow machine
  - Data flow graph
- Operands
- Live-outs/Live-ins
  - DIfferent types of data flow nodes (conditional/relational/barrier)
- How to do transactional transaction in dataflow?
  - Example: bank transactions
Tradeoffs between control-driven and data-driven
- What are easier to program?
- Which are easy to compile?
- What are more parallel (does that mean it is faster?)
- Which machines are more complex to design?
- In control flow, when a program is stop, there is a pointer to the current state (precise state).
ISA vs. Microarchitecture
- Semantics in the ISA
- uArch should obey the ISA
- Changing ISA is costly, can affect compatibility.
Instruction pointers
uArch techniques: common and powerful techniques break Vonn Neumann model if done at the ISA level
- Conceptual techniques
  - Pipelining
  - Multiple instructions at a time
  - Out-of-order executions
  - etc.
- Design techniques
  - Adder implementation (Bit serial, ripple carry, carry lookahead)
  - Connection machine (an example of a machine that use bit serial to tradeoff latency for more parallelism)
Microprocessor: ISA + uArch + circuits
What are a part of the ISA? Instructions, memory, etc.
- Things that are visible to the programmer/software
What are not a part of the ISA? (what goes inside: uArch techniques)
- Things that are not suppose to be visible to the programmer/software but typically make the processor faster and/or consumes less power and/or less complex

Lecture 3 (1/17 Fri.)

Design tradeoffs
Macro Architectures
Reconfiguribility vs. specialized designs
Parallelism (instructions, data parallel)
Uniform decode (Example: Alpha)
Steering bits (Sub-opcode)
0,1,2,3 address machines
- Stack machine
- Accumulator machine
- 2-operand machine
- 3-operand machine
- Tradeoffs between 0,1,2,3 address machines
Instructions/Opcode/Operade specifiers (i.e. addressing modes)
Simply vs. complex data type (and their tradeoffs)
Semantic gap
Translation layer
Addressability
Byte/bit addressable machines
Virtual memory
Big/little endian
Benefits of having registers (data locality)
Programmer visible (Architectural) state
- Programmers can access this directly
- What are the benefits?
Microarchitectural state
- Programmers cannot access this directly
Evolution of registers (from accumulators to registers)
Different types of instructions
- Control instructions
- Data instructions
- Operation instructions
Addressing modes
- Tradeoffs (complexity, flexibility, etc.)
Orthogonal ISA
- Addressing modes that are orthogonal to instructino types
Vectors vs. non vectored interrupts
Complex vs. simple instructions
- Tradeoffs
RISC vs. CISC
- Tradeoff
- Backward compatibility
- Performance
- Optimization opportunity

Lecture 4 (1/22 Wed.)

Semantic gap
- Small vs. Large semantic gap (CISC vs. RISC)
- Benefit of RISC vs. CISC
Micro operations/microcode
- Translate complex instructions into smaller instructions
Parallelism (motivation for RISC)
Compiler optimization
Code optimization through translation
VLIW
Fixed vs. variable length instructions
- Tradeoffs
  - Alignment issues? (fetch/decode)
  - Decoding issues?
  - Code size?
  - Adding additional instructions?
  - Memory bandwidth and cache utilization?
  - Energy?
- Encoding in variable length instructions
Structure of Alpha instructions and other uniform decode instructions
- Different type of instructions
- Benefit of knowing what type of instructions
  - Speculatively operate future instructions
x86 and other non-uniform decode instructions
- Tradeoff vs. uniform decode
Tradeoffs for different number of registers
- Spilling into memory if the number of registers is small
- Compiler optimization on how to manage which value to keep/spill
Addressing modes
- Benefits?
- Types?
- Different uses of addressing modes?
Various ISA-level tradeoffs
Virtual memory
Unalign memory access/aligned memory access
- Cost vs. benefit of unaligned access
ISA specification
- Things you have to obey/specifie in the ISA specification
Architectural states
Microarchitecture implements how arch. state A transformed to the next arch. state A'
Single cycle machines
- Critical path in the single cycle machine
Multi cycle machines
Functional units
Performance metrics
- CPI/IPC
  - CPI of a single cycle microarchitecture

Lecture 5 (1/24 Fri.)

Instruction processing
- Fetch
- Decode
- Execute
- Memory fetch
- Writeback
Datapath & Control logic in microprocessors
Different types of instructions (I-type, R-type, etc.)
Control flow instructions
Non-control flow instructions
Delayed slot/Delayed branch
Single cycle control logic
Lockstep
Critical path analysis
- Critical path of a single cycle processor
Combinational logic & Sequential logic
Control store
Tradeoffs of a single cycle uarch
Dynamic power/Static power
Speedup calculation
- Parallelism
- Serial bottleneck
- Amdahl's bottleneck
Design principles
- Common case design
- Critical path design
- Balanced designs
Multi cycle design

Lecture 6 (1/27 Mon.)

Microcoded/Microprogrammed machines
- States
- Microinstructions
- Microsequencing
- Control store - Product control signals
- Microsequencer
- Control signal
  - What do they have to control?
Instruction processing cycle
Latch signals
State machine
State variables
Condition code
Steering bits
Branch enable logic
Difference between gating and loading? (write enable vs. driving the bus)
Memory mapped I/O
Hardwired logic
- What control signals come from hardwired logic?
Variable latency memory
Handling interrupts
Difference betwen interrupts and exceptions
Emulator (i.e. uCode allots minimal datapath to emulate the ISA)
Updating machine behavior
Horizontal microcode
Vertical microcode
Primitives

Lecture 7 (1/29 Wed.)

Pipelining
Limitations of the multi-programmed design
- Idle resources
Throughput of a pipelined design
- What dictacts the throughput of a pipelined design?
Latency of the pipelined design
Dependency
Overhead of pipelining
- Latch cost?
Data forwarding/bypassing
What are the ideal pipeline?
External fragmentation
Issues in pipeline designs
- Stalling
  - Dependency (Hazard)
    - Flow dependence
    - Output dependence
    - Anti dependence
    - How to handle them?
- Resource contention
- Keeping the pipeline full
- Handling exception/interrupts
- Pipeline flush
- Speculation
Interlocking
Multipath execution
Fine grain multithreading
No-op (Bubbles in the pipeline)
Valid bits in the instructions

Lecture 8 (1/31 Fri.)

Branch prediction
Different types of data dependence
Pipeline stalls
- bubbles
- How to handle stalls
- Stall conditions
- Stall signals
- Dependences
  - Distant between dependences
- Data forwarding/bypassing
- Maintaining the correct dataflow
Different ways to design data forwarding path/logic
Different techniques to handle interlockings
- SW based
- HW based
Profiling
- Static profiling
- Helps from the software (compiler)
  - Superblock optimization
  - Analyzing basic blocks
How to deal with branches?
- Branch prediction
- Delayed branching (branch delay slot)
- Forward control flow/backward control flow
- Branch prediction accuracy
Profile guided code positioning
- Based on the profile info. position the code based on it
- Try to make the next sequential instruction be the next inst. to be executed
Trace cache
Predicate combining (combine predicate for a branch instruction)
Predicated execution (control dependence becomes data dependence)
Definition of basic blocks
Control flow graph

Lecture 9 (2/3 Mon.)

Delayed branching
- benefit?
- What does it eliminates?
- downside?
- Delayed branching in SPARC (with squashing)
- Backward compatibility with the delayed slot
- What should be filled in the delayed slot
- How to ensure correctness
Fine-grained multithreading
- fetch from different threads
- What are the issues (what if the program doesn't have many threads)
- CDC 6000
- Denelcor HEP
- No dependency checking
- Inst. from different thread can fill-in the bubbles
- Cost?
Simulteneuos multithreading
Branch prediction
- Guess what to fetch next.
- Misprediction penalty
- Need to guess the direction and target
- How to perform the performance analysis?
  - Given the branch prediction accuracy and penalty cost, how to compute a cost of a branch misprediction.
  - Given the program/number of instructions, percent of branches, branch prediction accuracy and penalty cost, how to compute a cost coming from branch mispredictions.
    - How many extra instructions are being fetched?
    - What is the performance degredation?
- How to reduce the miss penalty?
- Predicting the next address (non PC+4 address)
- Branch target buffer (BTB)
  - Predicting the address of the branch
- Global branch history - for directions
- Can use compiler to profile and get more info
  - Input set dictacts the accuracy
  - Add time to compilation
- Heuristics that are common and doesn't require profiling.
  - Might be inaccurate
  - Does not require profiling
- Programmer can tell the hardware (via pragmas (hints))
  - For example, x86 has the hint bit
- Dynamic branch prediction
  - Last time predictor
  - Two bits counter based prediction
    - One more bit for hysteresis

18-447 Introduction to Computer Architecture – Spring 2015

Sidebar

Table of Contents

Buzzwords

Lecture 1 (1/13 Mon.)

Lecture 2 (1/15 Wed.)

Lecture 3 (1/17 Fri.)

Lecture 4 (1/22 Wed.)

Lecture 5 (1/24 Fri.)

Lecture 6 (1/27 Mon.)

Lecture 7 (1/29 Wed.)

Lecture 8 (1/31 Fri.)

Lecture 9 (2/3 Mon.)

18-447 Introduction to Computer Architecture – Spring 2015

User Tools

Site Tools

Sidebar

Table of Contents

Buzzwords

Lecture 1 (1/13 Mon.)

Lecture 2 (1/15 Wed.)

Lecture 3 (1/17 Fri.)

Lecture 4 (1/22 Wed.)

Lecture 5 (1/24 Fri.)

Lecture 6 (1/27 Mon.)

Lecture 7 (1/29 Wed.)

Lecture 8 (1/31 Fri.)

Lecture 9 (2/3 Mon.)

Page Tools