Buzzwords
Lecture 1
- Architecture of Parallel Computers
- Fundamentals and Tradeoffs
- Static and Dynamic Scheduling
- Parallel Task Assignment
- Static/Dynamic
- Task Queues
- Task Stealing
Lecture 2
- Parallel Computer
- SISD, SIMD, MISD, MIMD
- Performance
- Power consumption
- Cost efficiency
- Scalability
- Complexity
- Dependability
- Instruction Level Parallelism
- Data Parallelism
- Task Level Parallelism
- Parallel programming
- Thread level speculation
- Loosely/Tightly coupled multiprocessors
- Shared memory synchronization
- Cache consistency
- Ordering of memory operations
- Hardware-based Multithreading
- Coarse grained
- Fine grained
- Simultaneous
- Amdahl’s Law
- Serial bottleneck
- Synchronization overhead
- Load imbalance overhead
- Resource sharing overhead
- Superlinear Speedup
- Unfair comparisons
- Memory/cache effect
- Utilization, Redundancy, Efficiency
- Parallel Programming
- Parallel and Serial Bottlenecks
Lecture 3
- Programming Models vs. Architectures
- Shared memory programming model
- Message passing programming model
- Shared memory hardware
- Message passing hardware
- Communication abstraction
- Generic Parallel Machine
- Data Flow Graph
- Synchronization
- Application Binary Interface (ABI)
- Data parallel programming model
- Data parallel hardware
- Connection Machine
- Data flow programming model
- Data flow hardware
- Scalability
- Interconnection Schemes
- Uniform Memory/Cache Access (UMA/UCA)
- Memory latency
- Memory bandwidth
- Symmetric multiprocessing (SMP)
- Data placement
- Non-Uniform Memory/Cache Access (NUMA/NUCA)
- Local and remote memories
- Critical path of memory access
Lecture 4
- Multi-Core Processors
- Technology scaling
- Transistors and die area
- Large Superscalar
- Single-thread performance
- Instruction issue queue
- Multi-ported register file
- Loop-level parallelism
- Multiprogramming
- Bigger caches
- Multithreading
- Thread-level parallelism
- Resource sharing
- Integrating platform components
- Clustered superscalar processor
- Inter-cluster bypass
- Traditional symmetric multiprocessors
Lecture 5
- Chip Multiprocessor (CMP)
- Workload Characteristics
- Instruction Level Parallelism (ILP)
- Piranha CMP
- Processing Node
- Coherence Protocol Engine
- I/O Node
- Sun Niagara (UltraSPARC T1)
- Niagara Core
- Sun Niagara II (UltraSPARC T2)
- Chip Multithreading (CMT)
- Sun Rock
- Runahead Execution
- Memory Level Parallelism (MLP)
- IBM POWER4
- IBM POWER5
- IBM POWER6
- IBM POWER7
- Large vs. Small Cores
- Tile-Large vs. Tile-Small
- Asymmetric Chip Multiprocessor (ACMP)
- Serial Bottlenecks
- Amdahl's Law
- Asymmetric vs. Symmetric Cores
- Frequency Boosting
- EPI Throttling
- Dynamic voltage and frequency scaling (DVFS)
Lecture 6
- EPI Throttling
- Asymmetric Chip Multiprocessor (ACMP)
- Energy Efficiency
- Programmer effort
- Shared Resource Management
- Serialized Code Sections
- Accelerated Critical Sections (ACS)
- Bottleneck Identification and Scheduling (BIS)
Lecture 7
- Main Memory
- Memory Capacity
- Memory Latency
- Memory Bandwidth
- Memory Energy/Power
- Technology Scaling
- DRAM Scaling
- Charge Memory
- Resistive Memory
- Non-volatile Memory
- Phase Change Memory (PCM)
- Hybrid Memory
- Write Filtering
- Row-Locality Aware Data Placement
- Tags in Memory
- Dynamic Data Transfer Granularity
- Memory Security
Lecture 8
- Barriers
- Thread Waiting
- Bottleneck Acceleration
- False Serialization
- Starvation
- Preemptive Acceleration
- Staged Execution Model
- Segment Spawning
- Inter-segment data
- Generator instruction
- Data Marshaling
- Pipeline Parallelism
- Coverage, Accuracy, Timeliness
Lecture 9
- Memory Scheduling
- Fairness-Throughput Tradeoff
- Thread cluster
- Memory intensity
- CPU-GPU Systems
- Heterogeneous Memory Systems
- Thread
- Multitasking
- Thread context
- Hardware Multithreading
- Latency tolerance
- Fine-grained Multithreading
- Pipeline utilization
- Coarse-grained Multithreading
- Stall events
- Thread Switching Urgency
- Fairness