18742: Reading List and Course Plan
(Required reading marked *)
Part I: Parallel Computer Architectures
Course Intro, Architecture Review, Amdahl's Law
*Cramming More Components onto Integrated Circuits (AKA: Moore's Law)
*Parallel Architectures (AKA: Flynn's Taxonomy)
Parallel Architectures
*The Case for a Single-chip Multiprocessor
Parallel Execution Strategies
Dataflow Architecture
*An Evaluation of the TRIPS computer system
Dataflow execution of sequential imperative programs on multicore architectures
Writing and Executing Parallel Programs
Multiprocessor Cache Coherence and Memory Consistency
*How to make a multiprocessor computer that correctly executes multiprocess programs
*Time, clocks and the ordering of events in a distributed system
Cache Coherence and Memory Consistency
*Why On-chip Cache Coherence is here to stay
*Token Coherence: Decoupling Performance and Correctness
Memory consistency and event ordering in scalable shared-memory multiprocessors
Memory Consistency Models (Optional)
Foundations of the C++ concurrency Memory Model
x86-TSO: a rigorous and usable programmer’s model for x86 multiprocessors
Synchronization and Transactional Memory
Optimizing Synchronization and Transactional Memory
*Speculative lock elision: enabling highly concurrent multithreaded execution
Inferential queueing and speculative push for reducing critical communication latencies
Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack
Making the fast case common and the uncommon case simple in unbounded transactional memory
Software Transactional Memory (Optional)
Software Transactional Memory: Why is it only a research toy?
Synthesis Lectures on Transactional Memory (AKA: the TM Book)
Memory Consistency Enforcement Mechanisms
Data-race-free and Speculative Models
*DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
*Transactional Memory Coherence and Consistency
*DRFx: a simple and efficient memory model for concurrent programming languages
BulkSC: bulk enforcement of sequential consistency
SARC Coherence: Scaling Directory Cache Coherence in Performance and Power
Memory Consistency Exceptions
Valor: efficient, software-only region conflict exceptions
Architecture Support for Concurrent Software Reliability
Detecting and Avoiding Concurrency Bugs (Optional)
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
A Case for an interleaving constrained shared-memory multi-processor
AVIO: detecting atomicity violations via access interleaving invariants
Cooperative, Empirical Failure Avoidance for Multithreaded Programs
Finding Concurrency Bugs with Context-aware Communication Graphs
Flexible, Hardware Acceleration for Instruction-Grain Lifeguards
Atom-aid: detecting and surviving atomicity violations
Deterministic Execution
*A "flight data recorder" for enabling full-system multiprocessor deterministic replay
*DMP: deterministic shared memory multiprocessing
Grace: safe multithreaded programming for C/C++
CoreDet: a compiler and runtime system for deterministic multithreaded execution
The End of Moore's Law and the Beginning of the Era of Dark Silicon
Power, Energy, and Dark Silicon
*Amdahl's Law in the Multicore Era (paper-pdf)
*Dark Silicon and the End of Multicore Scaling (paper-pdf)
*Power: A First-class Architectural Design Constraint (skim) (paper-pdf)
Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures (paper-pdf)
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors (paper-pdf)
Part II: Heterogeneity, Specialization, and Acceleration
Fused and Composable Heterogeneous Cores
*Core-fusion: accommodating software diversity in chip multiprocessors (paper-pdf)
*Composable, light-weight processors (paper-pdf)
Specialization
Accelerators for Everything
*Conservation cores: reducing the energy of mature computations (paper-pdf)
*QsCores: Trading Dark Silicon for Scalable Energy with Quasi-specific Cores (paper-pdf)
Database and Genomics Accelerators
*Q100: The Architecture and Design of a Database Processing Unit (paper-pdf)
*Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly (paper-pdf)
Hardware support for fine-grained event-driven computation in Anton 2 (paper-pdf)
Machine Learning and Inference Accelerators
*Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks (paper-pdf)
*In-datacenter Performance Analysis of a Tensor Processing Unit (paper-pdf)
EIE: efficient inference engine on compressed deep neural network (paper-pdf)
DaDianNao: A Machine Learning Supercomputer (paper-pdf)
Reconfigurable Accelerators
*A reconfigurable fabric for accelerating large-scale datacenter services (AKA: The Catapult Paper) (paper-pdf)
*Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? (skim) (paper-pdf)
LEAP scratchpads: automatic memory and cache management for reconfigurable logic (paper-pdf)
CoRAM: an in-fabric memory architecture for FPGA-based computing (paper-pdf)
Reconfigurable Dataflow Processors
*RipTide: A programmable, energy-minimal dataflow compiler and architecture (paper-pdf)
*Stream-Dataflow Acceleration (paper-pdf)
Tiled Architectures
*Evaluation of the RAW Microprocessor: An Exposed Wire-delay Architecture for ILP and Streams (paper-pdf)
*A scalable architecture for ordered parallelism (paper-pdf)
Accelerating Irregular Computations
*P-OPT: Practical Optimal Cache Replacement for Graph Analytics (paper-pdf)
*Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures (paper-pdf)
When is Graph Reordering an Optimization? Studying the Effect of Lightweight Graph Reordering Across Applications and Input Graphs (paper-pdf)
Graphicionado: A high-performance accelerator for graph analytics (paper-pdf)
Part III: Emerging Topics
Encrypted Computing
Architectures for Encrypted Computing with Homomorphically Encrypted Data
*HEAX: An Architecture for Computing on Encrypted Data (paper-pdf)
*Client-Optimized Algorithms and Acceleration for Encrypted Compute Offloading (paper-pdf)
Intermittent Computing
Programming intermittent computers
*Clank: Architectural Support for Intermittent Computation (paper-pdf)
*A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices (paper-pdf)
A simpler, safer programming and execution model for intermittent systems (paper-pdf)
An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems (paper-pdf)
Architectural Security and Privacy
*Spectre Attacks: Exploiting Speculative Execution (paper-pdf)
*Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data (paper-pdf)
Spectre and Meltdown Google Project Zero Write-up
Mark Hill's slides on Spectre and Meltdown
Approximate Computing
*Load Value Approximation (paper-pdf)
*Neural Acceleration for General Purpose Approximate Programs (paper-pdf)
General-purpose code acceleration with limited-precision analog computation (paper-pdf)
Approximate storage in solid-state memories (paper-pdf)
DNA-based Computing and Storage
*A DNA-Based Archival Storage System (paper-pdf)
*Puddle: A Dynamic, Error-Correcting, Full-Stack Microfluidics Platform (paper-pdf)