18742: Reading List and Course Plan

(Required reading marked *)

Part I: Parallel Computer Architectures

Course Intro, Architecture Review, Amdahl's Law

Syllabus

*Cramming More Components onto Integrated Circuits (AKA: Moore's Law)

*Parallel Architectures (AKA: Flynn's Taxonomy)

*Validity of the single processor approach to achieving large scale computing capabilities (AKA: Amdahl's Law)

Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions (AKA: The 1974 Dennard Scaling Paper)

Parallel Architectures

*Multiscalar processors

*The Case for a Single-chip Multiprocessor

Parallel Execution Strategies

Dataflow Architecture

*WaveScalar

*An Evaluation of the TRIPS computer system

Dataflow execution of sequential imperative programs on multicore architectures

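Writing and Executing Parallel Programs

Multiprocessor Cache Coherence and Memory Consistency

Cache Coherence and Memory Consistency

Memory Consistency Models (Optional)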

Synchronization and Transactional Memory

Optimizing Synchronization and Transactional Memory

*Speculative lock elision: enabling highly concurrent multithreaded execution

Inferential queueing and speculative push for reducing critical communication latencies

*Transactional Memory

*(quick skim only) Performance evaluation of Intel® transactional synchronization extensions for high-performance computing

Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack

Making the fast case common and the uncommon case simple in unbounded transactional memory

Software Transactional Memory (Optional)

Software Transactional Memory

Software Transactional Memory: Why is it only a research toy?

Synthesis Lectures on Transactional Memory (AKA: the TM Book)

Memory Consistency Enforcement Mechanisms

Data-race-free and Speculative Models

*DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

*Transactional Memory Coherence and Consistency

*DRFx: a simple and efficient memory model for concurrent programming languages

BulkSC: bulk enforcement of sequential consistency

SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

Memory Consistency Exceptions

*Conflict Exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

Valor: efficient, software-only region conflict exceptions

Architecture Support for Concurrent Software Reliability

Detecting and Avoiding Concurrency Bugs (Optional)

Learning from mistakes: a comprehensive study on real world concurrency bug characteristics

A Case for an interleaving constrained shared-memory multi-processor

AVIO: detecting atomicity violations via access interleaving invariants

Cooperative, Empirical Failure Avoidance for Multithreaded Programs

Finding Concurrency Bugs with Context-aware Communication Graphs

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

Atom-aid: detecting and surviving atomicity violations

Deterministic Execution

*A "flight data recorder" for enabling full-system multiprocessor deterministic replay

*DMP: deterministic shared memory multiprocessing

Grace: safe multithreaded programming for C/C++

CoreDet: a compiler and runtime system for deterministic multithreaded execution

The End of Moore's Law and the Beginning of the Era of Dark Silicon

Power, Energy, and Dark Silicon

*Amdahl's Law in the Multicore Era (paper-pdf)

*Dark Silicon and the End of Multicore Scaling (paper-pdf)

*Power: A First-class Architectural Design Constraint (skim) (paper-pdf)

Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures (paper-pdf)

Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors (paper-pdf)

Part II: Heterogeneity, Specialization, and Acceleration

Fused and Composable Heterogeneous Cores

*Core-fusion: accommodating software diversity in chip multiprocessors (paper-pdf)

*Composable, light-weight processors (paper-pdf)

Specialization

Accelerators for Everything

*Conservation cores: reducing the energy of mature computations (paper-pdf)

*QsCores: Trading Dark Silicon for Scalable Energy with Quasi-specific Cores (paper-pdf)

Database and Genomics Accelerators

*Q100: The Architecture and Design of a Database Processing Unit (paper-pdf)

*Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly (paper-pdf)

Hardware support for fine-grained event-driven computation in Anton 2 (paper-pdf)

Machine Learning and Inference Accelerators

*Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks (paper-pdf)

*In-datacenter Performance Analysis of a Tensor Processing Unit (paper-pdf)

EIE: efficient inference engine on compressed deep neural network (paper-pdf)

DaDianNao: A Machine Learning Supercomputer (paper-pdf)

Reconfigurable Accelerators

*A reconfigurable fabric for accelerating large-scale datacenter services (AKA: The Catapult Paper) (paper-pdf)

*Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? (skim) (paper-pdf)

LEAP scratchpads: automatic memory and cache management for reconfigurable logic (paper-pdf)

CoRAM: an in-fabric memory architecture for FPGA-based computing (paper-pdf)

Reconfigurable Dataflow Processors

*RipTide: A programmable, energy-minimal dataflow compiler and architecture (paper-pdf)

*Stream-Dataflow Acceleration (paper-pdf)

Tiled Architectures

*Evaluation of the RAW Microprocessor: An Exposed Wire-delay Architecture for ILP and Streams (paper-pdf)

*A scalable architecture for ordered parallelism (paper-pdf)

Accelerating Irregular Computations

*P-OPT: Practical Optimal Cache Replacement for Graph Analytics (paper-pdf)

*Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures (paper-pdf)

When is Graph Reordering an Optimization? Studying the Effect of Lightweight Graph Reordering Across Applications and Input Graphs (paper-pdf)

Graphicionado: A high-performance accelerator for graph analytics (paper-pdf)

Part III: Emerging Topics

Encrypted Computing

Architectures for Encrypted Computing with Homomorphically Encrypted Data

*HEAX: An Architecture for Computing on Encrypted Data (paper-pdf)

*Client-Optimized Algorithms and Acceleration for Encrypted Compute Offloading (paper-pdf)

Intermittent Computing

Programming intermittent computers

*Clank: Architectural Support for Intermittent Computation (paper-pdf)

*A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices (paper-pdf)

A simpler, safer programming and execution model for intermittent systems (paper-pdf)

An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems (paper-pdf)

Architectural Security and Privacy

*Spectre Attacks: Exploiting Speculative Execution (paper-pdf)

*Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data (paper-pdf)

Spectre and Meltdown Google Project Zero Write-up

Mark Hill's slides on Spectre and Meltdown

Approximate Computing

*Load Value Approximation (paper-pdf)

*Neural Acceleration for General Purpose Approximate Programs (paper-pdf)

General-purpose code acceleration with limited-precision analog computation (paper-pdf)

Approximate storage in solid-state memories (paper-pdf)

DNA-based Computing and Storage

*A DNA-Based Archival Storage System (paper-pdf)

*Puddle: A Dynamic, Error-Correcting, Full-Stack Microfluidics Platform (paper-pdf)

Brandon Lucia
