18-742: Reading List and Course Plan

(Required reading marked *)

Part I: Parallel Computer Architectures

Course Intro, Architecture Review, Amdahl's Law

Syllabus

*Cramming More Components onto Integrated Circuits (AKA: Moore's Law)

*Parallel Architectures (AKA: Flynn's Taxonomy)

*Validity of the single processor approach to achieving large scale computing capabilities (AKA: Amdahl's Law)

Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions (AKA: The 1974 Dennard Scaling Paper)
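
For reference, the Amdahl's Law reading above (and "Amdahl's Law in the Multicore Era" later in Part I) centers on the following bound, where f is the fraction of execution time that can be parallelized and s is the speedup applied to that fraction:

\[
\text{Speedup}(f, s) = \frac{1}{(1 - f) + \frac{f}{s}}
\]

Even as s grows without bound, overall speedup is capped at 1/(1 - f), so the serial fraction sets the ceiling no matter how many processors are added.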

Parallel Architectures

*Multiscalar processors

*The Case for a Single-chip Multiprocessor


Parallel Execution Strategies

Dataflow Architecture

*WaveScalar

*An Evaluation of the TRIPS computer system

Dataflow execution of sequential imperative programs on multicore architectures

Evaluation of the RAW Microprocessor: An Exposed Wire-delay Architecture for ILP and Streams

Writing and Executing Parallel Programs

Cache Coherence and Memory Consistency

*Why On-chip Cache Coherence is here to stay

*Token Coherence: Decoupling Performance and Correctness

Memory consistency and event ordering in scalable shared-memory multiprocessors

Memory Consistency Models (Optional)

Foundations of the C++ Concurrency Memory Model

x86-TSO: a rigorous and usable programmer’s model for x86 multiprocessors

Synchronization and Transactional Memory

Optimizing Synchronization

*Speculative lock elision: enabling highly concurrent multithreaded execution

*Inferential queueing and speculative push for reducing critical communication latencies

Hardware Transactional Memory

*Transactional Memory

*Performance evaluation of Intel® transactional synchronization extensions for high-performance computing

Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack

Making the fast case common and the uncommon case simple in unbounded transactional memory

Software Transactional Memory (Optional)

Software Transactional Memory

Software Transactional Memory: Why is it only a research toy?

Synthesis Lectures on Transactional Memory (AKA: the TM Book)


Memory Consistency Enforcement Mechanisms

Data-race-free and Speculative Models

*DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

*DRFx: a simple and efficient memory model for concurrent programming languages

Transactional Memory Coherence and Consistency

BulkSC: bulk enforcement of sequential consistency

SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

Memory Consistency Exceptions

Conflict Exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

Valor: efficient, software-only region conflict exceptions

Architecture Support for Concurrent Software Reliability

Detecting and Avoiding Concurrency Bugs (Optional)

Learning from mistakes: a comprehensive study on real world concurrency bug characteristics

A Case for an interleaving constrained shared-memory multi-processor

AVIO: detecting atomicity violations via access interleaving invariants

Cooperative, Empirical Failure Avoidance for Multithreaded Programs

Finding Concurrency Bugs with Context-aware Communication Graphs

Flexible Hardware Acceleration for Instruction-Grain Lifeguards

Atom-aid: detecting and surviving atomicity violations

Deterministic Execution

*DMP: deterministic shared memory multiprocessing

*Grace: safe multithreaded programming for C/C++

CoreDet: a compiler and runtime system for deterministic multithreaded execution

A "flight data recorder" for enabling full-system multiprocessor deterministic replay


The End of Moore's Law and the Beginning of the Era of Dark Silicon

Power, Energy, and Dark Silicon

*Amdahl's Law in the Multicore Era

*Dark Silicon and the End of Multicore Scaling

*Power: A First-class Architectural Design Constraint (skim)

Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures

Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors


Part II: Heterogeneity, Specialization, and Acceleration

Fused and Composable Heterogeneous Cores

*Core fusion: accommodating software diversity in chip multiprocessors

*Composable lightweight processors


Specialization

Accelerators for Everything

*Conservation cores: reducing the energy of mature computations

*QsCores: Trading Dark Silicon for Scalable Energy Efficiency with Quasi-Specific Cores

Database and Genomics Accelerators

*Q100: The Architecture and Design of a Database Processing Unit

*Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly

Hardware support for fine-grained event-driven computation in Anton 2

Machine Learning and Inference Accelerators

*Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks

*In-datacenter Performance Analysis of a Tensor Processing Unit

EIE: efficient inference engine on compressed deep neural network

DaDianNao: A Machine Learning Supercomputer

Reconfigurable Accelerators

*A reconfigurable fabric for accelerating large-scale datacenter services (AKA: The Catapult Paper)

*Stream-Dataflow Acceleration

*Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? (skim)

LEAP scratchpads: automatic memory and cache management for reconfigurable logic

CoRAM: an in-fabric memory architecture for FPGA-based computing

Accelerating Irregular Computations

*P-OPT: Practical Optimal Cache Replacement for Graph Analytics

When is Graph Reordering an Optimization? Studying the Effect of Lightweight Graph Reordering Across Applications and Input Graphs

*A scalable architecture for ordered parallelism

Graphicionado: A high-performance accelerator for graph analytics

Part III: Emerging Topics

Encrypted Computing