18742: Reading List and Course Plan

(Required reading marked *)

Part I: Parallel Computer Architectures

Course Intro, Architecture Review, Amdahl's Law

Syllabus

*Cramming More Components onto Integrated Circuits (AKA: Moore's Law)

*Parallel Architectures (AKA: Flynn's Taxonomy)

*Validity of the single processor approach to achieving large scale computing capabilities (AKA: Amdahl's Law)

Parallel Architectures [slides]

*Multiscalar processors

*The Case for a Single-chip Multiprocessor

Parallel Execution Strategies

Dataflow and Tiled Architectures [slides]

*WaveScalar

*An Evaluation of the TRIPS computer system

Dataflow execution of sequential imperative programs on multicore architectures

Evaluation of the RAW Microprocessor: An Exposed Wire-delay Architecture for ILP and Streams

Throughput Computing [slides]

*Larrabee: a many-core x86 architecture for visual computing

*Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Writing and Executing Parallel Programs

Synchronization and Transactional Memory

Optimizing Synchronization [slides]

*Speculative lock elision: enabling highly concurrent multithreaded execution

*Inferential queueing and speculative push for reducing critical communication latencies

Hardware Transactional Memory [slides]

*Transactional Memory

*Making the fast case common and the uncommon case simple in unbounded transactional memory

Hardware Transactional Memory Implementations

*Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack

*Performance evaluation of Intel® transactional synchronization extensions for high-performance computing

Software Transactional Memory

Software Transactional Memory: Why is it only a research toy?

Synthesis Lectures on Transactional Memory (AKA: the TM Book)

Memory Consistency Enforcement Mechanisms

Data-race-free and Speculative Models [slides]

*DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

*Transactional Memory Coherence and Consistency

BulkSC: bulk enforcement of sequential consistency

SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

Memory Consistency Exceptions [slides]

*Conflict Exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

*DRFx: a simple and efficient memory model for concurrent programming languages

Valor: efficient, software-only region conflict exceptions

Architecture Support for Concurrent Software Reliability

Detecting and Avoiding Concurrency Bugs [slides]

*Learning from mistakes: a comprehensive study on real world concurrency bug characteristics

*A Case for an interleaving constrained shared-memory multi-processor

AVIO: detecting atomicity violations via access interleaving invariants

Cooperative, Empirical Failure Avoidance for Multithreaded Programs

Finding Concurrency Bugs with Context-aware Communication Graphs

Flexible, Hardware Acceleration for Instruction-Grain Lifeguards

Atom-aid: detecting and surviving atomicity violations

Deterministic Execution [slides]

*DMP: deterministic shared memory multiprocessing

*Grace: safe multithreaded programming for C/C++

CoreDet: a compiler and runtime system for deterministic multithreaded execution

A "flight data recorder" for enabling full-system multiprocessor deterministic replay

Power and Energy

Energy Modeling, Profiling, Analysis [slides]

*Power: A First-class Architectural Design Constraint

*Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures

Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

Dark Silicon: The beginning of the end [slides]

*Amdahl's Law in the Multicore Era

*Dark Silicon and the End of Multicore Scaling

Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions (AKA: The 1974 Dennard Scaling Paper)

Part II: Heterogeneity, Specialization, and Acceleration

Fused and Composable Heterogeneous Cores [slides]

*Core-fusion: accommodating software diversity in chip multiprocessors

*Composable, light-weight processors

Specialization

Accelerators for Everything [slides]

*Conservation cores: reducing the energy of mature computations

*QsCores: Trading Dark Silicon for Scalable Energy with Quasi-specific Cores

Accelerating Irregular Computations

*Graphicionado: A high-performance accelerator for graph analytics

*A scalable architecture for ordered parallelism

Hyper-optimized Application-specific Accelerators

*Q100: The Architecture and Design of a Database Processing Unit

*Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly

Hardware support for fine-grained event-driven computation in Anton 2

Machine Learning Accelerators [slides]

*In-datacenter Performance Analysis of a Tensor Processing Unit

*DaDianNao: A Machine Learning Supercomputer

Neural Network Inference Accelerators [slides]

*EIE: efficient inference engine on compressed deep neural network

*Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks

Reconfigurable Accelerators

*A reconfigurable fabric for accelerating large-scale datacenter services (AKA: The Catapult Paper)

*LEAP scratchpads: automatic memory and cache management for reconfigurable logic

Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

CoRAM: an in-fabric memory architecture for FPGA-based computing

Part III: Emerging Topics

Intermittent Computing

Lecture: Programming intermittent computers

*Architecture exploration for ambient energy harvesting nonvolatile processors

*A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices

A simpler, safer programming and execution model for intermittent systems

An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems

Architectural Security and Privacy [slides]

*Understanding Contention-Based Channels and Using Them for Defense

*Spectre Attacks: Exploiting Speculative Execution

Spectre and Meltdown Google Project Zero Write-up

Mark Hill's slides on Spectre and Meltdown

Approximate Computing [slides]

*Load Value Approximation

*Neural Acceleration for General Purpose Approximate Programs

General-purpose code acceleration with limited-precision analog computation

Approximate storage in solid-state memories

DNA-based Computing and Storage [slides]

*A DNA-Based Archival Storage System

*Neural Network Computation with DNA Strand Displacement Cascades

Part I: Parallel Computer Architectures

Course Intro, Architecture Review, Amdahl's Law

Parallel Architectures [slides]

Parallel Execution Strategies

Dataflow and Tiled Architectures [slides]

Throughput Computing [slides]

Writing and Executing Parallel Programs

Lecture: Parallel programming overview

Cache Coherence and Memory Consistency [slides]

Memory Consistency Models

Synchronization and Transactional Memory

Optimizing Synchronization [slides]

Hardware Transactional Memory [slides]

Hardware Transactional Memory Implementations

Software Transactional Memory

Memory Consistency Enforcement Mechanisms

Data-race-free and Speculative Models [slides]

Memory Consistency Exceptions [slides]

Architecture Support for Concurrent Software Reliability

Detecting and Avoiding Concurrency Bugs [slides]

Deterministic Execution [slides]

Power and Energy

Energy Modeling, Profiling, Analysis [slides]

Dark Silicon: The beginning of the end [slides]

Part II: Heterogeneity, Specialization, and Acceleration

Fused and Composable Heterogeneous Cores [slides]

Specialization

Accelerators for Everything [slides]

Accelerating Irregular Computations

Hyper-optimized Application-specific Accelerators

Machine Learning Accelerators [slides]

Neural Network Inference Accelerators [slides]

Reconfigurable Accelerators

Part III: Emerging Topics

Intermittent Computing

Lecture: Programming intermittent computers

Architectural Security and Privacy [slides]

Approximate Computing [slides]

DNA-based Computing and Storage [slides]

Brandon Lucia
