Architecture Research Term Project
You will carry out a semester-long term project on a topic based on content from this course. Your project must begin with a research question, design and implement a system and/or architecture that addresses that question, and use the implementation to perform a quantitative evaluation. The choice of project topic is deliberately open-ended. You are expected to identify an interesting question that does not have an obvious answer, and to answer that question with your project. As a rough guide to scale, your project should be about the size and complexity of a paper that you might see published at a workshop.
In planning your project topic, you are encouraged to consider outside academic interests (ML, HCI, AI, PL, etc.) in the context of parallel, heterogeneous, and emerging computer architectures. Your project could be an implementation of an idea that we read about in class, perhaps transplanted to a new context (e.g., deterministic execution for GPU/CPU systems), or something completely novel (e.g., an FPGA-based accelerator for neural network computer vision workloads). Your project should not be a measurement study only (e.g., "how fast is PARSEC on an Intel CPU?"); however, a research project that includes a well-crafted measurement study as a contribution could be very interesting.
You will have to choose a project topic before we have covered all of the topics for the semester. To help identify project ideas, read the abstracts of the papers in the reading list and skim any that seem interesting.
Deliverables
Project Ideas
- Architecture and systems support for MCUNetv2/v3 in memory-constrained dataflow processors
- Architecture support for sparse and irregular computation in resource-constrained ultra-low-power systems
- Architecture support for time-multiplexing of processing elements in an ordered-dataflow CGRA
- Computer architecture support for encrypted dataflow processing
- Analysis and design of approximate computer architectures for in-satellite, radiation-tolerant computing
- Cache flush management strategies for intermittent energy-harvesting computers
- Memory consistency models for ordered dataflow
- Defining a memory consistency model and synchronization primitives for a processing-in-memory system
- Evaluation of performance bottlenecks in encrypted computing systems
- Intermittently powered dataflow CGRA architecture
- Software or architecture support for resource-constrained or intermittent graph processing
- Distributed, intermittent deep neural network training system
- Energy-harvesting simulation infrastructure: power and performance model
- Measurement study of architectural implications of non-volatile technology as storage or logic in a CPU
- Relaxed memory consistency for FPGA/CPU SoCs
- Performance and correctness impact of approximate synchronization operations on neural network or computer vision applications
- Hardware support for data-centric synchronization / per-address memory fences
- Heterogeneous memory consistency for CPU+FPGA systems with per-FPGA-state-machine consistency guarantees
- Design and evaluation of an intermittent reconfigurable architecture
- Approximate, compressive cache, LLC, or main memory
- Data-race detection or SC-violation detection in a reconfigurable computing device or heterogeneous FPGA/CPU-based system
- Application study: precision vs. performance trade-off in a parallel system with approximate cache coherence
- Deterministic parallel computation in an FPGA
- Application study: when is it beneficial to execute code on a GPU or FPGA in parallel with execution on a CPU?
- Symbolic execution to evaluate candidate power schedules for programs running on intermittently powered devices
- 3D-stacked processing-in-memory to accelerate garbage collection or other pointer-chasing analysis
- Deterministic transactional execution with weak isolation guarantees
- Approximate, parallel scatter/gather or reduction
- Performance and power model and assessment of a "perpetual" solar-powered, fully nonvolatile processor
- Using shared memory communication graphs to predict magnitude/importance of shared value updates
- Cache architecture and memory hierarchy design for heterogeneous CPU/GPU/accelerator architectures
- Feasibility assessment and performance model of porting TensorFlow kernels to an FPGA
- Environmental impact assessment and mitigation strategy for current and future cloud machine learning
- Hardware concurrency bug detection for FPGA designs
Benchmarks
Simulators and Tools
- Sniper (easy-to-use architecture simulator)
- gem5 (very detailed architecture simulator)
- MARSSx86 (detailed architecture simulator)
- McPAT (architectural power modeling)
- CACTI (cache and memory timing, area, and power modeling)
- Pin (binary instrumentation)
- LLVM (compiler infrastructure)
- Z3 (SMT solver)
- KLEE (C/C++ symbolic execution engine)