===== Paper Reviews and Discussion =====
Post your reviews for the required readings in the paper review system.

===== For Lecture 1 =====
== Required Readings ==
  * {{:lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967.}}
  * {{:lecture1-moore.1965.electronics.pdf|G. E. Moore, "Cramming more components onto integrated circuits," Electronics, April 1965.}}
  * {{:lecture1-ronen.2001.ieee.pdf|Ronen et al., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, vol. 89, no. 11, 2001.}}
  * {{:lecture1-requirementsbottlenecksandgoodfortune-patt.pdf|Y. N. Patt, "Requirements, bottlenecks, and good fortune: agents for microprocessor evolution," Proceedings of the IEEE, vol. 89, no. 11, 2001.}}


===== For Lecture 2 =====
== Required Reading ==
  * {{instructionssetsandbeyond.pdf|Colwell et al., "Instruction Sets and Beyond: Computers, Complexity, and Controversy," Computer, September 1985.}} 

== Suggested Readings ==
On-Chip Networks
  * {{routepacketsnotwires.pdf|Dally et al., "Route Packets, Not Wires: On-Chip Interconnection Networks," DAC, June 2001.}}  
  * {{onchipinterconnectionarchitectureoftileprocessor.pdf|Wentzlaff et al., "On-Chip Interconnection Architecture of the Tile Processor," Micro, IEEE, November 2007.}}    
  * {{preemptivevirtualclock.pdf|Grot et al., "Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-a-Chip," MICRO, December 2009.}}

Main Memory Controllers
  * {{memoryperformanceattacks.pdf|Moscibroda et al., "Memory performance attacks: Denial of memory service in multi-core systems," Usenix Security Symposium, 2007.}}
  * {{memoryaccessscheduling.pdf|Rixner et al., "Memory Access Scheduling," ISCA, 2000.}}

Architecture Reference Manuals
  * [[http://www.bitsavers.org/pdf/dec/vax/VAX_archHbkVol1_1977.pdf|Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78]]
  * [[http://www.intel.com/products/processor/manuals/|Intel Corp., “Intel 64 and IA-32 Architectures Software Developer’s Manual”]]

Compilers
  * {{compilersandcomputerarchitecture.pdf|Wulf, "Compilers and Computer Architecture," IEEE Computer, 1981.}}

===== For Lecture 3 =====
== Required Reading ==
  * {{TransactionalMemory.pdf|Herlihy et al., "Transactional Memory: Architectural Support for Lock-free Data Structures," ISCA, 1993.}}

== Suggested Readings ==
  * {{electroniccomputingvonneumann.pdf|Burks, Goldstine, and von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," Institute for Advanced Study, 1946.}}
  
===== For Lecture 4 =====
== Required Reading ==
  * {{pipelinedmimdcomputer.pdf|Burton Smith, "A pipelined, shared resource MIMD computer," ICPP 1978.}}
  * {{onchipopticaltechnology.pdf|Kirman et al., "On-Chip Optical Technology in Future Bus-Based Multicore Designs," IEEE Micro Top Picks 2007.}}

== Suggested Readings ==
  * {{predictabilityofdatavalues.pdf|Yiannakis Sazeides, James E. Smith: "The Predictability of Data Values," MICRO 1997: 248-258}}
  * {{valuelocalityandloadvalueprediction.pdf|Mikko H. Lipasti, Christopher B. Wilkerson, John Paul Shen: "Value Locality and Load Value Prediction," ASPLOS 1996: 138-147}}

===== For Lecture 5 =====
== Required Reading ==
  * {{implementingpreciseinterrupts.pdf|Smith and Pleszkun, “Implementing Precise Interrupts in Pipelined Processors,” IEEE Trans on Computers 1988 and ISCA 1985}}
  * {{microarchitectureofsuperscalar.pdf|Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proc IEEE 1995}}

== Suggested Readings ==
  * {{checkpointrepairforoutoforder.pdf|Hwu and Patt, "Checkpoint Repair for Out-of-order Execution Machines," ISCA 1987}}


===== For Lecture 6 =====
== Required Reading ==
  * {{virtualmemory.pdf|Jacob and Mudge, "Virtual Memory in Contemporary Microprocessors," IEEE Micro, vol. 18, no. 4, 1998}}

===== For Lecture 7 =====
  * Hennessy and Patterson, Sections 2.1-2.10 (inclusive)
== Modern Designs - Required Readings ==
  * {{onpipeliningdynamicinstructionschedulinglogic.pdf|Stark, Brown, Patt, “On pipelining dynamic instruction scheduling logic,” MICRO 2000}}
  * {{Themicroarchitectureofthepentium4processor.pdf|Boggs et al., “The microarchitecture of the Pentium 4 processor,” Intel Technology Journal, 2001}}
  * {{21264microprocessor.pdf|Kessler, “The Alpha 21264 microprocessor,” IEEE Micro, March-April 1999}}
  * {{themipsr10000superscalarmicroprocessor.pdf|Yeager, “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, April 1996}}

== Seminal Papers - Recommended Readings ==
  * {{hps.pdf|Patt, Hwu, Shebanow, “HPS, a new microarchitecture: rationale and introduction,” MICRO 1985}}
  * {{criticalissuesregardinghps.pdf|Patt et al., “Critical issues regarding HPS, a high performance microarchitecture,” MICRO 1985}}
  * {{ibmsystem360.pdf|Anderson, Sparacio, Tomasulo, “The IBM System/360 Model 91: Machine Philosophy and Instruction Handling,” IBM Journal of R&D, Jan. 1967}}
  * {{anefficientalgorithmforexploitingmultiplearithmeticunits.pdf|Tomasulo, “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of R&D, Jan. 1967}}

===== For Lecture 9 =====
== Required Readings ==
  * {{improvingdirectmappedcacheperformance.pdf|Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990}}
  * {{acaseformlpawarecachereplacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006}}
  * {{slavememoriesanddynamicstorageallocation.pdf|Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. On Electronic Computers, 1965}}

===== For Lecture 10 =====
== Required Readings ==
  * {{runaheadexecution.pdf|Mutlu et al., "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}}
  * {{dualcoreexecution.pdf|Zhou, "Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window," PACT 2005}}
  * {{efficientrunaheadexecution.pdf|Mutlu et al., "Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance," IEEE Micro Top Picks 2006}}

== Suggested Readings ==

  * {{memorydependenceprediction.pdf|Chrysos and Emer, "Memory Dependence Prediction Using Store Sets," ISCA 1998}}


===== For Lecture 11 =====
== Required Readings ==
  * Hennessy and Patterson, Appendix C.1-C.3
  * {{improvingdirectmappedcacheperformance.pdf|Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990}}
  * {{acaseformlpawarecachereplacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006}}

== Suggested Readings ==
  * {{twowayskewedassociativecaches.pdf|Seznec, "A Case for Two-way Skewed Associative Caches," ISCA 1993}}
  * {{cacheconsciousstructuredefinition.pdf|Chilimbi et al., "Cache-conscious Structure Definition," PLDI 1999}}
  * {{cacheconsciousstructurelayout.pdf|Chilimbi et al., "Cache-conscious Structure Layout," PLDI 1999}}


===== For Lecture 13 =====
== Required Readings ==
  * {{codetransformationsformlp.pdf|Pai et al., "Code Transformations to Improve Memory Parallelism," MICRO 1999}}

== Recommended Readings ==
  * {{datacachesforsuperscalar.pdf|Juan et al., "Data Caches for Superscalar Processors," ICS 1997}}

===== For Lecture 14 =====
== Required Readings ==
  * {{markovpredictors.pdf|Joseph and Grunwald, "Prefetching using Markov Predictors," ISCA 1997}}

== Recommended Readings ==
  * {{compileralgorithmforprefetching.pdf|Mowry et al., "Design and Evaluation of a Compiler Algorithm for Prefetching," ASPLOS 1992}}
  * {{feedbackdirectedprefetching.pdf|Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007}}
  * {{runaheadexecution.pdf|Mutlu et al., "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}}

===== For Lecture 15 =====
Same as previous lecture

===== For Lecture 16 =====
== Recommended Readings ==
  * {{statelesscontentdirectedprefetching.pdf|Cooksey et al., "A stateless, content-directed data prefetching mechanism," ASPLOS 2002}}
  * {{bandwidthefficientprefetching.pdf|Ebrahimi et al., "Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems," HPCA 2009}}
  * {{softwarecontrolledpreexecution.pdf|Luk, "Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors," ISCA 2001}}


===== Guest Lecture by Thomas Moscibroda =====
== Recommended Readings ==
  * {{bless.pdf|Moscibroda and Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
  * {{appawareprioritizationmechanismfornocs.pdf|Das et al., "Application-Aware Prioritization Mechanism for On-Chip Networks", MICRO 2009}}
  * {{aergia.pdf|Das et al., "Aergia: Exploiting Packet-Latency Slack in On-Chip Networks", ISCA 2010}}
  * {{nextgenerationnoc.pdf|Nychis et al., "Next Generation On-Chip Networks: What Kind of Congestion Control do we Need?", HotNets 2010}}


===== For Lecture 17 ===== 
== Recommended Readings ==
  * {{coordinatedprefetchermanagement.pdf|Ebrahimi et al., "Coordinated Management of Multiple Prefetchers in Multi-Core Systems," MICRO 2009}}

===== For Lecture 18 ===== 
== Required Readings ==
  * {{utilitybasedcachepartitioning.pdf|Qureshi and Patt, "Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches," MICRO 2006}}

== Recommended Readings ==
  * {{gaininginsightsintocachepartitioning.pdf|Lin et al., "Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems," HPCA 2008}}
  * {{adaptiveinsertionpolicies.pdf|Qureshi et al., "Adaptive Insertion Policies for High-Performance Caching," ISCA 2007}}


===== For Lecture 19 =====
== Required Readings ==
  * {{parbs.pdf|Mutlu and Moscibroda, "Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Memory Controllers," IEEE Micro Top Picks 2009}}
  * {{stalltimefairmemoryscheduling.pdf|Mutlu and Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors," MICRO 2007}}

== Recommended Readings ==
  * {{permutationbasedpageinterleaving.pdf|Zhang et al., "A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality," MICRO 2000}}
  * {{prefetchawaredramcontrollers.pdf|Lee et al., "Prefetch-Aware DRAM Controllers," MICRO 2008}}
  * {{memoryaccessscheduling.pdf|Rixner et al., "Memory Access Scheduling," ISCA 2000}}

===== For Lecture 20 =====
Same as previous lecture

===== For Lecture 21 =====
== Required Readings ==
  * {{evaluationoftracecachefetchmechanisms.pdf|Patel et al., "Evaluation of design options for the trace cache fetch mechanism," IEEE TC 1999}}
  * {{complexityeffectivesuperscalar.pdf|Palacharla et al., "Complexity Effective Superscalar Processors," ISCA 1997}}

== Required Readings (old) ==
  * {{microarchitectureofsuperscalar.pdf|Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proc IEEE 1995}}
  * {{onpipeliningdynamicinstructionschedulinglogic.pdf|Stark, Brown, Patt, "On pipelining dynamic instruction scheduling logic," MICRO 2000}}
  * {{Themicroarchitectureofthepentium4processor.pdf|Boggs et al., "The microarchitecture of the Pentium 4 processor," Intel Technology Journal, 2001}}
  * {{21264microprocessor.pdf|Kessler, "The Alpha 21264 microprocessor," IEEE Micro, March-April 1999}}

== Recommended Readings ==
  * {{tracecache.pdf|Rotenberg et al., "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching," MICRO 1996}}
  
===== For Lecture 21 =====
Same as previous lecture

===== For Lecture 22 =====
Same as previous lecture

===== For Lecture 23 =====
Same as previous lecture

===== For Lecture 24 =====
== Required Readings ==
  * {{conbiningbranchpredictors.pdf|McFarling, "Combining Branch Predictors," DEC WRL TR, 1993}}
  * {{increasingprocessorperformance.pdf|Sprangle and Carmean, "Increasing Processor Performance by Implementing Deeper Pipelines," ISCA 2002}}

== Recommended Readings ==
  * {{analysisofcorrelationandpredictability.pdf|Evers et al., "An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work," ISCA 1998}}
  * {{alternativeimplementationoftwolevelbp.pdf|Yeh and Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction," ISCA 1992}}
  * {{availableilpforsuperscalar.pdf|Jouppi and Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," ASPLOS 1989}}
  * {{divergemergeprocessors.pdf|Kim et al., "Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths," MICRO 2006}}
  * {{dynamicbranchpredictionwithperceptrons.pdf|Jimenez and Lin, "Dynamic Branch Prediction with Perceptrons," HPCA 2001}}

===== For Lecture 25 =====
Same as previous lecture

===== For Lecture 26 =====

=== Control Flow III ===

== Recommended Readings ==
  * {{wishbranches.pdf|Kim et al., "Wish Branches: Enabling Adaptive and Aggressive Predicated Execution," IEEE Micro Top Picks, Jan/Feb 2006}}
  * {{divergemergeprocessors.pdf|Kim et al., "Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication," IEEE Micro Top Picks, Jan/Feb 2007}}
  
=== Alternative Approaches to Concurrency ===
== Required Readings ==
  * {{vliweli.pdf|Fisher, "Very Long Instruction Word architectures and the ELI-512," ISCA 1983}}
  * {{introducingia64.pdf|Huck et al., "Introducing the IA-64 Architecture," IEEE Micro 2000}}

== Recommended Readings ==
  * {{cray1computersystem.pdf|Russell, "The CRAY-1 computer system," CACM 1978}}
  * {{ilpprocessing.pdf|Rau and Fisher, "Instruction-level parallel processing: history, overview, and perspective," Journal of Supercomputing, 1993}}
  * {{instructionschedulingforilpprocessors.pdf|Faraboschi et al., "Instruction Scheduling for Instruction Level Parallel Processors," Proc. IEEE, Nov. 2001}}

===== For Lecture 26 =====
Same as previous lecture (Alternative Approaches to Concurrency)

===== For Lecture 27 =====
== Required Readings ==
  * {{nvidiatesla.pdf|Lindholm et al., "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro 2008}}
  * {{cray1computersystem.pdf|Russell, "The CRAY-1 computer system," CACM 1978}}

== Recommended Readings ==
  * {{dynamicwarpformation.pdf|Fung et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO 2007}}
  * {{qilin.pdf|Luk et al., "Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping," MICRO 2009}}