Differences

readings [2012/10/24 21:09]
hanbiny
readings [2014/09/02 03:31] (current)
Line 1: Line 1:
 =====Readings===== =====Readings=====
 +
  
 =====Lecture 1===== =====Lecture 1=====
 Required: Required:
-    * Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture. {{:reading_hill_551_560.pdf|pdf}} +  * Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture. {{:reading_hill_551_560.pdf|pdf}} 
-    * Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture. {{:reading_hill_309_314.pdf|pdf}} +  * Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture. {{:reading_hill_309_314.pdf|pdf}} 
-    * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} +  * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} 
-    * Culler & Singh, Chapter 1 +  * Culler & Singh, Chapter 1 
-    * Hamming, “You and Your Research,” Bell Communications Research Colloquium Seminar, 7 March 1986. {{http://www.cs.virginia.edu/~robins/YouAndYourResearch.html|here}}+  * Hamming, “You and Your Research,” Bell Communications Research Colloquium Seminar, 7 March 1986. {{http://www.cs.virginia.edu/~robins/YouAndYourResearch.html|here}}
  
 Optional: Optional:
-    * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} +  * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} 
-    * Kumar et al., “Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors,” ISCA 2007. {{:kumar07-carbon.pdf|pdf}}+  * Kumar et al., “Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors,” ISCA 2007. {{:kumar07-carbon.pdf|pdf}}
  
 Supplementary Readings on Research, Writing, Reviews: Supplementary Readings on Research, Writing, Reviews:
-    * Levin and Redell, “How (and how not) to write a good systems paper,” OSR 1983. {{:systemspaper_levin.pdf|pdf}} +  * Levin and Redell, “How (and how not) to write a good systems paper,” OSR 1983. {{:systemspaper_levin.pdf|pdf}} 
-    * Smith, “The Task of the Referee,” IEEE Computer 1990. {{:smith90-referee.pdf|pdf}} +  * Smith, “The Task of the Referee,” IEEE Computer 1990. {{:smith90-referee.pdf|pdf}} 
-    * SP Jones, “How to Write a Great Research Paper” {{:jones04-writing-a-paper-slides.pdf|pdf}} +  * SP Jones, “How to Write a Great Research Paper”{{:jones04-writing-a-paper-slides.pdf|pdf}} 
-    * Fong, “How to Write a CS Research Paper: A Bibliography” {{:fong06-writing-papers.pdf|pdf}}+  * Fong, “How to Write a CS Research Paper: A Bibliography”{{:fong06-writing-papers.pdf|pdf}}
  
 =====Lecture 2===== =====Lecture 2=====
 Required: Required:
-    * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} +  * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} 
-    * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} +  * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} 
-    * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} +  * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} 
-    * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} +  * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} 
-    * Ipek et al., “Core Fusion: Accommodating Software Diversity in Chip Multiprocessors,” ISCA 2007. {{:ipek07-fusion.pdf|pdf}}+  * Ipek et al., “Core Fusion: Accommodating Software Diversity in Chip Multiprocessors,” ISCA 2007. {{:ipek07-fusion.pdf|pdf}}
  
 Optional: Optional:
-    * Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. {{:flynn66_computing.pdf|pdf}} +  * Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. {{:flynn66_computing.pdf|pdf}} 
-    * Thornton, “CDC 6600: Design of a Computer,” 1970. {{:thornton_cdc6600.pdf|pdf}} +  * Thornton, “CDC 6600: Design of a Computer,” 1970. {{:thornton_cdc6600.pdf|pdf}} 
-    * Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978. {{:smith78_hep.pdf|pdf}} +  * Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978. {{:smith78_hep.pdf|pdf}} 
-    * Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967. {{:amdahl67_singleproc.pdf|pdf}} +  * Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967. {{:amdahl67_singleproc.pdf|pdf}} 
-    * Eyerman and Eeckhout, “Modeling critical sections in Amdahl's law and its implications for multicore design,” ISCA 2010. {{:eyerman_critsectamdahl.pdf|pdf}} +  * Eyerman and Eeckhout, “Modeling critical sections in Amdahl's law and its implications for multicore design,” ISCA 2010. {{:eyerman_critsectamdahl.pdf|pdf}} 
-    * Suleman et al., “Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs,” ASPLOS 2008. {{:suleman_feedback.pdf|pdf}}+  * Suleman et al., “Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs,” ASPLOS 2008. {{:suleman_feedback.pdf|pdf}}
  
 =====Lecture 3===== =====Lecture 3=====
 Required: Required:
-    * Hillis and Tucker, "The CM-5 Connection Machine: a scalable supercomputer," CACM 1993. {{:hillis_cm5.pdf|pdf}} +  * Hillis and Tucker, "The CM-5 Connection Machine: a scalable supercomputer," CACM 1993. {{:hillis_cm5.pdf|pdf}} 
-    * Seitz, "The Cosmic Cube," CACM 1985. {{:seitz_cosmiccube.pdf|pdf}}+  * Seitz, "The Cosmic Cube," CACM 1985. {{:seitz_cosmiccube.pdf|pdf}}
  
 Optional: Optional:
-    * Li and Hudak, "Memory Coherence in Shared Virtual Memory Systems," ACM TOCS 1989. {{:li_coherencesharedmem.pdf|pdf}} +  * Li and Hudak, "Memory Coherence in Shared Virtual Memory Systems," ACM TOCS 1989. {{:li_coherencesharedmem.pdf|pdf}} 
-    * Batcher, "Architecture of a massively parallel processor," ISCA 1980. {{:batcher_massparproc.pdf|pdf}} +  * Batcher, "Architecture of a massively parallel processor," ISCA 1980. {{:batcher_massparproc.pdf|pdf}} 
-    * Tucker and Robertson, "Architecture and Applications of the Connection Machine," IEEE Computer 1988. {{:tucker_connection.pdf|pdf}}+  * Tucker and Robertson, "Architecture and Applications of the Connection Machine," IEEE Computer 1988. {{:tucker_connection.pdf|pdf}}
  
 =====Lecture 4===== =====Lecture 4=====
 Optional: Optional:
-    * Moore, "Cramming more components onto integrated circuits," Electronics, 1965. {{:r1_moore.pdf|pdf}} +  * Moore, "Cramming more components onto integrated circuits," Electronics, 1965. {{:r1_moore.pdf|pdf}} 
-    * Stark, "On pipelining dynamic instruction scheduling logic," MICRO 2000. {{:stark00-scheduling.pdf|pdf}} +  * Stark, "On pipelining dynamic instruction scheduling logic," MICRO 2000. {{:stark00-scheduling.pdf|pdf}} 
-    * Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996. {{:olukutun96_cmp.pdf|pdf}} +  * Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996. {{:olukutun96_cmp.pdf|pdf}} 
-    * Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 1999. {{:kessler99-alpha21264.pdf|pdf}} +  * Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 1999. {{:kessler99-alpha21264.pdf|pdf}} 
-    * Palacharla et al., "Complexity-effective superscalar processors," ISCA 1997. {{:palacharla97-complexity.pdf|pdf}}+  * Palacharla et al., "Complexity-effective superscalar processors," ISCA 1997. {{:palacharla97-complexity.pdf|pdf}}
  
 =====Lecture 5===== =====Lecture 5=====
 Optional: Optional:
-    * Smith, "A pipelined, shared resource MIMD computer," ICPP 1978. {{:smith78_hep.pdf|pdf}} +  * Smith, "A pipelined, shared resource MIMD computer," ICPP 1978. {{:smith78_hep.pdf|pdf}} 
-    * Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," ISCA 2000. {{:barroso00_piranha.pdf|pdf}} +  * Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," ISCA 2000. {{:barroso00_piranha.pdf|pdf}} 
-    * Barroso et al., "Memory system characterization of commercial workloads," ISCA 1998. {{:barroso98-workloads.pdf|pdf}} +  * Barroso et al., "Memory system characterization of commercial workloads," ISCA 1998. {{:barroso98-workloads.pdf|pdf}} 
-    * Ranganathan et al., "Performance of database workloads on shared-memory systems with out-of-order processors," ASPLOS 1998. {{:ranganathan98-workloads.pdf|pdf}} +  * Ranganathan et al., "Performance of database workloads on shared-memory systems with out-of-order processors," ASPLOS 1998. {{:ranganathan98-workloads.pdf|pdf}} 
-    * Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005. {{:kongetira05_niagara.pdf|pdf}} +  * Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005. {{:kongetira05_niagara.pdf|pdf}} 
-    * Spracklen and Abraham, “Chip Multithreading: Opportunities and Challenges,” HPCA Industrial Session, 2005. {{:spracklen05_mt.pdf|pdf}} +  * Spracklen and Abraham, “Chip Multithreading: Opportunities and Challenges,” HPCA Industrial Session, 2005. {{:spracklen05_mt.pdf|pdf}} 
-    * Chaudhry et al., “Rock: A High-Performance Sparc CMT Processor,” IEEE Micro, 2009. {{:chaudhry_rock.pdf|pdf}} +  * Chaudhry et al., “Rock: A High-Performance Sparc CMT Processor,” IEEE Micro, 2009. {{:chaudhry_rock.pdf|pdf}} 
-    * Chaudhry et al., “Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor,” ISCA 2009. {{:chaudhry_specthread.pdf|pdf}} +  * Chaudhry et al., “Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor,” ISCA 2009. {{:chaudhry_specthread.pdf|pdf}} 
-    * Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003. {{:mutlu_runahead.pdf|pdf}} +  * Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003. {{:mutlu_runahead.pdf|pdf}} 
-    * Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” IEEE Micro Jan/Feb 2006. {{:mutlu06_efficient.pdf|pdf}} +  * Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” IEEE Micro Jan/Feb 2006. {{:mutlu06_efficient.pdf|pdf}} 
-    * Tendler et al., "POWER4 system microarchitecture," IBM J R&D, 2002. {{:tendler_power4.pdf|pdf}} +  * Tendler et al., "POWER4 system microarchitecture," IBM J R&D, 2002. {{:tendler_power4.pdf|pdf}} 
-    * Kalla et al., "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro 2004. {{:kalla04_power5.pdf|pdf}} +  * Kalla et al., "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro 2004. {{:kalla04_power5.pdf|pdf}} 
-    * Le et al., "IBM POWER6 Microarchitecture," IBM J R&D, 2007. {{:le_power6.pdf|pdf}} +  * Le et al., "IBM POWER6 Microarchitecture," IBM J R&D, 2007. {{:le_power6.pdf|pdf}} 
-    * Kalla et al., "Power7: IBM’s Next-Generation Server Processor," IEEE Micro 2010. {{:kalla_power7.pdf|pdf}} +  * Kalla et al., "Power7: IBM’s Next-Generation Server Processor," IEEE Micro 2010. {{:kalla_power7.pdf|pdf}} 
-    * Grochowski et al., "Best of both Latency and Throughput," ICCD 2004. {{:grochowski_latthrough.pdf|pdf}} +  * Grochowski et al., "Best of both Latency and Throughput," ICCD 2004. {{:grochowski_latthrough.pdf|pdf}} 
-    * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} +  * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} 
-    * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}}+  * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}}
  
 =====Lecture 6===== =====Lecture 6=====
 Recommended: Recommended:
-    * Ipek et al., "Core Fusion: Accommodating Software Diversity in Chip Multiprocessors," ISCA 2007. {{:ipek07-fusion.pdf|pdf}} +  * Ipek et al., "Core Fusion: Accommodating Software Diversity in Chip Multiprocessors," ISCA 2007. {{:ipek07-fusion.pdf|pdf}} 
-    * Ausavarungnirun et al., "Staged memory scheduling: achieving high performance and scalability in heterogeneous systems," ISCA 2012. {{:ausavarungnirun12-sms.pdf|pdf}}+  * Ausavarungnirun et al., "Staged memory scheduling: achieving high performance and scalability in heterogeneous systems," ISCA 2012. {{:ausavarungnirun12-sms.pdf|pdf}}
  
 Optional: Optional:
-    * Kumar et al., “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction,” MICRO 2003. {{:kumar_singleisaheterog.pdf|pdf}} +  * Kumar et al., “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction,” MICRO 2003. {{:kumar_singleisaheterog.pdf|pdf}} 
-    * Suleman et al., "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures," ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} +  * Suleman et al., "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures," ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} 
-    * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multicore Architectures,” IEEE Micro 2010. {{:suleman10-acs.pdf|pdf}} +  * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multicore Architectures,” IEEE Micro 2010. {{:suleman10-acs.pdf|pdf}} 
-    * Suleman et al., "Data marshaling for multi-core architectures," ISCA 2010. {{:suleman10-marshaling.pdf|pdf}} +  * Suleman et al., "Data marshaling for multi-core architectures," ISCA 2010. {{:suleman10-marshaling.pdf|pdf}} 
-    * Suleman et al., "Data Marshaling for Multicore Systems," IEEE Micro 2011. {{:suleman11-marshaling.pdf|pdf}} +  * Suleman et al., "Data Marshaling for Multicore Systems," IEEE Micro 2011. {{:suleman11-marshaling.pdf|pdf}} 
-    * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} +  * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} 
-    * Kim et al., "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers," HPCA 2010. {{:kim10-atlas.pdf|pdf}} +  * Kim et al., "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers," HPCA 2010. {{:kim10-atlas.pdf|pdf}} 
-    * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}} +  * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}} 
-    * Kim et al., "Thread Cluster Memory Scheduling," IEEE Micro 2011. {{:kim11-tcm.pdf|pdf}} +  * Kim et al., "Thread Cluster Memory Scheduling," IEEE Micro 2011. {{:kim11-tcm.pdf|pdf}} 
-    * Nychis et al., "Next generation on-chip networks: what kind of congestion control do we need?," HotNets 2010. {{:nychis10-congestion.pdf|pdf}} +  * Nychis et al., "Next generation on-chip networks: what kind of congestion control do we need?," HotNets 2010. {{:nychis10-congestion.pdf|pdf}} 
-    * Das et al., "Application-aware prioritization mechanisms for on-chip networks," MICRO 2009. {{:das09-prioritization.pdf|pdf}} +  * Das et al., "Application-aware prioritization mechanisms for on-chip networks," MICRO 2009. {{:das09-prioritization.pdf|pdf}} 
-    * Das et al., "Aérgia: exploiting packet latency slack in on-chip networks," ISCA 2010. {{:das10-aergia.pdf|pdf}} +  * Das et al., "Aérgia: exploiting packet latency slack in on-chip networks," ISCA 2010. {{:das10-aergia.pdf|pdf}} 
-    * Das et al., "Aérgia: A Network-on-Chip Exploiting Packet Latency Slack," IEEE Micro 2011. {{:das11-aergia.pdf|pdf}} +  * Das et al., "Aérgia: A Network-on-Chip Exploiting Packet Latency Slack," IEEE Micro 2011. {{:das11-aergia.pdf|pdf}} 
-    * Meza et al., "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management," IEEE CAL 2012. {{:meza12-timber.pdf|pdf}} +  * Meza et al., "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management," IEEE CAL 2012. {{:meza12-timber.pdf|pdf}} 
-    * Suleman et al., "Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs," ASPLOS 2008. {{:suleman_feedback.pdf|pdf}} +  * Suleman et al., "Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs," ASPLOS 2008. {{:suleman_feedback.pdf|pdf}} 
-    * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} +  * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} 
-    * Morad et al., "Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors," IEEE CAL 2006. {{:morad_jul05.pdf|pdf}} +  * Morad et al., "Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors," IEEE CAL 2006. {{:morad_jul05.pdf|pdf}} 
-    * Suleman et al., "ACMP: Balancing Hardware Efficiency and Programmer Efficiency," HPS Technical Report 2007. {{:TR-HPS-2007-001.pdf|pdf}} +  * Suleman et al., "ACMP: Balancing Hardware Efficiency and Programmer Efficiency," HPS Technical Report 2007. {{:TR-HPS-2007-001.pdf|pdf}} 
-    * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} +  * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} 
-    * Suleman, "An Asymmetric Multi-core Architecture for Efficiently Accelerating Critical Paths in Multithreaded Programs," PhD thesis 2010. {{:TR-HPS-2010-003.pdf|pdf}}+  * Suleman, "An Asymmetric Multi-core Architecture for Efficiently Accelerating Critical Paths in Multithreaded Programs," PhD thesis 2010. {{:TR-HPS-2010-003.pdf|pdf}}
  
 =====Lecture 7===== =====Lecture 7=====
Line 292: Line 293:
 =====Lecture 20===== =====Lecture 20=====
 Optional: Optional:
-  * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985 {{:gurd95.pdf|pdf}}+  * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985{{:gurd95.pdf|pdf}}
   * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}}   * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}}
   * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}   * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}
Line 300: Line 301:
   * Martinez and Torrellas, "Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications," ASPLOS 2002. {{:martinez_specsync02.pdf|pdf}}   * Martinez and Torrellas, "Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications," ASPLOS 2002. {{:martinez_specsync02.pdf|pdf}}
   * Rajwar and Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS 2002. {{:rajwar_tlr02.pdf|pdf}}   * Rajwar and Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS 2002. {{:rajwar_tlr02.pdf|pdf}}
 +  * Shavit and Touitou, "Software transactional memory," PODC 1995. {{:shavit95-swtm.pdf|pdf}}
 +  * Dice et al., "Early experience with a commercial hardware transactional memory implementation," ASPLOS 2009. {{:dice09-transactional.pdf|pdf}}
 +  * Wang et al., "Evaluation of Blue Gene/Q hardware support for transactional memories," PACT 2012. {{:wang12-transactional.pdf|pdf}}
 +  * Glass and Ni, “The Turn Model for Adaptive Routing,” ISCA 1992. {{:glass_turnmodel92.pdf|pdf}}
 +
 +=====Lecture 21=====
 +Optional:
 +  * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}}
 +  * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}}
 +  * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}
 +  * Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985. {{:patt85-hpsissues.pdf|pdf}}
 +  * Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003. {{:sankaralingam_itdlp03.pdf|pdf}}
 +  * Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004. {{:burger_edge04.pdf|pdf}}
 +  * Das et al., "Application-aware prioritization mechanisms for on-chip networks," MICRO 2009. {{:das09-prioritization.pdf|pdf}}
 +  * Das et al., "Aérgia: exploiting packet latency slack in on-chip networks," ISCA 2010. {{:das10-aergia.pdf|pdf}}
 +  * Grot et al., "Express Cube Topologies for On-Chip Interconnects," HPCA 2009. {{:grot_expresscube09.pdf|pdf}}
 +  * Grot et al., “Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees,” ISCA 2011. {{:grot11-kilonoc.pdf|pdf}}
 +  * Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009. {{:grot09-pvc.pdf|pdf}}
 +
 +=====Lecture 22=====
 +Optional:
 +  * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}}
 +  * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}}
 +  * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}
 +  * Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985. {{:patt85-hpsissues.pdf|pdf}}
 +  * Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003. {{:sankaralingam_itdlp03.pdf|pdf}}
 +  * Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004. {{:burger_edge04.pdf|pdf}}
 +  * Dennis and Misunas, "A preliminary architecture for a basic data flow processor," ISCA 1974. {{:dennis74.pdf|pdf}}
 +  * Treleaven et al., “Data-Driven and Demand-Driven Computer Architecture,” ACM Computing Surveys 1982. {{:treleaven82.pdf|pdf}}
 +  * Veen, “Dataflow Machine Architecture,” ACM Computing Surveys 1986. {{:veen86.pdf|pdf}}
 +  * Arvind and Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE TC 1990. {{:arvind90.pdf|pdf}}
 +  * Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. {{:hwu86-hpsm.pdf|pdf}}
 +
 +=====Lecture 23=====
 +Optional:
 +  * Sakai et al., “An Architecture of a Dataflow Single Chip Processor,” ISCA 1989. {{:sakai_dataflow89.pdf|pdf}}
 +  * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}
 +  * Colwell, "The Pentium Chronicles," Wiley-IEEE Computer Society Press 2005.
 +  * Kung, “Why Systolic Architectures?,” IEEE Computer 1982. {{:kung_systolic82.pdf|pdf}}
 +  * Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. {{:annaratone_warparch86.pdf|pdf}}
 +  * Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. {{:annaratone_warpperf87.pdf|pdf}}
 +
 +=====Lecture 24=====
 +Required:
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}
 +  * Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006. {{:qureshi06-ucp.pdf|pdf}}
 +  * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}}
 +  * Qureshi, “Adaptive Spill-Receive for Robust High-Performance Caching in CMPs,” HPCA 2009. {{:qureshi09-asr.pdf|pdf}}
 +  * Hardavellas et al., “Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches,” ISCA 2009. {{:hardavellas09_rnuca.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}}
 +  * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}}
 +  * Kim et al., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” ASPLOS 2002. {{:kim02_nuca.pdf|pdf}}
 +  * Qureshi et al., “Adaptive Insertion Policies for High-Performance Caching,” ISCA 2007. {{:qureshi07_adaptive.pdf|pdf}}
 +  * Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008. {{:lin08-partitioning.pdf|pdf}}
 +
 +Optional:
 +  * Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002. {{:suh02-partitioning.pdf|pdf}}
 +  * Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009. {{:grot09-pvc.pdf|pdf}}
 +
 +=====Lecture 25=====
 +Required:
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}}
 +  * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}}
 +
 +Optional:
 +  * Moscibroda and Mutlu, "Distributed order scheduling and its application to multi-core DRAM controllers," PODC 2008. {{:moscibroda08-order.pdf|pdf}}
 +  * Waldspurger and Weihl, "Lottery scheduling: flexible proportional-share resource management," OSDI 1994. {{:waldspurger94-lottery.pdf|pdf}}
 +
 +=====Lecture 26=====
 +Required:
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. {{:ebrahimi_throttle10.pdf|pdf}}
 +  * Subramanian et al., "MISE: Providing Performance Predictability in Shared Main Memory Systems," HPCA 2013.
 +
 +Recommended:
 +  * Kim et al., “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010. {{:kim10-tcm.pdf|pdf}}
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. {{:mutlu08-parbs.pdf|pdf}}
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}}
 +
 +=====Lecture 27=====
 +Required:
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Ebrahimi et al., "Coordinated Control of Multiple Prefetchers in Multi-Core Systems," HPCA 2009. {{:ebrahimi09-prefetchers.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}}
 +  * Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007. {{:srinath07-fdp.pdf|pdf}}
 +  * Zhuang and Lee, "A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches," ICPP 2003. {{:zhuang03-prefetch.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}