Differences
This shows you the differences between two versions of the page.
readings [2012/10/24 21:09] hanbiny |
readings [2014/09/02 03:31] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
=====Readings===== | =====Readings===== | ||
+ | |||
=====Lecture 1===== | =====Lecture 1===== | ||
Required: | Required: | ||
- | * Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture. {{:reading_hill_551_560.pdf|pdf}} | + | * Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture. {{:reading_hill_551_560.pdf|pdf}} |
- | * Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture. {{:reading_hill_309_314.pdf|pdf}} | + | * Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture. {{:reading_hill_309_314.pdf|pdf}} |
- | * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} | + | * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} |
- | * Culler & Singh, Chapter 1 | + | * Culler & Singh, Chapter 1 |
- | * Hamming, “You and Your Research,” Bell Communications Research Colloquium Seminar, 7 March 1986. {{http://www.cs.virginia.edu/~robins/YouAndYourResearch.html|here}} | + | * Hamming, “You and Your Research,” Bell Communications Research Colloquium Seminar, 7 March 1986. {{http://www.cs.virginia.edu/~robins/YouAndYourResearch.html|here}} |
Optional: | Optional: | ||
- | * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} | + | * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} |
- | * Kumar et al., “Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors,” ISCA 2007. {{:kumar07-carbon.pdf|pdf}} | + | * Kumar et al., “Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors,” ISCA 2007. {{:kumar07-carbon.pdf|pdf}} |
Supplementary Readings on Research, Writing, Reviews: | Supplementary Readings on Research, Writing, Reviews: | ||
- | * Levin and Redell, “How (and how not) to write a good systems paper,” OSR 1983. {{:systemspaper_levin.pdf|pdf}} | + | * Levin and Redell, “How (and how not) to write a good systems paper,” OSR 1983. {{:systemspaper_levin.pdf|pdf}} |
- | * Smith, “The Task of the Referee,” IEEE Computer 1990. {{:smith90-referee.pdf|pdf}} | + | * Smith, “The Task of the Referee,” IEEE Computer 1990. {{:smith90-referee.pdf|pdf}} |
- | * SP Jones, “How to Write a Great Research Paper” {{:jones04-writing-a-paper-slides.pdf|pdf}} | + | * SP Jones, “How to Write a Great Research Paper”. {{:jones04-writing-a-paper-slides.pdf|pdf}} |
- | * Fong, “How to Write a CS Research Paper: A Bibliography” {{:fong06-writing-papers.pdf|pdf}} | + | * Fong, “How to Write a CS Research Paper: A Bibliography”. {{:fong06-writing-papers.pdf|pdf}} |
=====Lecture 2===== | =====Lecture 2===== | ||
Required: | Required: | ||
- | * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} | + | * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} |
- | * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} | + | * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} |
- | * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} | + | * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} |
- | * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} | + | * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} |
- | * Ipek et al., “Core Fusion: Accommodating Software Diversity in Chip Multiprocessors,” ISCA 2007. {{:ipek07-fusion.pdf|pdf}} | + | * Ipek et al., “Core Fusion: Accommodating Software Diversity in Chip Multiprocessors,” ISCA 2007. {{:ipek07-fusion.pdf|pdf}} |
Optional: | Optional: | ||
- | * Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. {{:flynn66_computing.pdf|pdf}} | + | * Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. {{:flynn66_computing.pdf|pdf}} |
- | * Thornton, “CDC 6600: Design of a Computer,” 1970. {{:thornton_cdc6600.pdf|pdf}} | + | * Thornton, “CDC 6600: Design of a Computer,” 1970. {{:thornton_cdc6600.pdf|pdf}} |
- | * Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978. {{:smith78_hep.pdf|pdf}} | + | * Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978. {{:smith78_hep.pdf|pdf}} |
- | * Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967. {{:amdahl67_singleproc.pdf|pdf}} | + | * Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967. {{:amdahl67_singleproc.pdf|pdf}} |
- | * Eyerman and Eeckhout, “Modeling critical sections in Amdahl's law and its implications for multicore design,” ISCA 2010. {{:eyerman_critsectamdahl.pdf|pdf}} | + | * Eyerman and Eeckhout, “Modeling critical sections in Amdahl's law and its implications for multicore design,” ISCA 2010. {{:eyerman_critsectamdahl.pdf|pdf}} |
- | * Suleman et al., “Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs,” ASPLOS 2008. {{:suleman_feedback.pdf|pdf}} | + | * Suleman et al., “Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs,” ASPLOS 2008. {{:suleman_feedback.pdf|pdf}} |
=====Lecture 3===== | =====Lecture 3===== | ||
Required: | Required: | ||
- | * Hillis and Tucker, "The CM-5 Connection Machine: a scalable supercomputer," CACM 1993. {{:hillis_cm5.pdf|pdf}} | + | * Hillis and Tucker, "The CM-5 Connection Machine: a scalable supercomputer," CACM 1993. {{:hillis_cm5.pdf|pdf}} |
- | * Seitz, "The Cosmic Cube," CACM 1985. {{:seitz_cosmiccube.pdf|pdf}} | + | * Seitz, "The Cosmic Cube," CACM 1985. {{:seitz_cosmiccube.pdf|pdf}} |
Optional: | Optional: | ||
- | * Li and Hudak, "Memory Coherence in Shared Virtual Memory Systems," ACM TOCS 1989. {{:li_coherencesharedmem.pdf|pdf}} | + | * Li and Hudak, "Memory Coherence in Shared Virtual Memory Systems," ACM TOCS 1989. {{:li_coherencesharedmem.pdf|pdf}} |
- | * Batcher, "Architecture of a massively parallel processor," ISCA 1980. {{:batcher_massparproc.pdf|pdf}} | + | * Batcher, "Architecture of a massively parallel processor," ISCA 1980. {{:batcher_massparproc.pdf|pdf}} |
- | * Tucker and Robertson, "Architecture and Applications of the Connection Machine," IEEE Computer 1988. {{:tucker_connection.pdf|pdf}} | + | * Tucker and Robertson, "Architecture and Applications of the Connection Machine," IEEE Computer 1988. {{:tucker_connection.pdf|pdf}} |
=====Lecture 4===== | =====Lecture 4===== | ||
Optional: | Optional: | ||
- | * Moore, "Cramming more components onto integrated circuits," Electronics, 1965. {{:r1_moore.pdf|pdf}} | + | * Moore, "Cramming more components onto integrated circuits," Electronics, 1965. {{:r1_moore.pdf|pdf}} |
- | * Stark, "On pipelining dynamic instruction scheduling logic," MICRO 2000. {{:stark00-scheduling.pdf|pdf}} | + | * Stark, "On pipelining dynamic instruction scheduling logic," MICRO 2000. {{:stark00-scheduling.pdf|pdf}} |
- | * Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996. {{:olukutun96_cmp.pdf|pdf}} | + | * Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996. {{:olukutun96_cmp.pdf|pdf}} |
- | * Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 1999. {{:kessler99-alpha21264.pdf|pdf}} | + | * Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 1999. {{:kessler99-alpha21264.pdf|pdf}} |
- | * Palacharla et al., "Complexity-effective superscalar processors," ISCA 1997. {{:palacharla97-complexity.pdf|pdf}} | + | * Palacharla et al., "Complexity-effective superscalar processors," ISCA 1997. {{:palacharla97-complexity.pdf|pdf}} |
=====Lecture 5===== | =====Lecture 5===== | ||
Optional: | Optional: | ||
- | * Smith, "A pipelined, shared resource MIMD computer," ICPP 1978. {{:smith78_hep.pdf|pdf}} | + | * Smith, "A pipelined, shared resource MIMD computer," ICPP 1978. {{:smith78_hep.pdf|pdf}} |
- | * Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," ISCA 2000. {{:barroso00_piranha.pdf|pdf}} | + | * Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," ISCA 2000. {{:barroso00_piranha.pdf|pdf}} |
- | * Barroso et al., "Memory system characterization of commercial workloads," ISCA 1998. {{:barroso98-workloads.pdf|pdf}} | + | * Barroso et al., "Memory system characterization of commercial workloads," ISCA 1998. {{:barroso98-workloads.pdf|pdf}} |
- | * Ranganathan et al., "Performance of database workloads on shared-memory systems with out-of-order processors," ASPLOS 1998. {{:ranganathan98-workloads.pdf|pdf}} | + | * Ranganathan et al., "Performance of database workloads on shared-memory systems with out-of-order processors," ASPLOS 1998. {{:ranganathan98-workloads.pdf|pdf}} |
- | * Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005. {{:kongetira05_niagara.pdf|pdf}} | + | * Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005. {{:kongetira05_niagara.pdf|pdf}} |
- | * Spracklen and Abraham, “Chip Multithreading: Opportunities and Challenges,” HPCA Industrial Session, 2005. {{:spracklen05_mt.pdf|pdf}} | + | * Spracklen and Abraham, “Chip Multithreading: Opportunities and Challenges,” HPCA Industrial Session, 2005. {{:spracklen05_mt.pdf|pdf}} |
- | * Chaudhry et al., “Rock: A High-Performance Sparc CMT Processor,” IEEE Micro, 2009. {{:chaudhry_rock.pdf|pdf}} | + | * Chaudhry et al., “Rock: A High-Performance Sparc CMT Processor,” IEEE Micro, 2009. {{:chaudhry_rock.pdf|pdf}} |
- | * Chaudhry et al., “Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor,” ISCA 2009. {{:chaudhry_specthread.pdf|pdf}} | + | * Chaudhry et al., “Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor,” ISCA 2009. {{:chaudhry_specthread.pdf|pdf}} |
- | * Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003. {{:mutlu_runahead.pdf|pdf}} | + | * Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003. {{:mutlu_runahead.pdf|pdf}} |
- | * Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” IEEE Micro Jan/Feb 2006. {{:mutlu06_efficient.pdf|pdf}} | + | * Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” IEEE Micro Jan/Feb 2006. {{:mutlu06_efficient.pdf|pdf}} |
- | * Tendler et al., "POWER4 system microarchitecture," IBM J R&D, 2002. {{:tendler_power4.pdf|pdf}} | + | * Tendler et al., "POWER4 system microarchitecture," IBM J R&D, 2002. {{:tendler_power4.pdf|pdf}} |
- | * Kalla et al., "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro 2004. {{:kalla04_power5.pdf|pdf}} | + | * Kalla et al., "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro 2004. {{:kalla04_power5.pdf|pdf}} |
- | * Le et al., "IBM POWER6 Microarchitecture," IBM J R&D, 2007. {{:le_power6.pdf|pdf}} | + | * Le et al., "IBM POWER6 Microarchitecture," IBM J R&D, 2007. {{:le_power6.pdf|pdf}} |
- | * Kalla et al., "Power7: IBM’s Next-Generation Server Processor," IEEE Micro 2010. {{:kalla_power7.pdf|pdf}} | + | * Kalla et al., "Power7: IBM’s Next-Generation Server Processor," IEEE Micro 2010. {{:kalla_power7.pdf|pdf}} |
- | * Grochowski et al., "Best of both Latency and Throughput," ICCD 2004. {{:grochowski_latthrough.pdf|pdf}} | + | * Grochowski et al., "Best of both Latency and Throughput," ICCD 2004. {{:grochowski_latthrough.pdf|pdf}} |
- | * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} | + | * Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008. {{:hill08_amdahl.pdf|pdf}} |
- | * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} | + | * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} |
=====Lecture 6===== | =====Lecture 6===== | ||
Recommended: | Recommended: | ||
- | * Ipek et al., "Core Fusion: Accommodating Software Diversity in Chip Multiprocessors," ISCA 2007. {{:ipek07-fusion.pdf|pdf}} | + | * Ipek et al., "Core Fusion: Accommodating Software Diversity in Chip Multiprocessors," ISCA 2007. {{:ipek07-fusion.pdf|pdf}} |
- | * Ausavarungnirun et al., "Staged memory scheduling: achieving high performance and scalability in heterogeneous systems," ISCA 2012. {{:ausavarungnirun12-sms.pdf|pdf}} | + | * Ausavarungnirun et al., "Staged memory scheduling: achieving high performance and scalability in heterogeneous systems," ISCA 2012. {{:ausavarungnirun12-sms.pdf|pdf}} |
Optional: | Optional: | ||
- | * Kumar et al., “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction,” MICRO 2003. {{:kumar_singleisaheterog.pdf|pdf}} | + | * Kumar et al., “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction,” MICRO 2003. {{:kumar_singleisaheterog.pdf|pdf}} |
- | * Suleman et al., "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures," ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} | + | * Suleman et al., "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures," ASPLOS 2009. {{:suleman09-acs.pdf|pdf}} |
- | * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multicore Architectures,” IEEE Micro 2010. {{:suleman10-acs.pdf|pdf}} | + | * Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multicore Architectures,” IEEE Micro 2010. {{:suleman10-acs.pdf|pdf}} |
- | * Suleman et al., "Data marshaling for multi-core architectures," ISCA 2010. {{:suleman10-marshaling.pdf|pdf}} | + | * Suleman et al., "Data marshaling for multi-core architectures," ISCA 2010. {{:suleman10-marshaling.pdf|pdf}} |
- | * Suleman et al., "Data Marshaling for Multicore Systems," IEEE Micro 2011. {{:suleman11-marshaling.pdf|pdf}} | + | * Suleman et al., "Data Marshaling for Multicore Systems," IEEE Micro 2011. {{:suleman11-marshaling.pdf|pdf}} |
- | * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} | + | * Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012. {{:joao12-bottleneck.pdf|pdf}} |
- | * Kim et al., "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers," HPCA 2010. {{:kim10-atlas.pdf|pdf}} | + | * Kim et al., "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers," HPCA 2010. {{:kim10-atlas.pdf|pdf}} |
- | * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}} | + | * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}} |
- | * Kim et al., "Thread Cluster Memory Scheduling," IEEE Micro 2011. {{:kim11-tcm.pdf|pdf}} | + | * Kim et al., "Thread Cluster Memory Scheduling," IEEE Micro 2011. {{:kim11-tcm.pdf|pdf}} |
- | * Nychis et al., "Next generation on-chip networks: what kind of congestion control do we need?," HotNets 2010. {{:nychis10-congestion.pdf|pdf}} | + | * Nychis et al., "Next generation on-chip networks: what kind of congestion control do we need?," HotNets 2010. {{:nychis10-congestion.pdf|pdf}} |
- | * Das et al., "Application-aware prioritization mechanisms for on-chip networks," MICRO 2009. {{:das09-prioritization.pdf|pdf}} | + | * Das et al., "Application-aware prioritization mechanisms for on-chip networks," MICRO 2009. {{:das09-prioritization.pdf|pdf}} |
- | * Das et al., "Aérgia: exploiting packet latency slack in on-chip networks," ISCA 2010. {{:das10-aergia.pdf|pdf}} | + | * Das et al., "Aérgia: exploiting packet latency slack in on-chip networks," ISCA 2010. {{:das10-aergia.pdf|pdf}} |
- | * Das et al., "Aérgia: A Network-on-Chip Exploiting Packet Latency Slack," IEEE Micro 2011. {{:das11-aergia.pdf|pdf}} | + | * Das et al., "Aérgia: A Network-on-Chip Exploiting Packet Latency Slack," IEEE Micro 2011. {{:das11-aergia.pdf|pdf}} |
- | * Meza et al., "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management," IEEE CAL 2012. {{:meza12-timber.pdf|pdf}} | + | * Meza et al., "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management," IEEE CAL 2012. {{:meza12-timber.pdf|pdf}} |
- | * Suleman et al., "Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs," ASPLOS 2008. {{:suleman_feedback.pdf|pdf}} | + | * Suleman et al., "Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs," ASPLOS 2008. {{:suleman_feedback.pdf|pdf}} |
- | * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} | + | * Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005. {{:annavaram05_amdahl.pdf|pdf}} |
- | * Morad et al., "Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors," IEEE CAL 2006. {{:morad_jul05.pdf|pdf}} | + | * Morad et al., "Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors," IEEE CAL 2006. {{:morad_jul05.pdf|pdf}} |
- | * Suleman et al., "ACMP: Balancing Hardware Efficiency and Programmer Efficiency," HPS Technical Report 2007. {{:TR-HPS-2007-001.pdf|pdf}} | + | * Suleman et al., "ACMP: Balancing Hardware Efficiency and Programmer Efficiency," HPS Technical Report 2007. {{:TR-HPS-2007-001.pdf|pdf}} |
- | * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} | + | * Suleman et al., “Feedback-directed pipeline parallelism,” PACT 2010. {{:suleman_feedpipe10.pdf|pdf}} |
- | * Suleman, "An Asymmetric Multi-core Architecture for Efficiently Accelerating Critical Paths in Multithreaded Programs," PhD thesis 2010. {{:TR-HPS-2010-003.pdf|pdf}} | + | * Suleman, "An Asymmetric Multi-core Architecture for Efficiently Accelerating Critical Paths in Multithreaded Programs," PhD thesis 2010. {{:TR-HPS-2010-003.pdf|pdf}} |
=====Lecture 7===== | =====Lecture 7===== | ||
Line 292: | Line 293: | ||
=====Lecture 20===== | =====Lecture 20===== | ||
Optional: | Optional: | ||
- | * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985 {{:gurd95.pdf|pdf}} | + | * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}} |
* Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}} | * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}} | ||
* Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}} | * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}} | ||
Line 300: | Line 301: | ||
* Martinez and Torrellas, "Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications," ASPLOS 2002. {{:martinez_specsync02.pdf|pdf}} | * Martinez and Torrellas, "Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications," ASPLOS 2002. {{:martinez_specsync02.pdf|pdf}} | ||
* Rajwar and Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS 2002. {{:rajwar_tlr02.pdf|pdf}} | * Rajwar and Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS 2002. {{:rajwar_tlr02.pdf|pdf}} | ||
+ | * Shavit and Touitou, "Software transactional memory," PODC 1995. {{:shavit95-swtm.pdf|pdf}} | ||
+ | * Dice et al., "Early experience with a commercial hardware transactional memory implementation," ASPLOS 2009. {{:dice09-transactional.pdf|pdf}} | ||
+ | * Wang et al., "Evaluation of Blue Gene/Q hardware support for transactional memories," PACT 2012. {{:wang12-transactional.pdf|pdf}} | ||
+ | * Glass and Ni, “The Turn Model for Adaptive Routing,” ISCA 1992. {{:glass_turnmodel92.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 21===== | ||
+ | Optional: | ||
+ | * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}} | ||
+ | * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}} | ||
+ | * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}} | ||
+ | * Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985. {{:patt85-hpsissues.pdf|pdf}} | ||
+ | * Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003. {{:sankaralingam_itdlp03.pdf|pdf}} | ||
+ | * Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004. {{:burger_edge04.pdf|pdf}} | ||
+ | * Das et al., "Application-aware prioritization mechanisms for on-chip networks," MICRO 2009. {{:das09-prioritization.pdf|pdf}} | ||
+ | * Das et al., "Aérgia: exploiting packet latency slack in on-chip networks," ISCA 2010. {{:das10-aergia.pdf|pdf}} | ||
+ | * Grot et al., "Express Cube Topologies for On-Chip Interconnects," HPCA 2009. {{:grot_expresscube09.pdf|pdf}} | ||
+ | * Grot et al., “Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees,” ISCA 2011. {{:grot11-kilonoc.pdf|pdf}} | ||
+ | * Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009. {{:grot09-pvc.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 22===== | ||
+ | Optional: | ||
+ | * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}} | ||
+ | * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}} | ||
+ | * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}} | ||
+ | * Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985. {{:patt85-hpsissues.pdf|pdf}} | ||
+ | * Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003. {{:sankaralingam_itdlp03.pdf|pdf}} | ||
+ | * Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004. {{:burger_edge04.pdf|pdf}} | ||
+ | * Dennis and Misunas, "A preliminary architecture for a basic data flow processor," ISCA 1974. {{:dennis74.pdf|pdf}} | ||
+ | * Treleaven et al., “Data-Driven and Demand-Driven Computer Architecture,” ACM Computing Surveys 1982. {{:treleaven82.pdf|pdf}} | ||
+ | * Veen, “Dataflow Machine Architecture,” ACM Computing Surveys 1986. {{:veen86.pdf|pdf}} | ||
+ | * Arvind and Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE TC 1990. {{:arvind90.pdf|pdf}} | ||
+ | * Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. {{:hwu86-hpsm.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 23===== | ||
+ | Optional: | ||
+ | * Sakai et al., “An Architecture of a Dataflow Single Chip Processor,” ISCA 1989. {{:sakai_dataflow89.pdf|pdf}} | ||
+ | * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}} | ||
+ | * Colwell, "The Pentium Chronicles," Wiley-IEEE Computer Society Press 2005. | ||
+ | * Kung, “Why Systolic Architectures?,” IEEE Computer 1982. {{:kung_systolic82.pdf|pdf}} | ||
+ | * Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. {{:annaratone_warparch86.pdf|pdf}} | ||
+ | * Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. {{:annaratone_warpperf87.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 24===== | ||
+ | Required: | ||
+ | * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}} | ||
+ | * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}} | ||
+ | * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}} | ||
+ | * Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006. {{:qureshi06-ucp.pdf|pdf}} | ||
+ | * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}} | ||
+ | * Qureshi, “Adaptive Spill-Receive for Robust High-Performance Caching in CMPs,” HPCA 2009. {{:qureshi09-asr.pdf|pdf}} | ||
+ | * Hardavellas et al., “Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches,” ISCA 2009. {{:hardavellas09_rnuca.pdf|pdf}} | ||
+ | |||
+ | Recommended: | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}} | ||
+ | * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}} | ||
+ | * Kim et al., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” ASPLOS 2002. {{:kim02_nuca.pdf|pdf}} | ||
+ | * Qureshi et al., “Adaptive Insertion Policies for High-Performance Caching,” ISCA 2007. {{:qureshi07_adaptive.pdf|pdf}} | ||
+ | * Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008. {{:lin08-partitioning.pdf|pdf}} | ||
+ | |||
+ | Optional: | ||
+ | * Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002. {{:suh02-partitioning.pdf|pdf}} | ||
+ | * Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009. {{:grot09-pvc.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 25===== | ||
+ | Required: | ||
+ | * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}} | ||
+ | * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}} | ||
+ | * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}} | ||
+ | |||
+ | Recommended: | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}} | ||
+ | * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}} | ||
+ | |||
+ | Optional: | ||
+ | * Moscibroda and Mutlu, "Distributed order scheduling and its application to multi-core DRAM controllers," PODC 2008. {{:moscibroda08-order.pdf|pdf}} | ||
+ | * Waldspurger and Weihl, "Lottery scheduling: flexible proportional-share resource management," OSDI 1994. {{:waldspurger94-lottery.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 26===== | ||
+ | Required: | ||
+ | * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}} | ||
+ | * Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. {{:ebrahimi_throttle10.pdf|pdf}} | ||
+ | * Subramanian et al., "MISE: Providing Performance Predictability in Shared Main Memory Systems," HPCA 2013. | ||
+ | |||
+ | Recommended: | ||
+ | * Kim et al., “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010. {{:kim10-tcm.pdf|pdf}} | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. {{:mutlu08-parbs.pdf|pdf}} | ||
+ | * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 27===== | ||
+ | Required: | ||
+ | * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}} | ||
+ | * Ebrahimi et al., "Coordinated Control of Multiple Prefetchers in Multi-Core Systems," HPCA 2009. {{:ebrahimi09-prefetchers.pdf|pdf}} | ||
+ | |||
+ | Recommended: | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | * Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007. {{:srinath07-fdp.pdf|pdf}} | ||
+ | * Zhuang and Lee, "A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches," ICPP 2003. {{:zhuang03-prefetch.pdf|pdf}} | ||
+ | * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}} |