Differences

This shows you the differences between two versions of the page.

Link to this comparison view

readings [2012/10/29 21:52]
hanbiny
readings [2014/09/02 03:31] (current)
Line 1: Line 1:
 =====Readings===== =====Readings=====
 +
  
 =====Lecture 1===== =====Lecture 1=====
Line 317: Line 318:
   * Grot et al., "Express Cube Topologies for On-Chip Interconnects," HPCA 2009. {{:grot_expresscube09.pdf|pdf}}   * Grot et al., "Express Cube Topologies for On-Chip Interconnects," HPCA 2009. {{:grot_expresscube09.pdf|pdf}}
   * Grot et al., “Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees,” ISCA 2011. {{:grot11-kilonoc.pdf|pdf}}   * Grot et al., “Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees,” ISCA 2011. {{:grot11-kilonoc.pdf|pdf}}
 +  * Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009. {{:grot09-pvc.pdf|pdf}}
 +
 +=====Lecture 22=====
 +Optional:
 +  * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}}
 +  * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}}
 +  * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}
 +  * Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985. {{:patt85-hpsissues.pdf|pdf}}
 +  * Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003. {{:sankaralingam_itdlp03.pdf|pdf}}
 +  * Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004. {{:burger_edge04.pdf|pdf}}
 +  * Dennis and Misunas, "A preliminary architecture for a basic data flow processor," ISCA 1974. {{:dennis74.pdf|pdf}}
 +  * Treleaven et al., “Data-Driven and Demand-Driven Computer Architecture,” ACM Computing Surveys 1982. {{:treleaven82.pdf|pdf}}
 +  * Veen, “Dataflow Machine Architecture,” ACM Computing Surveys 1986. {{:veen86.pdf|pdf}}
 +  * Arvind and Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE TC 1990. {{:arvind90.pdf|pdf}}
 +  * Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. {{:hwu86-hpsm.pdf|pdf}}
 +
 +=====Lecture 23=====
 +Optional:
 +  * Sakai et al., “An Architecture of a Dataflow Single Chip Processor,” ISCA 1989. {{:sakai_dataflow89.pdf|pdf}}
 +  * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}}
 +  * Colwell, "The Pentium Chronicles," Wiley-IEEE Computer Society Press 2005.
 +  * Kung, “Why Systolic Architectures?,” IEEE Computer 1982. {{:kung_systolic82.pdf|pdf}}
 +  * Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. {{:annaratone_warparch86.pdf|pdf}}
 +  * Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. {{:annaratone_warpperf87.pdf|pdf}}
 +
 +=====Lecture 24=====
 +Required:
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.  {{:mutlu07.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}
 +  * Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006. {{:qureshi06-ucp.pdf|pdf}}
 +  * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}}
 +  * Qureshi, “Adaptive Spill-Receive for Robust High-Performance Caching in CMPs,” HPCA 2009. {{:qureshi09-asr.pdf|pdf}}
 +  * Hardavellas et al., “Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches,” ISCA 2009. {{:hardavellas09_rnuca.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}}
 +  * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}}
 +  * Kim et al., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” ASPLOS 2002. {{:kim02_nuca.pdf|pdf}}
 +  * Qureshi et al., “Adaptive Insertion Policies for High-Performance Caching,” ISCA 2007. {{:qureshi07_adaptive.pdf|pdf}}
 +  * Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008. {{:lin08-partitioning.pdf|pdf}}
 +
 +Optional:
 +  * Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002. {{:suh02-partitioning.pdf|pdf}}
 +  * Grot et al., “Preemptive virtual clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,“ MICRO 2009. {{:grot09-pvc.pdf|pdf}}
 +
 +=====Lecture 25=====
 +Required:
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.  {{:mutlu07.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}}
 +  * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}}
 +
 +Optional:
 +  * Moscibroda and Mutlu, "Distributed order scheduling and its application to multi-core DRAM controllers," PODC 2008. {{:moscibroda08-order.pdf|pdf}}
 +  * Waldspurger and Weihl, "Lottery scheduling: flexible proportional-share resource management," OSDI 1994. {{:waldspurger94-lottery.pdf|pdf}}
 +
 +=====Lecture 26=====
 +Required:
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. {{:ebrahimi_throttle10.pdf|pdf}}
 +  * Subramanian et al., "MISE: Providing Performance Predictability in Shared Main Memory Systems," HPCA 2013.
 +
 +Recommended:
 +  * Kim et al., “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010. {{:kim10-tcm.pdf|pdf}}
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. {{:mutlu08-parbs.pdf|pdf}}
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.  {{:mutlu07.pdf|pdf}}
 +
 +=====Lecture 27=====
 +Required:
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Ebrahimi et al, "Coordinated Control of Multiple Prefetchers in Multi-Core Systems," HPCA 2009. {{:ebrahimi09-prefetchers.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}}
 +  * Srinath et al, "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007. {{:srinath07-fdp.pdf|pdf}}
 +  * Zhuang and Lee, "A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches," ICPP 2003. {{:zhuang03-prefetch.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}