Differences

This shows you the differences between two versions of the page.

Link to this comparison view

readings [2012/11/02 21:15]
hanbiny
readings [2014/09/02 03:31] (current)
Line 1: Line 1:
 =====Readings===== =====Readings=====
 +
  
 =====Lecture 1===== =====Lecture 1=====
Line 341: Line 342:
   * Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. {{:annaratone_warparch86.pdf|pdf}}   * Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. {{:annaratone_warparch86.pdf|pdf}}
   * Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. {{:annaratone_warpperf87.pdf|pdf}}   * Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. {{:annaratone_warpperf87.pdf|pdf}}
 +
 +=====Lecture 24=====
 +Required:
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.  {{:mutlu07.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}
 +  * Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006. {{:qureshi06-ucp.pdf|pdf}}
 +  * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}}
 +  * Qureshi, “Adaptive Spill-Receive for Robust High-Performance Caching in CMPs,” HPCA 2009. {{:qureshi09-asr.pdf|pdf}}
 +  * Hardavellas et al., “Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches,” ISCA 2009. {{:hardavellas09_rnuca.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}}
 +  * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}}
 +  * Kim et al., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” ASPLOS 2002. {{:kim02_nuca.pdf|pdf}}
 +  * Qureshi et al., “Adaptive Insertion Policies for High-Performance Caching,” ISCA 2007. {{:qureshi07_adaptive.pdf|pdf}}
 +  * Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008. {{:lin08-partitioning.pdf|pdf}}
 +
 +Optional:
 +  * Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002. {{:suh02-partitioning.pdf|pdf}}
 +  * Grot et al., “Preemptive virtual clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,“ MICRO 2009. {{:grot09-pvc.pdf|pdf}}
 +
 +=====Lecture 25=====
 +Required:
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.  {{:mutlu07.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}}
 +  * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}}
 +
 +Optional:
 +  * Moscibroda and Mutlu, "Distributed order scheduling and its application to multi-core DRAM controllers," PODC 2008. {{:moscibroda08-order.pdf|pdf}}
 +  * Waldspurger and Weihl, "Lottery scheduling: flexible proportional-share resource management," OSDI 1994. {{:waldspurger94-lottery.pdf|pdf}}
 +
 +=====Lecture 26=====
 +Required:
 +  * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}}
 +  * Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. {{:ebrahimi_throttle10.pdf|pdf}}
 +  * Subramanian et al., "MISE: Providing Performance Predictability in Shared Main Memory Systems," HPCA 2013.
 +
 +Recommended:
 +  * Kim et al., “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010. {{:kim10-tcm.pdf|pdf}}
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. {{:mutlu08-parbs.pdf|pdf}}
 +  * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007.  {{:mutlu07.pdf|pdf}}
 +
 +=====Lecture 27=====
 +Required:
 +  * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}}
 +  * Ebrahimi et al, "Coordinated Control of Multiple Prefetchers in Multi-Core Systems," HPCA 2009. {{:ebrahimi09-prefetchers.pdf|pdf}}
 +
 +Recommended:
 +  * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}}
 +  * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}}
 +  * Kim et al., "Thread Cluster Memory Scheduling," MICRO 2010. {{:kim10-tcm.pdf|pdf}}
 +  * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}}
 +  * Srinath et al, "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007. {{:srinath07-fdp.pdf|pdf}}
 +  * Zhuang and Lee, "A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches," ICPP 2003. {{:zhuang03-prefetch.pdf|pdf}}
 +  * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}}