Differences
This shows you the differences between two versions of the page.
readings [2012/10/31 20:40] hanbiny |
readings [2014/09/02 03:31] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
=====Readings===== | =====Readings===== | ||
+ | |||
=====Lecture 1===== | =====Lecture 1===== | ||
Line 320: | Line 321: | ||
=====Lecture 22===== | =====Lecture 22===== | ||
+ | Optional: | ||
* Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}} | * Gurd et al., "The Manchester prototype dataflow computer," CACM 1985. {{:gurd95.pdf|pdf}} | ||
* Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}} | * Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994. {{:lee_dataflow94.pdf|pdf}} | ||
Line 331: | Line 333: | ||
* Arvind and Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE TC 1990. {{:arvind90.pdf|pdf}} | * Arvind and Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE TC 1990. {{:arvind90.pdf|pdf}} | ||
* Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. {{:hwu86-hpsm.pdf|pdf}} | * Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. {{:hwu86-hpsm.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 23===== | ||
+ | Optional: | ||
* Sakai et al., “An Architecture of a Dataflow Single Chip Processor,” ISCA 1989. {{:sakai_dataflow89.pdf|pdf}} | * Sakai et al., “An Architecture of a Dataflow Single Chip Processor,” ISCA 1989. {{:sakai_dataflow89.pdf|pdf}} | ||
+ | * Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985. {{:patt85.pdf|pdf}} | ||
+ | * Colwell, "The Pentium Chronicles," Wiley-IEEE Computer Society Press 2005. | ||
+ | * Kung, “Why Systolic Architectures?,” IEEE Computer 1982. {{:kung_systolic82.pdf|pdf}} | ||
+ | * Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. {{:annaratone_warparch86.pdf|pdf}} | ||
+ | * Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. {{:annaratone_warpperf87.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 24===== | ||
+ | Required: | ||
+ | * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}} | ||
+ | * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}} | ||
+ | * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}} | ||
+ | * Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006. {{:qureshi06-ucp.pdf|pdf}} | ||
+ | * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}} | ||
+ | * Qureshi, “Adaptive Spill-Receive for Robust High-Performance Caching in CMPs,” HPCA 2009. {{:qureshi09-asr.pdf|pdf}} | ||
+ | * Hardavellas et al., “Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches,” ISCA 2009. {{:hardavellas09_rnuca.pdf|pdf}} | ||
+ | |||
+ | Recommended: | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}} | ||
+ | * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}} | ||
+ | * Kim et al., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” ASPLOS 2002. {{:kim02_nuca.pdf|pdf}} | ||
+ | * Qureshi et al., “Adaptive Insertion Policies for High-Performance Caching,” ISCA 2007. {{:qureshi07_adaptive.pdf|pdf}} | ||
+ | * Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008. {{:lin08-partitioning.pdf|pdf}} | ||
+ | |||
+ | Optional: | ||
+ | * Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002. {{:suh02-partitioning.pdf|pdf}} | ||
+ | * Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009. {{:grot09-pvc.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 25===== | ||
+ | Required: | ||
+ | * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}} | ||
+ | * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}} | ||
+ | * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}} | ||
+ | |||
+ | Recommended: | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Zheng et al., “Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency,” MICRO 2008. {{:zheng08.pdf|pdf}} | ||
+ | * Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008. {{:ipek08-selfoptimizing.pdf|pdf}} | ||
+ | |||
+ | Optional: | ||
+ | * Moscibroda and Mutlu, "Distributed order scheduling and its application to multi-core DRAM controllers," PODC 2008. {{:moscibroda08-order.pdf|pdf}} | ||
+ | * Waldspurger and Weihl, "Lottery scheduling: flexible proportional-share resource management," OSDI 1994. {{:waldspurger94-lottery.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 26===== | ||
+ | Required: | ||
+ | * Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. {{:mcp_micro2011.pdf|pdf}} | ||
+ | * Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. {{:ebrahimi_throttle10.pdf|pdf}} | ||
+ | * Subramanian et al., "MISE: Providing Performance Predictability in Shared Main Memory Systems," HPCA 2013. | ||
+ | |||
+ | Recommended: | ||
+ | * Kim et al., “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010. {{:kim10-tcm.pdf|pdf}} | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004. {{:kim04-faircache.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. {{:mutlu08-parbs.pdf|pdf}} | ||
+ | * Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007. {{:mph_usenix_security07.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | |||
+ | =====Lecture 27===== | ||
+ | Required: | ||
+ | * Ausavarungnirun et al., “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012. {{:sms_isca12.pdf|pdf}} | ||
+ | * Ebrahimi et al., "Coordinated Control of Multiple Prefetchers in Multi-Core Systems," HPCA 2009. {{:ebrahimi09-prefetchers.pdf|pdf}} | ||
+ | |||
+ | Recommended: | ||
+ | * Rixner et al., “Memory Access Scheduling,” ISCA 2000. {{:rixner00.pdf|pdf}} | ||
+ | * Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010. {{:kim10-atlas.pdf|pdf}} | ||
+ | * Kim et al., "Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior," MICRO 2010. {{:kim10-tcm.pdf|pdf}} | ||
+ | * Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. {{:mutlu07.pdf|pdf}} | ||
+ | * Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007. {{:srinath07-fdp.pdf|pdf}} | ||
+ | * Zhuang and Lee, "A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches," ICPP 2003. {{:zhuang03-prefetch.pdf|pdf}} | ||
+ | * Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008. {{:lee_prefetchdram08.pdf|pdf}} |