This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
readings [2015/03/27 20:10] kevincha |
readings [2015/04/10 18:55] clement |
||
---|---|---|---|
Line 261: | Line 261: | ||
* {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}} | * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}} | ||
* {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} | * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} | ||
- | * {{raidr_isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}} | + | * {{http://users.ece.cmu.edu/~omutlu/pub/raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} |
* {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} | * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} | ||
**Mentioned During Lecture:** | **Mentioned During Lecture:** | ||
Line 270: | Line 270: | ||
* {{ATLAS.pdf| Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter, "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"}} | * {{ATLAS.pdf| Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter, "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"}} | ||
- | ===== Lecture 22 (3/25 Wed.) ===== | + | ===== Lecture 23 (3/27 Fri.) ===== |
**Required:** | **Required:** | ||
* {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}} | * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}} | ||
* {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} | * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} | ||
- | * {{raidr_isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}} | + | * {{http://users.ece.cmu.edu/~omutlu/pub/raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} |
* {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} | * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} | ||
+ | |||
**Mentioned During Lecture:** | **Mentioned During Lecture:** | ||
* {{moscibroda.pdf| Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,”USENIX Security 2007}} | * {{moscibroda.pdf| Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,”USENIX Security 2007}} | ||
- | * {{moscibroda2007.pdf| Onur Mutluand Thomas Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors" ISCA 2008}} | + | * {{http://users.ece.cmu.edu/~omutlu/pub/memory-channel-partitioning-micro11.pdf|Sai Prashanth Muralidhara, Lavanya Subramanian, Onur Mutlu, Mahmut Kandemir, and Thomas Moscibroda, "Reducing Memory Interference in Multicore Systems via Application-Awareness"}} |
- | * {{isca08.pdf| Onur Mutlu and Thomas Moscibroda, "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM," ISCA 2008}} | + | * {{http://users.ece.cmu.edu/~omutlu/pub/architecture-aware-distributed-resource-management_vee15.pdf | Wang et al., “A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters,” VEE 2015.}} |
- | * {{thread_cluster_mem_sched.pdf| Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter, "Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior" 43rd International Symposium on Microarchitecture (MICRO), pages 65-76, Atlanta, GA, December 2010}} | + | * {{p362-ebrahimi.pdf|Ebrahimi, E., Miftakhutdinov, R., Fallin, C., Lee, C. J., Joao, J. A., Mutlu, O., & Patt, Y. N. (2011). Parallel application memory scheduling. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}} |
- | * {{ATLAS.pdf| Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter, "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"}} | + | * {{p335-ebrahimi.pdf|Ebrahimi, E., Lee, C. J., Mutlu, O., & Patt, Y. N. (2010). Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems.}} |
+ | * {{bloom1970.pdf|Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970}} | ||
+ | * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} | ||
+ | * {{http://users.ece.cmu.edu/~kevincha/papers/chang_hpca2014.pdf|Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu, "Improving DRAM Performance by Parallelizing Refreshes with Accesses", In HPCA 2014, Orlando, Feb. 2014.}} | ||
+ | |||
+ | ===== Lecture 24 (3/30 Mon.) ===== | ||
+ | **Required:** | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/mutlu_hpca03.pdf | Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003.}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/TR-HPS-2006-006.pdf|Srinathet al., “Feedback directed prefetching”, HPCA 2007.}} | ||
+ | |||
+ | **Mentioned During Lecture:** | ||
+ | * {{ramulator.pdf|Kim et al., “Ramulator: A Fast and Extensible DRAM Simulator,” IEEE Computer Architecture Letters 2015.}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/eaf-cache_pact12.pdf|Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,”PACT 2012.}} | ||
+ | * {{http://hps.ece.utexas.edu/pub/TR-HPS-2010-002.pdf|Lee et al., “DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems,”HPS Technical Report, April 2010.}} | ||
+ | * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}} | ||
+ | * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/rlmc_isca08.pdf|Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,”ISCA 2008}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/mutlu_ieee_micro06.pdf|Mutlu et al., "Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance," ISCA 2005, IEEE Micro Top Picks 2006.}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/mutlu_micro05.pdf|Mutlu et al., "Address-Value Delta (AVD) Prediction," MICRO 2005.}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/armstrong_micro04.pdf|Armstrong et al., "Wrong Path Events," MICRO 2004.}} | ||
+ | |||
+ | ===== Lecture 25 (4/1 Wed.) ===== | ||
+ | **Required:** | ||
+ | * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} | ||
+ | |||
+ | **Mentioned During Lecture:** | ||
+ | * {{jouppi1990.pdf|Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990.}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/TR-HPS-2006-006.pdf|Srinathet al., “Feedback directed prefetching”, HPCA 2007.}} | ||
+ | |||
+ | ===== Lecture 26 (4/3 Fri.) ===== | ||
+ | ** Required: ** | ||
+ | * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} | ||
+ | |||
+ | ** Mentioned during lecture: ** | ||
+ | * {{raidr-isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} | ||
+ | * {{2012_isca_salp.pdf|Kim et al., “A Case for Exploiting Subarray-Level Parallelism in DRAM,” ISCA 2012.}} | ||
+ | * {{TLDRAM-Lee.pdf|Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.}} | ||
+ | * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} | ||
+ | * {{rowclone_micro13.pdf|Seshadri et al., “RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.}} | ||
+ | * {{LCP.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework,” MICRO 2013.}} | ||
+ | * {{|Chang et al., “Improving DRAM Performance by Parallelizing Refreshes with Accesses,” HPCA 2014.}} | ||
+ | * {{error-mitigation-for-intermittent-dram-failures_sigmetrics14.pdf|Khan et al., “The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study,” SIGMETRICS 2014.}} | ||
+ | * {{luo_dsn14.pdf|Luo et al., “Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost,” DSN 2014.}} | ||
+ | * {{yoongu2014.pdf|Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.}} | ||
+ | * {{meza_cal12.pdf|Meza et al., “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters 2012.}} | ||
+ | * {{rowbuffer-aware-caching_iccd12.pdf|Yoon et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012.}} | ||
+ | * {{sttram_ispass13.pdf|Kultursay et al., “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013. }} | ||
+ | * {{meza_weed13.pdf|Meza et al., “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.}} | ||
+ | * {{ISCA09.pdf|Lee et al. “Architecting Phase Change Memory as a Scalable DRAM Alternative,” ISCA 2009.}} | ||
+ | * Meza+, “Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field,” DSN 2015. | ||
+ | * Qureshi+, “AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems,” DSN 2015. | ||
+ | * {{ramulator.pdf|Kim et al., “Ramulator: A Fast and Extensible DRAM Simulator,” IEEE Computer Architecture Letters 2015.}} | ||
+ | * {{dirty-block-index_isca14.pdf|Seshadri+, “The Dirty-Block Index,” ISCA 2014.}} | ||
+ | * {{bdi-compression_pact12.pdf|Pekhimenko+, “Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches,” PACT 2012.}} | ||
+ | * {{eaf-cache_pact12.pdf|Seshadri+, “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,” PACT 2012.}} | ||
+ | * {{zhao-micro2014.pdf|Zhao+, “FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems,” MICRO 2014.}} | ||
+ | * {{a40-yoon.pdf|Yoon, Meza+, “Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories,” ACM TACO 2014.}} | ||
+ | * {{compression-aware-cache-management_hpca15.pdf|Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip P. Gibbons, Michael A. Kozuch, and Todd C. Mowry, Exploiting Compressed Block Size as an Indicator of Future Reuse, HPCA 2015.}} | ||
+ | * {{06974684.pdf|Lu+, “Loose Ordering Consistency for Persistent Memory,” ICCD 2014.}} | ||
+ | * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}} | ||
+ | * {{cooksey.pdf|Cooksey et al., “A stateless, content-directed data prefetching mechanism,” ASPLOS 2002.}} | ||
+ | * {{zilles-2001.pdf|Zilles and Sohi, “Execution-based Prediction Using Speculative Slices”, ISCA 2001.}} | ||
+ | * {{zilles-2000.pdf|Zilles and Sohi, ”Understanding the backward slices of performance degrading instructions,” ISCA 2000.}} | ||
+ | * {{http://www.amazon.com/Inside-AS-400-Second-Edition/dp/1882419669|Frank Soltis,"Inside the AS/400"}} | ||
+ | |||
+ | ===== Lecture 27 (4/6 Mon.) ===== | ||
+ | ** Required: ** | ||
+ | * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}} | ||
+ | * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}} | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]] | ||
+ | * P&H, Chapter 5.8 | ||
+ | ** Recommended: ** | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_309_314.pdf|Hill, Jouppi, Sohi. "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.]] | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_551_560.pdf|Hill, Jouppi, Sohi. "Dataflow and Multithreading," pp. 309-314 in Readings in Computer Architecture.]] | ||
+ | * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} | ||
+ | * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}} | ||
+ | ** Mentioned during lecture: ** | ||
+ | * {{horner-1819.pdf|Horner (1819). A new method of solving numerical equations of all orders, by continuous approximation. Philosophical Transactions of the Royal Society}} | ||
+ | |||
+ | ===== Lecture 28 (4/8 Wed.) ===== | ||
+ | ** Required: ** | ||
+ | * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}} | ||
+ | * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}} | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]] | ||
+ | * P&H, Chapter 5.8 | ||
+ | ** Recommended: ** | ||
+ | * {{10.1.1.17.8112.pdf|Gharachorloo et al. (1990). Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors.}} | ||
+ | * {{10.1.1.89.3693.pdf|Gharachorloo et al. (1991). Two Techniques to Enhance the Performance of Memory Consistency Models.}} | ||
+ | * {{isca07_bulksc.pdf|Ceze et al. (2007). BulkSC: Bulk Enforcement of Sequential Consistency.}} | ||
+ | * {{censier.pdf|Censier et al. (1978). A new solution to coherence problems in multicache systems.}} | ||
+ | * {{goodman-snoopyprotocol.pdf|Goodman et al. (1983). Using cache memory to reduce processor-memory traffic.}} | ||
+ | * {{isca123.pdf|Laudon et al. (1997). The SGI Origin: a ccNUMA highly scalable server.}} | ||
+ | * {{isca03_token_coherence.pdf|Martin et al. (2003). Token coherence: decoupling performance and correctness.}} | ||
+ | * {{p73-baer.pdf|Baer et al. (1988). On the inclusion properties for multi-level cache hierarchies.}} | ||
+ | ** Mentioned during lecture: ** | ||
+ | * (HTML) [[http://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html|Dijkstra (1965) Cooperating Sequential Processes.]] | ||
+ | |||
+ | ===== Lecture 29 (4/10 Fri.) ===== | ||
+ | ** Required: ** | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]] | ||
+ | * P&H, Chapter 5.8 | ||
+ | ** Recommended: ** | ||
+ | ** Mentioned during lecture: ** |