User Tools

Site Tools


readings

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
readings [2015/03/27 20:17]
kevincha [Lecture 23 (3/27 Fri.)]
readings [2015/04/06 23:13]
clement
Line 261: Line 261:
   * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}}   * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}}
   * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}}   * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}}
-  ​* {{raidr_isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}}+   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}}
   * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}   * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}
 **Mentioned During Lecture:** **Mentioned During Lecture:**
Line 274: Line 274:
   * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}}   * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}}
   * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}}   * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}}
-  ​* {{raidr_isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}}+   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}}
   * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}   * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}
  
Line 283: Line 283:
   * {{p362-ebrahimi.pdf|Ebrahimi,​ E., Miftakhutdinov,​ R., Fallin, C., Lee, C. J., Joao, J. A., Mutlu, O., & Patt, Y. N. (2011). Parallel application memory scheduling. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}}   * {{p362-ebrahimi.pdf|Ebrahimi,​ E., Miftakhutdinov,​ R., Fallin, C., Lee, C. J., Joao, J. A., Mutlu, O., & Patt, Y. N. (2011). Parallel application memory scheduling. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}}
   * {{p335-ebrahimi.pdf|Ebrahimi,​ E., Lee, C. J., Mutlu, O., & Patt, Y. N. (2010). Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems.}}   * {{p335-ebrahimi.pdf|Ebrahimi,​ E., Lee, C. J., Mutlu, O., & Patt, Y. N. (2010). Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems.}}
-  * [[bloom1970.pdf|Bloom,​ “Space/​Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970]] +  * {{bloom1970.pdf|Bloom,​ “Space/​Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970}} 
-  * {{moscibroda2007.pdf| Onur Mutluand Thomas Moscibroda, "Stall-Time Fair Memory ​Access Scheduling ​for Chip Multiprocessors" ​ISCA 2008}} +  * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} 
-  * {{isca08.pdf| Onur Mutlu and Thomas Moscibroda, "Parallelism-Aware ​Batch SchedulingEnhancing both Performance and Fairness ​of Shared ​DRAM," ​ISCA 2008}} +   * {{http://​users.ece.cmu.edu/​~kevincha/​papers/​chang_hpca2014.pdf|Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu, "Improving DRAM Performance by Parallelizing Refreshes with Accesses",​ In HPCA 2014, Orlando, Feb. 2014.}} 
-  * {{thread_cluster_mem_sched.pdf| Yoongu ​Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter, "Thread Cluster Memory SchedulingExploiting Differences ​in Memory Access Behavior" ​43rd International Symposium on Microarchitecture ​(MICRO), pages 65-76AtlantaGADecember 2010}} + 
-  * {{ATLAS.pdf| Yoongu ​Kim, Dongsu HanOnur MutluMor Harchol-Balter"ATLASA Scalable and High-Performance ​Scheduling Algorithm for Multiple Memory Controllers"​}}+===== Lecture 24 (3/30 Mon.) ===== 
 +**Required:​** 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mutlu_hpca03.pdf | Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,​” HPCA 2003.}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​TR-HPS-2006-006.pdf|Srinathet al., “Feedback directed prefetching”,​ HPCA 2007.}} 
 + 
 +**Mentioned During Lecture:​** 
 +  * {{ramulator.pdf|Kim et al., “Ramulator:​ A Fast and Extensible DRAM Simulator,​” IEEE Computer Architecture Letters 2015.}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​eaf-cache_pact12.pdf|Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,​”PACT 2012.}} 
 +  * {{http://​hps.ece.utexas.edu/​pub/​TR-HPS-2010-002.pdf|Lee et al., “DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory ​Systems,​”HPS Technical Report, April 2010.}} 
 +  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}} 
 +  * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​rlmc_isca08.pdf|Ipek et al., “Self Optimizing Memory Controllers:​ A Reinforcement Learning Approach,​”ISCA 2008}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mutlu_ieee_micro06.pdf|Mutlu et al., "​Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,"​ ISCA 2005, IEEE Micro Top Picks 2006.}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mutlu_micro05.pdf|Mutlu et al., "​Address-Value Delta (AVD) Prediction,"​ MICRO 2005.}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​armstrong_micro04.pdf|Armstrong et al., "Wrong Path Events,"​ MICRO 2004.}} 
 + 
 +===== Lecture 25 (4/1 Wed.) ===== 
 +**Required:​** 
 +  * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} 
 + 
 +**Mentioned During Lecture:​** 
 +  * {{jouppi1990.pdf|Jouppi,​ “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990.}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​TR-HPS-2006-006.pdf|Srinathet al., “Feedback directed prefetching”,​ HPCA 2007.}} 
 + 
 +===== Lecture 26 (4/3 Fri.) ===== 
 +** Required: ** 
 +  * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}} 
 + 
 +** Mentioned during lecture: ** 
 +  * {{raidr-isca12.pdf|Liu et al., “RAIDR: Retention-Aware ​Intelligent DRAM Refresh,” ISCA 2012.}} 
 +  * {{2012_isca_salp.pdf|Kim et al., “A Case for Exploiting Subarray-Level Parallelism in DRAM,” ISCA 2012.}} 
 +  * {{TLDRAM-Lee.pdf|Lee et al., “Tiered-Latency DRAMA Low Latency and Low Cost DRAM Architecture,​” HPCA 2013.}} 
 +  * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} 
 +  * {{rowclone_micro13.pdf|Seshadri et al., “RowClone:​ Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.}} 
 +  * {{LCP.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework,​” MICRO 2013.}} 
 +  * {{|Chang et al., “Improving DRAM Performance ​by Parallelizing Refreshes with Accesses,​” HPCA 2014.}} 
 +  * {{error-mitigation-for-intermittent-dram-failures_sigmetrics14.pdf|Khan et al., “The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study,” SIGMETRICS 2014.}} 
 +  * {{luo_dsn14.pdf|Luo et al., “Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost,” DSN 2014.}} 
 +  * {{yoongu2014.pdf|Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.}} 
 +  * {{meza_cal12.pdf|Meza et al., “Enabling Efficient ​and Scalable Hybrid Memories,​” IEEE Comp. Arch. Letters 2012.}} 
 +  * {{rowbuffer-aware-caching_iccd12.pdf|Yoon et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,​” ICCD 2012.}} 
 +  * {{sttram_ispass13.pdf|Kultursay ​ et al., “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,​” ISPASS 2013. }} 
 +  * {{meza_weed13.pdf|Meza ​ et al., “A Case for Efficient Hardware-Software Cooperative Management ​of Storage and Memory,” WEED 2013.}} 
 +  * {{ISCA09.pdf|Lee ​ et al. “Architecting Phase Change Memory as a Scalable ​DRAM Alternative,” ISCA 2009.}} 
 +  * Meza+, “Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field,” DSN 2015. 
 +  * Qureshi+, “AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems,” DSN 2015. 
 +  * {{ramulator.pdf|Kim ​et al.“Ramulator:​ A Fast and Extensible DRAM Simulator,​” IEEE Computer Architecture Letters 2015.}} 
 +  * {{dirty-block-index_isca14.pdf|Seshadri+,​ “The Dirty-Block Index,” ISCA 2014.}} 
 +  * {{bdi-compression_pact12.pdf|Pekhimenko+,​ “Base-Delta-Immediate Compression:​ Practical Data Compression for On-Chip Caches,” PACT 2012.}} 
 +  * {{eaf-cache_pact12.pdf|Seshadri+,​ “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,​” PACT 2012.}} 
 +  * {{zhao-micro2014.pdf|Zhao+,​ “FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems,” MICRO 2014.}} 
 +  * {{a40-yoon.pdf|Yoon,​ Meza+, “Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories,​” ACM TACO 2014.}} 
 +  * {{compression-aware-cache-management_hpca15.pdf|Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip P. Gibbons, Michael A. Kozuch, and Todd C. Mowry, Exploiting Compressed Block Size as an Indicator of Future Reuse, HPCA 2015.}} 
 +  * {{06974684.pdf|Lu+,​ “Loose Ordering Consistency for Persistent Memory,” ICCD 2014.}} 
 +  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu,​ O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}} 
 +  * {{cooksey.pdf|Cooksey et al., “A stateless, content-directed data prefetching mechanism,​” ASPLOS 2002.}} 
 +  * {{zilles-2001.pdf|Zilles and Sohi, “Execution-based Prediction Using Speculative Slices”, ISCA 2001.}} 
 +  * {{zilles-2000.pdf|Zilles and Sohi, ”Understanding the backward slices of performance degrading instructions,​” ISCA 2000.}} 
 +  * {{http://​www.amazon.com/​Inside-AS-400-Second-Edition/​dp/​1882419669|Frank Soltis,"Inside the AS/​400"​}} 
 + 
 +===== Lecture 27 (4/6 Mon.) ===== 
 +** Required** 
 +  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl,​ G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}} 
 +  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport,​ L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}} 
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​culler-mesi.pdf|C&​S,​ Chapters 5.1 & 5.3]] 
 +  * P&H, Chapter 5.8 
 +** Recommended:​ ** 
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​hill_309_314.pdf|Hill,​ Jouppi, Sohi. "​Multiprocessors and Multicomputers,"​ pp. 551-560 ​in Readings in Computer Architecture.]] 
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​hill_551_560.pdf|Hill,​ Jouppi, Sohi. "Dataflow and Multithreading,"​ pp. 309-314 in Readings in Computer Architecture.]] 
 +  * {{01447203.pdf|Flynn,​ M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} 
 +  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos,​ M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}} 
 +** Mentioned during lecture: ** 
 +  * {{p176-baer.pdf|Baer,​ J.-L., & Chen, T.-F. (1991). An effective on-chip preloading scheme to reduce data access penalty. Proceedings of the 1991 ACM/IEEE conference on Supercomputing.}} 
 +  * {{04147648.pdf|Srinath,​ S., Mutlu, O., Kim, H., & Patt, Y. N. (2007). Feedback Directed Prefetching:​ Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.}} 
 +  * {{joseph_grunwald_-_1997_-_prefetching_using_markov_predictors.pdf|Joseph,​ D., & Grunwald, D. (1997). Prefetching using Markov predictors. Proceedings of the 24th annual international symposium on Computer architecture.}} 
 +  * {{p279-cooksey.pdf|CookseyR., Jourdan, S., & Grunwald, D. (2002). A stateless, content-directed data prefetching mechanism. Proceedings of the 10th international conference on Architectural support for programming languages and operating systems.}} 
 +  * {{04798232.pdf|EbrahimiE.MutluO., & Patt, Y. N. (2009). Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. High Performance Computer Architecture,​ 2009.}} 
 +  * {{p186-chappell.pdf|Chappell, R. S., Stark, J., Kim, S. P.ReinhardtS. K., & Patt, Y. N. (1999). Simultaneous subordinate microthreading (SSMT). Proceedings of the 26th annual international symposium on Computer architecture.}} 
 +  * {{p2-zilles.pdf|ZillesC., & Sohi, G. (2001). Execution-based prediction using speculative slices. Proceedings of the 28th annual international symposium on Computer architecture.}} 
 +  * {{p40-luk.pdf|Luk,​ C.-K. (2001). Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. Proceedings of the 28th annual international symposium on Computer architecture.}} 
 +  * {{p172-zilles.pdf|Zilles,​ C. B., & Sohi, G. S. (2000). Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture.}} 
 +  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu,​ O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead ExecutionAn Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance ​Computer Architecture.}} 
 +  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi,​ N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
readings.txt · Last modified: 2015/04/13 19:31 by kevincha