Differences

This shows you the differences between two versions of the page.

--- readings [2014/03/28 20:23]
rachata
+++ readings [2014/05/20 01:46]
rachata
@@ Line 1: / Line 1: @@
 ====== Readings ======
   * **P&P** stands for Patt & Patel's //Introduction to Computing Systems: From Bits and Gates to C and Beyond//
@@ Line 223: / Line 223: @@
   * {{p335-ebrahimi.pdf|Ebrahimi, E., Lee, C. J., Mutlu, O., & Patt, Y. N. (2010). Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems.}}
   * {{p362-ebrahimi.pdf|Ebrahimi, E., Miftakhutdinov, R., Fallin, C., Lee, C. J., Joao, J. A., Mutlu, O., & Patt, Y. N. (2011). Parallel application memory scheduling. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}}
+===== Lecture 25 (4/2 Wed.) =====
+** Mentioned during lecture: **
+  * {{raidr-isca12.pdf|Liu et al., RAIDR: Retention-Aware Intelligent DRAM Refresh, ISCA 2012.}}
+  * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}}
+  * {{4299a065.pdf|Kim, Y., Papamichael, M., Mutlu, O., & Harchol-Balter, M. (2010). Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior. Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.}}
+  * {{muralidhara_et_al._-_2011_-_reducing_memory_interference_in_multicore_systems_via_application-aware_memory_channel_partitioning.pdf|Muralidhara, S. P., Subramanian, L., Mutlu, O., Kandemir, M., & Moscibroda, T. (2011). Reducing memory interference in multicore systems via application-aware memory channel partitioning. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}}
+  * {{p335-ebrahimi.pdf|Ebrahimi, E., Lee, C. J., Mutlu, O., & Patt, Y. N. (2010). Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems.}}
+  * {{p362-ebrahimi.pdf|Ebrahimi, E., Miftakhutdinov, R., Fallin, C., Lee, C. J., Joao, J. A., Mutlu, O., & Patt, Y. N. (2011). Parallel application memory scheduling. Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}}
+  * {{isca08_ipek.pdf|Ipek, E., Mutlu, O., Martinez, J., Caruana, R. (2008). Self-Optimizing Memory Controllers: A Reinforcement Learning Approach. Proceedings of the 42th Annual IEEE/ACM International Symposium on Microarchitecture.}}
+===== Lecture 25 (4/7 Mon.) =====
+** Required: **
+  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}}
+  * {{04147648.pdf|Srinath, S., Mutlu, O., Kim, H., & Patt, Y. N. (2007). Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.}}
+** Recommended: **
+  * {{24400233.pdf|Mutlu, O., Kim, H., & Patt, Y. N. (2005). Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns. Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.}}
+  * {{01603492.pdf|Mutlu, O., Kim, H., & Patt, Y. N. (2006). Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance. IEEE Micro.}}
+  * {{21260119.pdf|Armstrong, D. N., Kim, H., Mutlu, O., & Patt, Y. N. (2004). Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery. Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture.}}
+===== Lecture 27 (4/8 Wed.) =====
+** Required: **
+  * None
+** Mentioned during lecture: **
+  * {{p176-baer.pdf|Baer, J.-L., & Chen, T.-F. (1991). An effective on-chip preloading scheme to reduce data access penalty. Proceedings of the 1991 ACM/IEEE conference on Supercomputing.}}
+  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
+  * {{mowry_lam_gupta_-_1992_-_design_and_evaluation_of_a_compiler_algorithm_for_prefetching.pdf|Mowry, T. C., Lam, M. S., & Gupta, A. (1992). Design and evaluation of a compiler algorithm for prefetching. Proceedings of the fifth international conference on Architectural support for programming languages and operating systems.}}
+===== Lecture 28 (4/14 Mon.) =====
+** Required: **
+  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
+  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}}
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]]
+  * P&H, Chapter 5.8
+** Recommended: **
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_309_314.pdf|Hill, Jouppi, Sohi. "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.]]
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_551_560.pdf|Hill, Jouppi, Sohi. "Dataflow and Multithreading," pp. 309-314 in Readings in Computer Architecture.]]
+  * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}}
+  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
+** Mentioned during lecture: **
+  * {{p176-baer.pdf|Baer, J.-L., & Chen, T.-F. (1991). An effective on-chip preloading scheme to reduce data access penalty. Proceedings of the 1991 ACM/IEEE conference on Supercomputing.}}
+  * {{04147648.pdf|Srinath, S., Mutlu, O., Kim, H., & Patt, Y. N. (2007). Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.}}
+  * {{joseph_grunwald_-_1997_-_prefetching_using_markov_predictors.pdf|Joseph, D., & Grunwald, D. (1997). Prefetching using Markov predictors. Proceedings of the 24th annual international symposium on Computer architecture.}}
+  * {{p279-cooksey.pdf|Cooksey, R., Jourdan, S., & Grunwald, D. (2002). A stateless, content-directed data prefetching mechanism. Proceedings of the 10th international conference on Architectural support for programming languages and operating systems.}}
+  * {{04798232.pdf|Ebrahimi, E., Mutlu, O., & Patt, Y. N. (2009). Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. High Performance Computer Architecture, 2009.}}
+  * {{p186-chappell.pdf|Chappell, R. S., Stark, J., Kim, S. P., Reinhardt, S. K., & Patt, Y. N. (1999). Simultaneous subordinate microthreading (SSMT). Proceedings of the 26th annual international symposium on Computer architecture.}}
+  * {{p2-zilles.pdf|Zilles, C., & Sohi, G. (2001). Execution-based prediction using speculative slices. Proceedings of the 28th annual international symposium on Computer architecture.}}
+  * {{p40-luk.pdf|Luk, C.-K. (2001). Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. Proceedings of the 28th annual international symposium on Computer architecture.}}
+  * {{p172-zilles.pdf|Zilles, C. B., & Sohi, G. S. (2000). Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture.}}
+  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}}
+  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
+===== Lecture 29 (4/16 Wed.) =====
+** Required: **
+  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
+  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}}
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]]
+  * P&H, Chapter 5.8
+===== Lecture 30 (4/18 Fri.) =====
+** Required: **
+  * {{LCP.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency,” MICRO 2013.}}
+  * {{bdi-compression_pact12.pdf|Pekhimenko et al., "Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches," PACT 2012.}}
+  * {{mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013.}}
+===== Lecture 31 (4/28 Mon.) =====
+** Required: **
+  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
+  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}}
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]]
+  * P&H, Chapter 5.8
+** Recommended: **
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_309_314.pdf|Hill, Jouppi, Sohi. "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.]]
+  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_551_560.pdf|Hill, Jouppi, Sohi. "Dataflow and Multithreading," pp. 309-314 in Readings in Computer Architecture.]]
+  * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}}
+  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
+** Mentioned during lecture: **
+  * {{p168-patel.pdf|Patel, J. H. (1979). Processor-memory interconnections for multiprocessors. Proceedings of the 6th annual symposium on Computer architecture.}}
+  * {{p196-moscibroda.pdf|Moscibroda, T., & Mutlu, O. (2009). A case for bufferless routing in on-chip networks. Proceedings of the 36th annual international symposium on Computer architecture.}}
+  * {{p27-gottlieb.pdf|Gottlieb, A., Grishman, R., Kruskal, C. P., McAuliffe, K. P., Rudolph, L., & Snir, M. (1982). The NYU Ultracomputer -- designing a MIMD, shared-memory parallel machine (Extended Abstract). Proceedings of the 9th annual symposium on Computer Architecture.}}
+  * {{p22-seitz.pdf|Seitz, C. L. (1985). The cosmic cube. Commun. ACM.}}
+  * {{p278-glass.pdf|Glass, C. J., & Ni, L. M. (1992). The turn model for adaptive routing. Proceedings of the 19th annual international symposium on Computer architecture.}}
+===== Lecture 32 (4/30 Wed.) =====
+** Required: **
+  * None
+** Mentioned during lecture: **
+  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
+  * {{grochowski_et_al._-_2004_-_best_of_both_latency_and_throughput.pdf|Grochowski, E., Ronen, R., Shen, J., & Wang, H. (2004). Best of Both Latency and Throughput. Proceedings of the IEEE International Conference on Computer Design (pp. 236–243).}}
+  * {{tendler_et_al._-_2002_-_power4_system_microarchitecture.pdf|Tendler, J. M., Dodson, J. S., Fields, J. S., Le, H., & Sinharoy, B. (2002). POWER4 system microarchitecture. IBM J. Res. Dev.}}
+  * {{01289290.pdf|Kalla, R., Sinharoy, B., & Tendler, J. M. (2004). IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro.}}
+  * {{kongetira_aingaran_olukotun_-_2005_-_niagara_a_32-way_multithreaded_sparc_processor.pdf|Kongetira, P., Aingaran, K., & Olukotun, K. (2005). Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro.}}
+  * {{p253-suleman.pdf|Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}}
+  * {{p441-suleman.pdf|Suleman, M. A., Mutlu, O., Joao, J. A., Khubaib, & Patt, Y. N. (2010). Data marshaling for multi-core architectures. Proceedings of the 37th annual international symposium on Computer architecture.}}
+  * {{p223-joao.pdf|Joao, J. A., Suleman, M. A., Mutlu, O., & Patt, Y. N. (2012). Bottleneck identification and scheduling in multithreaded applications. Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems.}}
+===== Lecture 33 (5/2 Fri.) =====
+** Required: **
+  * None
+** Mentioned during lecture: **
+  * Liu, Jaiyen, Veras, Mutlu, “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.
+  * Kim, Seshadri, Lee+, “A Case for Exploiting Subarray-Level Parallelism in DRAM,” ISCA 2012.
+  * Lee+, “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.
+  * Liu+, “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.
+  * Seshadri+, “RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.
+  * Pekhimenko+, “Linearly Compressed Pages: A Main Memory Compression Framework,” MICRO 2013.
+  * Chang+, “Improving DRAM Performance by Parallelizing Refreshes with Accesses,” HPCA 2014.
+  * Khan+, “The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study,” SIGMETRICS 2014.
+  * Luo+, “Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost,” DSN 2014.
+  * Kim+, “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.
+  * Lee, Ipek, Mutlu, Burger, “Architecting Phase Change Memory as a Scalable DRAM Alternative,” ISCA 2009, CACM 2010, Top Picks 2010.
+  * Meza, Chang, Yoon, Mutlu, Ranganathan, “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters 2012.
+  * Yoon, Meza et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012.
+  * Kultursay+, “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.
+  * Meza+, “A Case for Efficient Hardware-Software Cooperative Management of Storage and
+Memory,” WEED 2013.
+  * Lee, Ipek, Mutlu, Burger, “Architecting Phase Change Memory as a Scalable DRAM Alternative,” ISCA 2009.
+  * Meza+, “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters, 2012.
+  * Yoon, Meza et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012 Best Paper Award.

18-447 Introduction to Computer Architecture – Spring 2015

User Tools

Site Tools

Differences

Page Tools