This shows you the differences between two versions of the page.
readings [2014/04/16 14:12] rachata |
readings [2014/12/11 00:09] 127.0.0.1 external edit |
||
---|---|---|---|
Line 287: | Line 287: | ||
* P&H, Chapter 5.8 | * P&H, Chapter 5.8 | ||
+ | ===== Lecture 30 (4/18 Fri.) ===== | ||
+ | ** Required: ** | ||
+ | * {{LCP.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency,” MICRO 2013.}} | ||
+ | * {{bdi-compression_pact12.pdf|Pekhimenko et al., “Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches,” PACT 2012.}} | ||
+ | * {{mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013.}} | ||
+ | |||
+ | ===== Lecture 31 (4/28 Mon.) ===== | ||
+ | ** Required: ** | ||
+ | * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}} | ||
+ | * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}} | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]] | ||
+ | * P&H, Chapter 5.8 | ||
+ | ** Recommended: ** | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_309_314.pdf|Hill, Jouppi, Sohi. "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.]] | ||
+ | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/hill_551_560.pdf|Hill, Jouppi, Sohi. "Dataflow and Multithreading," pp. 309-314 in Readings in Computer Architecture.]] | ||
+ | * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} | ||
+ | * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}} | ||
+ | ** Mentioned during lecture: ** | ||
+ | * {{p168-patel.pdf|Patel, J. H. (1979). Processor-memory interconnections for multiprocessors. Proceedings of the 6th annual symposium on Computer architecture.}} | ||
+ | * {{p196-moscibroda.pdf|Moscibroda, T., & Mutlu, O. (2009). A case for bufferless routing in on-chip networks. Proceedings of the 36th annual international symposium on Computer architecture.}} | ||
+ | * {{p27-gottlieb.pdf|Gottlieb, A., Grishman, R., Kruskal, C. P., McAuliffe, K. P., Rudolph, L., & Snir, M. (1982). The NYU Ultracomputer -- designing a MIMD, shared-memory parallel machine (Extended Abstract). Proceedings of the 9th annual symposium on Computer Architecture.}} | ||
+ | * {{p22-seitz.pdf|Seitz, C. L. (1985). The cosmic cube. Commun. ACM.}} | ||
+ | * {{p278-glass.pdf|Glass, C. J., & Ni, L. M. (1992). The turn model for adaptive routing. Proceedings of the 19th annual international symposium on Computer architecture.}} | ||
+ | |||
+ | ===== Lecture 32 (4/30 Wed.) ===== | ||
+ | ** Required: ** | ||
+ | * None | ||
+ | |||
+ | ** Mentioned during lecture: ** | ||
+ | * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}} | ||
+ | * {{grochowski_et_al._-_2004_-_best_of_both_latency_and_throughput.pdf|Grochowski, E., Ronen, R., Shen, J., & Wang, H. (2004). Best of Both Latency and Throughput. Proceedings of the IEEE International Conference on Computer Design (pp. 236–243).}} | ||
+ | * {{tendler_et_al._-_2002_-_power4_system_microarchitecture.pdf|Tendler, J. M., Dodson, J. S., Fields, J. S., Le, H., & Sinharoy, B. (2002). POWER4 system microarchitecture. IBM J. Res. Dev.}} | ||
+ | * {{01289290.pdf|Kalla, R., Sinharoy, B., & Tendler, J. M. (2004). IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro.}} | ||
+ | * {{kongetira_aingaran_olukotun_-_2005_-_niagara_a_32-way_multithreaded_sparc_processor.pdf|Kongetira, P., Aingaran, K., & Olukotun, K. (2005). Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro.}} | ||
+ | * {{p253-suleman.pdf|Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}} | ||
+ | * {{p441-suleman.pdf|Suleman, M. A., Mutlu, O., Joao, J. A., Khubaib, & Patt, Y. N. (2010). Data marshaling for multi-core architectures. Proceedings of the 37th annual international symposium on Computer architecture.}} | ||
+ | * {{p223-joao.pdf|Joao, J. A., Suleman, M. A., Mutlu, O., & Patt, Y. N. (2012). Bottleneck identification and scheduling in multithreaded applications. Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems.}} | ||
+ | |||
+ | ===== Lecture 33 (5/2 Fri.) ===== | ||
+ | ** Required: ** | ||
+ | * None | ||
+ | |||
+ | ** Mentioned during lecture: ** | ||
+ | * {{raidr-isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} | ||
+ | * {{2012_isca_salp.pdf|Kim et al., “A Case for Exploiting Subarray-Level Parallelism in DRAM,” ISCA 2012.}} | ||
+ | * {{TLDRAM-Lee.pdf|Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.}} | ||
+ | * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} | ||
+ | * {{rowclone_micro13.pdf|Seshadri et al., “RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.}} | ||
+ | * {{LCP.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency,” MICRO 2013.}} | ||
+ | * {{|Chang et al., “Improving DRAM Performance by Parallelizing Refreshes with Accesses,” HPCA 2014.}} | ||
+ | * {{error-mitigation-for-intermittent-dram-failures_sigmetrics14.pdf|Khan et al., “The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study,” SIGMETRICS 2014.}} | ||
+ | * {{luo_dsn14.pdf|Luo et al., “Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost,” DSN 2014.}} | ||
+ | * Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014. | ||
+ | * {{meza_cal12.pdf|Meza et al., “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters 2012.}} | ||
+ | * {{rowbuffer-aware-caching_iccd12.pdf|Yoon et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012.}} | ||
+ | * {{sttram_ispass13.pdf|Kultursay et al., “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.}} | ||
+ | * {{meza_weed13.pdf|Meza et al., “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.}} | ||
+ | * {{ISCA09.pdf|Lee et al., “Architecting Phase Change Memory as a Scalable DRAM Alternative,” ISCA 2009.}} |