Differences

This shows you the differences between two versions of the page.

--- readings [2015/02/20 19:23]
rachata
+++ readings [2015/03/26 00:24]
albert
@@ Line 147: / Line 147: @@
   * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}}
-===== Lecture 13 (2/18 Wed.) =====
+===== Lecture 14 (2/18 Wed.) =====
 ** Required **
   * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}}
@@ Line 165: / Line 165: @@
   * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}}
-===== Lecture 14 (2/20 Fri.) =====
+===== Lecture 15 (2/20 Fri.) =====
 ** Required **
   * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}}
@@ Line 182: / Line 182: @@
   * {{jog_orchestrated.pdf|Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das. 2013. Orchestrated scheduling and prefetching for GPGPUs. ISCA '13}}
   * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov, Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}}
+===== Lecture 16 (2/23 Mon.) =====
+**Mentioned during lecture:**
+  * {{:mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013}}
+  * [[http://users.ece.cmu.edu/~omutlu/pub/mph_usenix_security07.pdf|Moscibroda, T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.]]
+  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}}
+  * {{01675827.pdf|Fisher, J. A. (1981). Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Comput.}}
+  * {{2fbf01205185.pdf|Hwu, W.-M. W., Mahlke, S. A., Chen, W. Y., Chang, P. P., Warter, N. J., Bringmann, R. A., Ouellette, R. G., et al. (1993). The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput.}}
+  * {{p45-mahlke.pdf|Mahlke, S. A., Lin, D. C., Chen, W. Y., Hank, R. E., & Bringmann, R. A. (1992). Effective compiler support for predicated execution using the hyperblock. Proceedings of the 25th annual international symposium on Microarchitecture.}}
+  * {{melvin_patt_-_1995_-_enhancing_instruction_scheduling_with_a_block-structured_isa.pdf|Melvin, S., & Patt, Y. (1995). Enhancing instruction scheduling with a block-structured ISA. Int. J. Parallel Program.}}
+  * {{hao_et_al._-_1996_-_increasing_the_instruction_fetch_rate_via_block-structured_instruction_set_architectures.pdf|Hao, E., Chang, P.-Y., Evers, M., & Patt, Y. N. (1996). Increasing the instruction fetch rate via block-structured instruction set architectures. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture.}}
+  * {{00877947.pdf|Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}}
+  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}}
+  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}}
+  *  {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher, J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}}
+  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}}
+  * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}}
+  * {{:ilp_history_overview_perspective.pdf|Rau and Fisher, “Instruction-level parallel processing: history, overview, and perspective,” Journal of Supercomputing, 1993.}}
+  * {{:ieee_proceedings_2001_-_compiler_techniques.pdf|Faraboschi et al., “Instruction Scheduling for Instruction Level Parallel Processors,” Proc. IEEE, Nov. 2001.
+}}
+===== Lecture 17 (2/25 Wed.) =====
+**Required:**
+  * {{00877947.pdf|Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}}
+  * P&H Chapters 5.1-5.3 (cache chapters)
+  * Hamacher et al. Chapters 8.1-8.7 (cache/memory chapters)
+  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1965). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}}
+  * {{:liptay68.pdf|Liptay, “Structural aspects of the System/360 Model 85 II: the cache,” IBM Systems Journal, 1968.
+}}
+===== Lecture 18 (2/27 Fri.) =====
+**Required:**
+  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1964). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}}
+  * {{A_Case_For_MLP_Aware_Cache_Replacement.pdf|nQureshi et al., “A Case for MLP-Aware Cache Replacement,“ ISCA 2006.}}
+  * P&H Chapters 5.1-5.3 (cache chapters)
+  * Hamacher et al. Chapters 8.1-8.7 (cache/memory chapters)
+**Mentioned During Lecture:**
+  * {{A_Study_of_replacement_algorithms_for_a_virtual-storage_computer.pdf|qBelady, “A study of replacement algorithms for a virtual-storage computer,” IBM Systems Journal, 1966.}}
+===== Lecture 19 (3/2 Mon.) =====
+**Required:**
+  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1964). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}}
+  * {{A_Case_For_MLP_Aware_Cache_Replacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,“ ISCA 2006.}}
+  * P&H Chapters 5.1-5.3 (cache chapters)
+  * Hamacher et al. Chapters 8.1-8.7 (cache/memory chapters)
+**Mentioned During Lecture:**
+  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
+  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}}
+  * {{seznec_a_case_for_two_way_skewed_associative_caches.pdf|Seznec. A Case for Two-Way Skewed-Associative Caches. ISCA 1993.}}
+  * {{seznec_a_case_for_two_way_skewed_associative_caches.pdf|Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. ISCA 1981.}}
+  * {{qureshi_utility_based_cache_partitioning.pdf|Qureshi and Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. MICRO 2006.}}
+  * {{suh_new_memory_monitoring_scheme_for_memory_aware_scheduling_and_partitioning.pdf|Suh et al. A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning. HPCA 2002.}}
+===== Lecture 20 (3/4 Wed.) =====
+**Required:**
+  * Section 5.4 in P&H
+**Mentioned During Lecture:**
+  * Section 8.8 in Hamacher et al.
+  * {{megiddo.pdf|Megiddo and Modha, “ARC: A Self-Tuning, Low Overhead Replacement Cache,” FAST 2003.}}
+===== Lecture 21 (3/23 Mon.) =====
+**Required:**
+  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}}
+  * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}}
+  * {{raidr-isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}}
+  * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}
+**Mentioned During Lecture:**
+  * {{flipping_kim-isca14.pdf| Kim+, “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.}}
+  * {{flash-memory-data-retention_hpca15.pdf| Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, and Onur Mutlu,
+"Data Retention in MLC NAND Flash Memory: Characterization, Optimization and Recovery,"
+Proceedings of the 21st International Symposium on High-Performance Computer Architecture (HPCA), Bay Area, CA, February 2015. }}
+  * {{coarchitecting-kang.pdf| Kang+, "Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling"}}
+===== Lecture 22 (3/25 Wed.) =====
+**Required:**
+  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}}
+  * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}}
+  * {{raidr_isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}}
+  * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges and Opportunities," Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}
+**Mentioned During Lecture:**
+  * {{moscibroda.pdf| Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,”USENIX Security 2007}}
+  * {{moscibroda2007.pdf| Onur Mutluand Thomas Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors" ISCA 2008}}
+  * {{isca08.pdf| Onur Mutlu and Thomas Moscibroda, "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM," ISCA 2008}}
+  * {{thread_cluster_mem_sched.pdf| Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter, "Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior" 43rd International Symposium on Microarchitecture (MICRO), pages 65-76, Atlanta, GA, December 2010}}
+  * {{ATLAS.pdf| Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter, "ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"}}

18-447 Introduction to Computer Architecture – Spring 2015

User Tools

Site Tools

Differences

Page Tools