User Tools

Site Tools


readings

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
readings [2015/02/24 02:19]
kevincha [Lecture 16 (2/23 Fri.)]
readings [2015/04/13 19:31] (current)
kevincha
Line 183: Line 183:
   * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov,​ Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}}   * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov,​ Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}}
  
-====== Readings ======+===== Lecture 16 (2/23 Mon.) ===== 
 +**Mentioned during lecture:​** 
 +  * {{:​mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013}} 
 +  * [[http://​users.ece.cmu.edu/​~omutlu/​pub/​mph_usenix_security07.pdf|Moscibroda,​ T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.]] 
 +  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung,​ H. T. (1982). Why Systolic Architectures?​ IEEE Computer.}} 
 +  * {{01675827.pdf|Fisher,​ J. A. (1981). Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Comput.}} 
 +  * {{2fbf01205185.pdf|Hwu,​ W.-M. W., Mahlke, S. A., Chen, W. Y., Chang, P. P., Warter, N. J., Bringmann, R. A., Ouellette, R. G., et al. (1993). The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput.}} 
 +  * {{p45-mahlke.pdf|Mahlke,​ S. A., Lin, D. C., Chen, W. Y., Hank, R. E., & Bringmann, R. A. (1992). Effective compiler support for predicated execution using the hyperblock. Proceedings of the 25th annual international symposium on Microarchitecture.}} 
 +  * {{melvin_patt_-_1995_-_enhancing_instruction_scheduling_with_a_block-structured_isa.pdf|Melvin,​ S., & Patt, Y. (1995). Enhancing instruction scheduling with a block-structured ISA. Int. J. Parallel Program.}} 
 +  * {{hao_et_al._-_1996_-_increasing_the_instruction_fetch_rate_via_block-structured_instruction_set_architectures.pdf|Hao,​ E., Chang, P.-Y., Evers, M., & Patt, Y. N. (1996). Increasing the instruction fetch rate via block-structured instruction set architectures. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture.}} 
 +  * {{00877947.pdf|Huck,​ J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}} 
 +  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}} 
 +  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture,​ implementation,​ and performance. IEEE Transactions on Computers.}} 
 +  *  {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher,​ J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}} 
 +  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith,​ J. E. (1982). Decoupled access/​execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}} 
 +  * {{p289-smith.pdf|Smith,​ J. E. (1984). Decoupled access/​execute computer architectures. ACM Trans. Comput. Syst.}} 
 +  * {{:​ilp_history_overview_perspective.pdf|Rau and Fisher, “Instruction-level parallel processing:​ history,​ overview, and perspective,​” Journal of Supercomputing,​ 1993.}} 
 +  * {{:​ieee_proceedings_2001_-_compiler_techniques.pdf|Faraboschi et al., “Instruction Scheduling for Instruction Level Parallel Processors,​” Proc. IEEE, Nov. 2001. 
 +}}
  
-  ​* **P&P** stands for Patt & Patel'​s //​Introduction to Computing Systems: From Bits and Gates to C and Beyond// +===== Lecture 17 (2/25 Wed.) ===== 
-    (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap1.pdf|P&P Chapter 1 (Fundamentals)]] +**Required:** 
-    * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&P Chapter 4 (The von Neumann Model)]] +  {{00877947.pdf|Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., Zahir, R. (2000). Introducing the IA-64 architectureIEEE Micro.}} 
-    (CMU WebISO) [[http://​www.ece.cmu.edu/~ece447/​cmu_only/​pp-appendixa.pdf|P&P Appendix A (The LC-3b ISA)]] +  * P&H Chapters 5.1-5.3 ​(cache chapters
-    (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&P Appendix C (The Microarchitecture ​of the LC-3bBasic Machine)]] +  Hamacher et alChapters 8.1-8.7 (cache/memory chapters) 
-  * **P&H** stands for Patterson & Hennessy'​s //Computer Organization and Design: The Hardware/​Software Interface//+  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1965). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}} 
 +  {{:liptay68.pdf|Liptay, “Structural aspects ​of the System/360 Model 85 II: the cache,” IBM Systems Journal, 1968. 
 +}}
  
-===== Lecture ​(1/13 Mon.) =====+===== Lecture ​18 (2/27 Fri.) =====
 **Required:​** **Required:​**
-  * None+  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes,​ M. V. (1964). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}} 
 +  * {{A_Case_For_MLP_Aware_Cache_Replacement.pdf|nQureshi et al., “A Case for MLP-Aware Cache Replacement,​“ ISCA 2006.}} 
 +  * P&H Chapters 5.1-5.3 (cache chapters) 
 +  * Hamacher et al. Chapters 8.1-8.7 (cache/​memory chapters)
  
-**Mentioned ​during lecture:**+**Mentioned ​During Lecture:** 
 +  * {{A_Study_of_replacement_algorithms_for_a_virtual-storage_computer.pdf|qBelady,​ “A study of replacement algorithms for a virtual-storage computer,​” IBM Systems Journal, 1966.}}
  
-  * {{bstj29-2-147.pdf|Hamming,​ R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2).}} +===== Lecture ​19 (3/2 Mon.) =====
-  * {{youandyourresearch.pdf|Hamming,​ R. W. (1986). You and Your Research. Transcription of the Bell Communications Research Colloquium Seminar.}} +
-    * [[http://​www.youtube.com/​watch?​v=a1zDuOPkMSw|youtube]] +
-  * {{05392210.pdf|Amdahl,​ G. M., Blaauw, G. A., & Brooks, F. P. (1964). Architecture of the IBM system/360. IBM J. Res. Dev., 8(2).}} +
-  * {{p128-rixner.pdf|Rixner,​ S., Dally, W. J., Kapasi, U. J., Mattson, P., & Owens, J. D. (2000). Memory access scheduling. Proceedings of the 27th annual international symposium on Computer architecture.}} +
-  * {{us5630096.pdf|William K. Zuravleff, & Robinson, T. (1997). Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order.}} +
-  * {{00964437.pdf|Patt,​ Y. (2001). Requirements,​ bottlenecks,​ and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}} +
-  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mph_usenix_security07.pdf|Moscibroda,​ T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}} +
-   * {{http://​research.microsoft.com/​pubs/​79625/​MICRO2007.pdf|Onur Mutlu and Thomas Moscibroda, "​Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors",​ MICRO 2007. }} +
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-channel-partitioning-micro11.pdf|Sai Prashanth Muralidhara,​ Lavanya Subramanian,​ Onur Mutlu, Mahmut Kandemir, and Thomas Moscibroda, "​Reducing Memory Interference in Multicore Systems via Application-Aware  +
-   * Memory Channel Partitioning",​ MICRO 2011.}} +
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} +
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-scaling_memcon13.pdf|Onur Mutlu, "​Memory Scaling: A Systems Architecture Perspective"​ Technical talk at MemCon 2013 (MEMCON), Santa Clara, CA, August 2013.}} +
- +
-===== Lecture ​(1/15 Wed.) =====+
 **Required:​** **Required:​**
-  * {{00964437.pdf|PattY. (2001). Requirements,​ bottlenecks, ​and good fortune: agents for microprocessor evolutionProceedings of the IEEE.}} +  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|WilkesM. V. (1964). Slave Memories ​and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}} 
-  * {{moscibroda.pdf|Moscibroda, T., & MutluO. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}} +  * {{A_Case_For_MLP_Aware_Cache_Replacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,“ ISCA 2006.}} 
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap1.pdf|P&P Chapter ​1 (Fundamentals)]] +  * P&H Chapters 5.1-5.3 (cache chapters
-  * P&​H ​Chapters 1 and 2 (Intro, Abstractions,​ ISA, MIPS)+  * Hamacher et al. Chapters ​8.1-8.7 (cache/​memory chapters)
  
-**Mentioned ​during lecture:** +**Mentioned ​During Lecture:** 
-  * {{gordon_moore_1965_article.pdf|MooreGE. (1965). Cramming More Components onto Integrated CircuitsElectronics,​ 38(8).}} +  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|JouppiNP. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffersProceedings of the 17th annual international symposium on Computer Architecture.}} 
-  * {{bab6286.0001.001.pdf|BurksA. W., GoldstineHH., & NeumannJvon. (1946). Preliminary discussion ​of the logical design of an electronic computing instrument.}} +  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|MutluO., StarkJ., Wilkerson, C., & PattYN. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings ​of the 9th International Symposium on High-Performance Computer Architecture.}} 
-  * {{p126-dennis.pdf|Dennis, J. B., & Misunas, D. P. (1975). A preliminary architecture ​for a basic data-flow processorProceedings of the 2nd annual symposium on Computer architecture.}} +  * {{seznec_a_case_for_two_way_skewed_associative_caches.pdf|Seznec. A Case for Two-Way Skewed-Associative CachesISCA 1993.}} 
-  * {{p34-gurd.pdf|Gurd, JR., Kirkham, C. C., & Watson, I. (1985). The Manchester prototype dataflow computer. Commun. ACM, 28(1).}} +  * {{seznec_a_case_for_two_way_skewed_associative_caches.pdf|KroftLockup-Free Instruction Fetch/​Prefetch Cache OrganizationISCA 1981.}} 
-  * Kuhn, TS(1962)The Structure of Scientific Revolutions+  * {{qureshi_utility_based_cache_partitioning.pdf|Qureshi and PattUtility-Based Cache Partitioning:​ A Low-Overhead,​ High-Performance,​ Runtime Mechanism to Partition Shared CachesMICRO 2006.}} 
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&P Chapter 4 (The von Neumann Model)]]+  * {{suh_new_memory_monitoring_scheme_for_memory_aware_scheduling_and_partitioning.pdf|Suh et alA New Memory Monitoring Scheme for Memory-Aware Scheduling and PartitioningHPCA 2002.}}
  
-===== Lecture ​(1/17 Fri.) =====+===== Lecture ​20 (3/4 Wed.) =====
 **Required:​** **Required:​**
-  * Note that you should familiarize yourself with these manualsPlease briefly skim through these manuals as you will probably need to refer to them while working on labs and homework +  * Section 5.4 in P&H 
-  ARM Architecture Reference Manual +**Mentioned During Lecture:** 
-    [[https://​www.scss.tcd.ie/​~waldroj/​3d1/​arm_arm.pdf|Manual (5MB)]] +  Section 8.8 in Hamacher et al
-  ​* ARM Architecture Instruction Quick Reference +  * {{megiddo.pdf|Megiddo and Modha, “ARC: A Self-Tuning,​ Low Overhead Replacement Cache,” FAST 2003.}}
-    ​* {{arm-instructionset.pdf|Quick Ref (.5MB)}} +
-  * Intel® 64 and IA-32 Architectures Software Developer Manual (2013) +
-    * [[http://​download.intel.com/​products/​processor/​manual/​325462.pdf|(15MB) Combined Volumes 1-3]]3+
  
-**Mentioned during lecture:** +===== Lecture 21 (3/23 Mon.) ===== 
-  * P&H Chapter 4, Sections 4.1-4.4+**Required:** 
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&P Appendix C (The Microarchitecture of the LC-3bBasic Machine)]] +  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}} 
-  * P&P Chapter 5 (The LC3+  * {{2012_isca_salp.pdf| Kim et al.“A Case for Subarray-Level Parallelism (SALPin DRAM,” ISCA 2012. (Sections 1 and 2)}} 
-  * {{p25-patterson.pdf|Patterson, D. A., & DitzelD. R. (1980). The case for the reduced instruction set computer. SIGARCH Comput. Archit. News, 8(6).}} +  * {{raidr-isca12.pdf| Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (Sections 1 and 2)}} 
-  * [[http://​www.ece.cmu.edu/​~koopman/​stack_computers/​sec3_2.html KoopmanP. (1989Stack Computers: The New Wave.]] +  * {{main-memory-system_kiise15.pdfOnur MutluJustin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers ​(KIISE), 2015.}} 
-  * {{chapter9.pdf|LevyH. (1984). Capability-Based Computer Systems. Chapter 9. The Intel iAPX 432.}} +**Mentioned During Lecture:** 
-  * {{p489-wilner.pdf|WilnerW. T. (1972). Design of the Burroughs B1700. Proceedings of the December 5-71972fall joint computer conferencepart I. }}+  * {{flipping_kim-isca14.pdf| Kim+“Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.}} 
 +  * {{flash-memory-data-retention_hpca15.pdf| Yu CaiYixin Luo, Erich FHaratsch, Ken Mai, and Onur Mutlu, 
 +"Data Retention in MLC NAND Flash Memory: Characterization,​ Optimization and Recovery,"​  
 +Proceedings of the 21st International Symposium on High-Performance Computer Architecture (HPCA)Bay AreaCAFebruary 2015}} 
 +  * {{coarchitecting-kang.pdf| Kang+, "​Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling"​}}
  
- +===== Lecture ​22 (3/25 Wed.) =====
-===== Lecture ​(1/22 Wed.) ===== +
-**Required** +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]] +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixa.pdf|P&​P Appendix A (The LC-3b ISA)]] +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +
- +
-===== Lecture 5 (1/24 Fri.) ===== +
-**Required** +
-  * None +
- +
-===== Lecture 6 (1/27 Mon.) =====+
 **Required:​** **Required:​**
-  * (CMU WebISO[[http://www.ece.cmu.edu/​~ece447/cmu_only/pp-appendixc.pdf|P&P Appendix C (The Microarchitecture of the LC-3bBasic Machine)]] +  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}} 
-  * P&H Appendix D (Mapping Control to Hardware+  * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} 
-**Optional:** +   * {{http://users.ece.cmu.edu/​~omutlu/pub/raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} 
-  * {{bestway.pdf|WilkesM. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}} +  * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers ​(KIISE), 2015.}} 
-**Mentioned during lecture:** +**Mentioned During Lecture:** 
-  * {{bestway.pdf|WilkesM. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}}+  * {{moscibroda.pdf| Moscibroda and Mutlu“Memory performance attacks: Denial of memory service in multi-core systems,​”USENIX Security 2007}} 
 +  * {{moscibroda2007.pdf| Onur Mutluand Thomas Moscibroda, "​Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors"​ ISCA 2008}} 
 +  {{isca08.pdf| Onur Mutlu and Thomas Moscibroda, "​Parallelism-Aware Batch SchedulingEnhancing both Performance and Fairness of Shared DRAM," ISCA 2008}} 
 +  * {{thread_cluster_mem_sched.pdf| Yoongu KimMichael Papamichael,​ Onur Mutlu, and Mor Harchol-Balter,​ "​Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior"​ 43rd International Symposium on Microarchitecture ​(MICRO), pages 65-76, Atlanta, GA, December 2010}} 
 +  * {{ATLAS.pdf| Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter,​ "​ATLAS:​ A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers"​}}
  
-===== Lecture ​(1/29 Wed.) =====+===== Lecture ​23 (3/27 Fri.) =====
 **Required:​** **Required:​**
-  * None+  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013. (Sections 1 and 2)}} 
 +  * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} 
 +   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} 
 +  * {{main-memory-system_kiise15.pdf| Onur Mutlu, Justin Meza, and Lavanya Subramanian,​ "The Main Memory System: Challenges and Opportunities,"​ Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.}}
  
-**Mentioned ​during lecture:** +**Mentioned ​During Lecture:** 
-  * (CMU WebISO) [[http://www.ece.cmu.edu/​~ece447/cmu_only/pp-appendixc.pdf|P&P Appendix ​C (The Microarchitecture ​of the LC-3bBasic Machine)]]+  * {{moscibroda.pdf| Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,​”USENIX Security 2007}} 
 +  * {{http://users.ece.cmu.edu/​~omutlu/pub/memory-channel-partitioning-micro11.pdf|Sai Prashanth Muralidhara,​ Lavanya Subramanian,​ Onur Mutlu, Mahmut Kandemir, and Thomas Moscibroda, "​Reducing Memory Interference in Multicore Systems via Application-Awareness"​}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​architecture-aware-distributed-resource-management_vee15.pdf | Wang et al., “A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters,​” VEE 2015.}} 
 +  * {{p362-ebrahimi.pdf|Ebrahimi,​ E., Miftakhutdinov,​ R., Fallin, ​C., Lee, C. J., Joao, J. A., Mutlu, O., & Patt, Y. N. (2011). Parallel application memory scheduling. Proceedings ​of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.}} 
 +  * {{p335-ebrahimi.pdf|EbrahimiE., Lee, C. J., Mutlu, O., & Patt, Y. N. (2010). Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems.}} 
 +  * {{bloom1970.pdf|Bloom,​ “Space/​Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970}} 
 +  * {{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} 
 +   * {{http://​users.ece.cmu.edu/​~kevincha/​papers/​chang_hpca2014.pdf|Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu, "​Improving DRAM Performance by Parallelizing Refreshes with Accesses",​ In HPCA 2014, Orlando, Feb. 2014.}}
  
-===== Lecture ​(1/31 Fri.) =====+===== Lecture ​24 (3/30 Mon.) =====
 **Required:​** **Required:​**
-  * None+  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mutlu_hpca03.pdf | Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,​” HPCA 2003.}} 
 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​TR-HPS-2006-006.pdf|Srinathet al., “Feedback directed prefetching”,​ HPCA 2007.}}
  
-===== Lecture 9 (2/3 Mon.) ===== +**Mentioned During Lecture:** 
-**Required:** +  * {{ramulator.pdf|Kim et al., “Ramulator:​ A Fast and Extensible DRAM Simulator,​” IEEE Computer Architecture Letters 2015.}} 
-  * P&H Sections 4.9-4.11 +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​eaf-cache_pact12.pdf|Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,”PACT 2012.}} 
-  * {{00476078.pdf|Smith, J. E., & SohiGS(1995)The microarchitecture of superscalar processorsProceedings of the IEEE.}} +  * {{http://​hps.ece.utexas.edu/​pub/​TR-HPS-2010-002.pdf|Lee et al., “DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems,​”HPS Technical Report, April 2010.}} 
- +  * {{tldram-lee.pdf| Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013. (Sections 1 and 2)}} 
-**Mentioned during lecture:** +  * {{2012_isca_salp.pdf| Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012. (Sections 1 and 2)}} 
-  * {{p177-allen.pdf|Allen, J. R., Kennedy, K., Porterfield,​ C., & WarrenJ. (1983). Conversion of control dependence to data dependence. Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages.}} +  * {{http://users.ece.cmu.edu/~omutlu/​pub/​rlmc_isca08.pdf|Ipek et al., “Self Optimizing Memory Controllers:​ A Reinforcement Learning Approach,​”ISCA 2008}} 
-  * {{24400043.pdf|Kim, H., MutluO., Stark, J., & Patt, Y. N. (2005). Wish BranchesCombining Conditional Branching and Predication for Adaptive Predicated ExecutionProceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.}} +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mutlu_ieee_micro06.pdf|Mutlu et al., "​Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,"​ ISCA 2005, IEEE Micro Top Picks 2006.}} 
-  * {{thornton_-_1964_-_parallel_operation_in_the_control_data_6600.pdf|Thornton,​ JE(1964)Parallel Operation in the Control Data 6600Proceedings of the Fall Joint Computer Conference.}} +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mutlu_micro05.pdf|Mutlu et al., "​Address-Value Delta (AVDPrediction," MICRO 2005.}} 
-  * {{smith78_hep.pdf|Smith, B. J. (1978). A pipelinedshared resource MIMD computer. International Conference on Parallel Processing.}} +  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​armstrong_micro04.pdf|Armstrong et al., "Wrong Path Events," MICRO 2004.}}
-  * {{p16-pettis.pdf|Pettis, K., & HansenR. C. (1990). Profile guided code positioning. Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.}} +
- +
-===== Lecture 10 (2/5 Wed.) =====+
  
 +===== Lecture 25 (4/1 Wed.) =====
 **Required:​** **Required:​**
-  * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|McfarlingS. (1993). Combining branch predictors. WRL Technical Note TN-36.}} +  * {{main-memory-system_kiise15.pdf| Onur MutluJustin Mezaand Lavanya Subramanian,​ "The Main Memory SystemChallenges ​and Opportunities," Invited Article in Communications ​of the Korean Institute ​of Information Scientists ​and Engineers ​(KIISE), 2015.}}
-  * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|KesslerR. E. (1999). ​The Alpha 21264 Microprocessor. IEEE Micro.}} +
-**Mentioned during lecture:** +
-  * {{p300-ball.pdf|Ball,​ T., & Larus, J. R. (1993). Branch prediction for free. Proceedings of the ACM SIGPLAN 1993 conference on Programming language design ​and implementation.}} +
-  * {{p135-smith.pdf|SmithJ. E. (1981). A study of branch prediction strategies. Proceedings ​of the 8th annual symposium on Computer Architecture.}} +
-  * {{yeh_patt_-_1991_-_two-level_adaptive_training_branch_prediction.pdf|Yeh,​ T.-Y., & Patt, Y. N. (1991). Two-level adaptive training branch prediction. Proceedings ​of the 24th annual international symposium on Microarchitecture.}} +
-  * {{p22-chang.pdf|Chang,​ P.-Y., Hao, E., Yeh, T.-Y., & Patt, Y. (1994). Branch classification:​ a new mechanism for improving branch predictor performance. Proceedings of the 27th annual international symposium on Microarchitecture.}} +
-  * {{hpca01.pdf|Daniel A. Jimenez ​and Calvin Lin. 2001. Dynamic Branch Prediction with Perceptrons. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture ​(HPCA '01)}} +
-  * {{Riseman.1972.TC.pdf|E. M. Riseman and C. C. Foster. 1972. The Inhibition of Potential Parallelism by Conditional Jumps. IEEE Trans. Comput. 21, 12 (December 1972)}}+
  
-===== Lecture 11 (2/12 Wed.) ===== +**Mentioned During Lecture:** 
-** Required ​** +  * {{jouppi1990.pdf|Jouppi,​ “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990.}} 
-  * None+  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​TR-HPS-2006-006.pdf|Srinathet al., “Feedback directed prefetching”,​ HPCA 2007.}}
  
-** Mentioned during the lecture ​** +===== Lecture 26 (4/3 Fri.) ===== 
-  * {{p274-chang.pdf|Po-Yung Chang, Eric Hao, and Yale N. Patt. 1997. Target prediction for indirect jumps. ISCA'​97.}} +** Required: ​** 
-  * {{kim_isca07.pdf|Hyesoon Kim, José A. Joao, Onur Mutlu, ​Chang Joo LeeYale N. Patt, and Robert Cohn. 2007. VPC prediction: reducing ​the cost of indirect branches via hardware-based dynamic devirtualizationISCA'​07}}+  * {{main-memory-system_kiise15.pdf| Onur Mutlu, ​Justin Mezaand Lavanya Subramanian"The Main Memory System: Challenges ​and Opportunities,"​ Invited Article in Communications of the Korean Institute ​of Information Scientists and Engineers (KIISE), 2015.}}
  
-===== Lecture 12 (2/14 Fri.) ===== +** Mentioned during lecture: ** 
-** Required *+  * {{raidr-isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} 
-  * P&H Sections 4.9-4.11 +  {{2012_isca_salp.pdf|Kim et al., “A Case for Exploiting Subarray-Level Parallelism in DRAM,” ISCA 2012.}} 
-  * {{00476078.pdf|SmithJE., & SohiGS(1995). The microarchitecture ​of superscalar processorsProceedings ​of the IEEE.}} +  ​{{TLDRAM-Lee.pdf|Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,​” HPCA 2013.}} 
-  * {{00004607.pdf|SmithJE., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processorsComputers, IEEE Transactions ​on.}}+  ​{{p60-liu.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}} 
 +  * {{rowclone_micro13.pdf|Seshadri et al., “RowClone:​ Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.}} 
 +  * {{LCP.pdf|Pekhimenko et al.“Linearly Compressed Pages: A Main Memory Compression Framework,​” MICRO 2013.}} 
 +  * {{|Chang et al., “Improving DRAM Performance by Parallelizing Refreshes with Accesses,” HPCA 2014.}} 
 +  * {{error-mitigation-for-intermittent-dram-failures_sigmetrics14.pdf|Khan et al., “The Efficacy ​of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study,” SIGMETRICS 2014.}} 
 +  * {{luo_dsn14.pdf|Luo et al., “Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost,” DSN 2014.}} 
 +  * {{yoongu2014.pdf|Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.}} 
 +  * {{meza_cal12.pdf|Meza et al., “Enabling Efficient and Scalable Hybrid Memories,​” ​IEEE Comp. Arch. Letters 2012.}} 
 +  * {{rowbuffer-aware-caching_iccd12.pdf|Yoon et al.“Row Buffer Locality Aware Caching Policies for Hybrid Memories,​” ICCD 2012.}} 
 +  * {{sttram_ispass13.pdf|Kultursay ​ et al., “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013. }} 
 +  * {{meza_weed13.pdf|Meza ​ et al., “Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.}} 
 +  * {{ISCA09.pdf|Lee ​ et al. “Architecting Phase Change Memory as a Scalable DRAM Alternative,​” ISCA 2009.}} 
 +  * Meza+, “Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field,” DSN 2015. 
 +  * Qureshi+, “AVATAR: A Variable-Retention-Time ​(VRTAware Refresh for DRAM Systems,” DSN 2015. 
 +  * {{ramulator.pdf|Kim et al., “Ramulator:​ A Fast and Extensible DRAM Simulator,​” ​IEEE Computer Architecture Letters 2015.}} 
 +  * {{dirty-block-index_isca14.pdf|Seshadri+,​ “The Dirty-Block Index,” ISCA 2014.}} 
 +  * {{bdi-compression_pact12.pdf|Pekhimenko+,​ “Base-Delta-Immediate Compression:​ Practical Data Compression for On-Chip Caches,” PACT 2012.}} 
 +  * {{eaf-cache_pact12.pdf|Seshadri+,​ “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,​” PACT 2012.}} 
 +  * {{zhao-micro2014.pdf|Zhao+,​ “FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems,” MICRO 2014.}} 
 +  * {{a40-yoon.pdf|Yoon,​ Meza+, “Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories,​” ACM TACO 2014.}} 
 +  * {{compression-aware-cache-management_hpca15.pdf|Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip P. Gibbons, Michael A. Kozuch, and Todd C. Mowry, Exploiting Compressed Block Size as an Indicator of Future Reuse, HPCA 2015.}} 
 +  * {{06974684.pdf|Lu+,​ “Loose Ordering Consistency for Persistent Memory,” ICCD 2014.}} 
 +  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu,​ O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium ​on High-Performance Computer Architecture.}} 
 +  * {{cooksey.pdf|Cooksey et al., “A stateless, content-directed data prefetching mechanism,​” ASPLOS 2002.}} 
 +  * {{zilles-2001.pdf|Zilles and Sohi, “Execution-based Prediction Using Speculative Slices”, ISCA 2001.}} 
 +  * {{zilles-2000.pdf|Zilles and Sohi, ”Understanding the backward slices of performance degrading instructions,​” ISCA 2000.}} 
 +  * {{http://​www.amazon.com/​Inside-AS-400-Second-Edition/​dp/​1882419669|Frank Soltis,"​Inside the AS/​400"​}}
  
-===== Lecture ​13 (2/17 Mon.) ===== +===== Lecture ​27 (4/Mon.) ===== 
-** Required ** +** Required** 
-  * none+  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl,​ G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}} 
 +  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport,​ L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}} 
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​culler-mesi.pdf|C&​S,​ Chapters 5.1 & 5.3]] 
 +  * P&H, Chapter 5.8 
 +** Recommended:​ ** 
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​hill_309_314.pdf|Hill,​ Jouppi, Sohi. "​Multiprocessors and Multicomputers,"​ pp. 551-560 in Readings in Computer Architecture.]] 
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​hill_551_560.pdf|Hill,​ Jouppi, Sohi. "​Dataflow and Multithreading,"​ pp. 309-314 in Readings in Computer Architecture.]] 
 +  * {{01447203.pdf|Flynn,​ M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} 
 +  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos,​ M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}} 
 +** Mentioned during lecture: ** 
 +  * {{horner-1819.pdf|Horner (1819). A new method of solving numerical equations of all orders, by continuous approximation. Philosophical Transactions of the Royal Society}}
  
-===== Lecture ​14 (2/19 Wed.) ===== +===== Lecture ​28 (4/Wed.) ===== 
-** Required ** +** Required** 
-  * {{p18-hwu.pdf|HwuWW., & PattYN. (1987). Checkpoint repair ​for out-of-order execution machines. Proceedings of the 14th annual international symposium on Computer architecture.}} +  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|LamportL(1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}} 
-  * {{00476078.pdf|SmithJE.SohiGS. (1995). The microarchitecture ​of superscalar processorsProceedings ​of the IEEE.}} +  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos,​ M. S., & PatelJH. (1984). A low-overhead coherence solution ​for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}} 
-  * {{00004607.pdf|Smith, JE., & Pleszkun, ​A. R. (1988). Implementing precise interrupts in pipelined processorsComputers, IEEE Transactions on.}}+  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​culler-mesi.pdf|C&SChapters 5.1 & 5.3]] 
 +  * P&HChapter 5.
 +** Recommended:​ ** 
 +  * {{10.1.1.17.8112.pdf|Gharachorloo et al. (1990). Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors.}} 
 +  * {{10.1.1.89.3693.pdf|Gharachorloo et al. (1991). Two Techniques to Enhance the Performance ​of Memory Consistency Models.}} 
 +  * {{isca07_bulksc.pdf|Ceze et al. (2007). BulkSC: Bulk Enforcement ​of Sequential Consistency.}} 
 +  * {{censier.pdf|Censier et al(1978). A new solution to coherence problems in multicache systems.}} 
 +  * {{goodman-snoopyprotocol.pdf|Goodman ​(1983). Using cache memory to reduce processor-memory traffic.}} 
 +  * {{isca123.pdf|Laudon et al. (1997). The SGI Origin: a ccNUMA highly scalable server.}} 
 +  * {{isca03_token_coherence.pdf|Martin et al. (2003). Token coherence: decoupling performance and correctness.}} 
 +  * {{p73-baer.pdf|Baer et al. (1988). On the inclusion properties for multi-level cache hierarchies.}} 
 +** Mentioned during lecture: ** 
 +  * (HTML) [[http://​www.cs.utexas.edu/​users/​EWD/​transcriptions/​EWD01xx/​EWD123.html|Dijkstra (1965) Cooperating Sequential Processes.]]
  
 +===== Lecture 29 (4/10 Fri.) =====
 +** Required: **
 +  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​culler-mesi.pdf|C&​S,​ Chapters 5.1 & 5.3]]
 +  * P&H, Chapter 5.8
 +  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos,​ M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
 +** Recommended:​ **
 +  * {{censier.pdf|Censier et al. (1978). A new solution to coherence problems in multicache systems.}}
 +  * {{goodman-snoopyprotocol.pdf|Goodman (1983). Using cache memory to reduce processor-memory traffic.}}
 +  * {{isca123.pdf|Laudon et al. (1997). The SGI Origin: a ccNUMA highly scalable server.}}
 +  * {{isca03_token_coherence.pdf|Martin et al. (2003). Token coherence: decoupling performance and correctness.}}
 +  * {{p73-baer.pdf|Baer et al. (1988). On the inclusion properties for multi-level cache hierarchies.}}
  
-===== Lecture ​15 (2/21 Fri.) ===== +===== Lecture ​30 (4/13 Mon.) ===== 
-** Required ** +** Required** 
-  * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA TeslaA Unified Graphics ​and Computing Architecture. Micro, IEEE.}} +  * {{rowclone_micro13.pdf|Seshadri et al., “RowCloneFast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.}}
-  * {{p50-fatahalian.pdf|FatahalianK., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}}+
  
-**Mentioned during lecture:** 
-  * {{30470407.pdf|Fung,​ W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.}} 
-  * {{p253-suleman.pdf |Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}} 
-  * {{01447203.pdf|Flynn,​ M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} 
-  * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher,​ J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}} 
-  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith,​ J. E. (1982). Decoupled access/​execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}} 
-  * {{p289-smith.pdf|Smith,​ J. E. (1984). Decoupled access/​execute computer architectures. ACM Trans. Comput. Syst.}} 
-  * {{p199-smith.pdf|Smith,​ J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectual support for programming languages and operating systems.}} 
-  * {{00030730.pdf|Smith,​ J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}} 
-  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung,​ H. T. (1982). Why Systolic Architectures?​ IEEE Computer.}} 
-  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}} 
-  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture,​ implementation,​ and performance. IEEE Transactions on Computers.}} 
- 
-===== Lecture 16 (2/23 Mon.) ===== 
-**Mentioned during lecture:** 
-  * {{01675827.pdf|Fisher,​ J. A. (1981). Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Comput.}} 
-  * {{:​mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013}} 
-  * {{2fbf01205185.pdf|Hwu,​ W.-M. W., Mahlke, S. A., Chen, W. Y., Chang, P. P., Warter, N. J., Bringmann, R. A., Ouellette, R. G., et al. (1993). The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput.}} 
-  * {{p45-mahlke.pdf|Mahlke,​ S. A., Lin, D. C., Chen, W. Y., Hank, R. E., & Bringmann, R. A. (1992). Effective compiler support for predicated execution using the hyperblock. Proceedings of the 25th annual international symposium on Microarchitecture.}} 
-  * {{melvin_patt_-_1995_-_enhancing_instruction_scheduling_with_a_block-structured_isa.pdf|Melvin,​ S., & Patt, Y. (1995). Enhancing instruction scheduling with a block-structured ISA. Int. J. Parallel Program.}} 
-  * {{hao_et_al._-_1996_-_increasing_the_instruction_fetch_rate_via_block-structured_instruction_set_architectures.pdf|Hao,​ E., Chang, P.-Y., Evers, M., & Patt, Y. N. (1996). Increasing the instruction fetch rate via block-structured instruction set architectures. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture.}} 
-  * {{00877947.pdf|Huck,​ J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}} 
readings.1424744396.txt.gz · Last modified: 2015/02/24 02:19 by kevincha