readings

Differences

This shows you the differences between two versions of the page.

readings [2015/02/24 02:19]
kevincha [Lecture 16 (2/23 Fri.)]
readings [2015/03/02 22:10]
jeremie
Line 183: Line 183:
   * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov,​ Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}}   * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov,​ Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}}
  
-====== Readings ====== +===== Lecture ​16 (2/23 Mon.) =====
- +
-  * **P&P** stands for Patt & Patel'​s //​Introduction to Computing Systems: From Bits and Gates to C and Beyond// +
-    * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap1.pdf|P&​P Chapter 1 (Fundamentals)]] +
-    * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]] +
-    * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixa.pdf|P&​P Appendix A (The LC-3b ISA)]] +
-    * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +
-  * **P&H** stands for Patterson & Hennessy'​s //Computer Organization and Design: The Hardware/​Software Interface//​ +
- +
-===== Lecture 1 (1/13 Mon.) ===== +
-**Required:​** +
-  * None +
 **Mentioned during lecture:** **Mentioned during lecture:**
 +  * {{:​mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013}}
 +  * [[http://​users.ece.cmu.edu/​~omutlu/​pub/​mph_usenix_security07.pdf|Moscibroda,​ T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.]]
 +  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung,​ H. T. (1982). Why Systolic Architectures?​ IEEE Computer.}}
 +  * {{01675827.pdf|Fisher,​ J. A. (1981). Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Comput.}}
 +  * {{2fbf01205185.pdf|Hwu,​ W.-M. W., Mahlke, S. A., Chen, W. Y., Chang, P. P., Warter, N. J., Bringmann, R. A., Ouellette, R. G., et al. (1993). The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput.}}
 +  * {{p45-mahlke.pdf|Mahlke,​ S. A., Lin, D. C., Chen, W. Y., Hank, R. E., & Bringmann, R. A. (1992). Effective compiler support for predicated execution using the hyperblock. Proceedings of the 25th annual international symposium on Microarchitecture.}}
 +  * {{melvin_patt_-_1995_-_enhancing_instruction_scheduling_with_a_block-structured_isa.pdf|Melvin,​ S., & Patt, Y. (1995). Enhancing instruction scheduling with a block-structured ISA. Int. J. Parallel Program.}}
 +  * {{hao_et_al._-_1996_-_increasing_the_instruction_fetch_rate_via_block-structured_instruction_set_architectures.pdf|Hao,​ E., Chang, P.-Y., Evers, M., & Patt, Y. N. (1996). Increasing the instruction fetch rate via block-structured instruction set architectures. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture.}}
 +  * {{00877947.pdf|Huck,​ J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}}
 +  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}}
 +  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture,​ implementation,​ and performance. IEEE Transactions on Computers.}}
 +  *  {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher,​ J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}}
 +  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith,​ J. E. (1982). Decoupled access/​execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}}
 +  * {{p289-smith.pdf|Smith,​ J. E. (1984). Decoupled access/​execute computer architectures. ACM Trans. Comput. Syst.}}
 +  * {{:​ilp_history_overview_perspective.pdf|Rau and Fisher, “Instruction-level parallel processing:​ history,​ overview, and perspective,​” Journal of Supercomputing,​ 1993.}}
 +  * {{:​ieee_proceedings_2001_-_compiler_techniques.pdf|Faraboschi et al., “Instruction Scheduling for Instruction Level Parallel Processors,​” Proc. IEEE, Nov. 2001.
 +}}
  
-  * {{bstj29-2-147.pdf|Hamming,​ R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2).}} +===== Lecture ​17 (2/25 Wed.) =====
-  * {{youandyourresearch.pdf|Hamming,​ R. W. (1986). You and Your Research. Transcription of the Bell Communications Research Colloquium Seminar.}} +
-    * [[http://​www.youtube.com/​watch?​v=a1zDuOPkMSw|youtube]] +
-  * {{05392210.pdf|Amdahl,​ G. M., Blaauw, G. A., & Brooks, F. P. (1964). Architecture of the IBM system/360. IBM J. Res. Dev., 8(2).}} +
-  * {{p128-rixner.pdf|Rixner,​ S., Dally, W. J., Kapasi, U. J., Mattson, P., & Owens, J. D. (2000). Memory access scheduling. Proceedings of the 27th annual international symposium on Computer architecture.}} +
-  * {{us5630096.pdf|William K. Zuravleff, & Robinson, T. (1997). Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order.}} +
-  * {{00964437.pdf|Patt,​ Y. (2001). Requirements,​ bottlenecks,​ and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}} +
-  * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mph_usenix_security07.pdf|Moscibroda,​ T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}} +
-   * {{http://​research.microsoft.com/​pubs/​79625/​MICRO2007.pdf|Onur Mutlu and Thomas Moscibroda, "​Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors",​ MICRO 2007. }} +
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-channel-partitioning-micro11.pdf|Sai Prashanth Muralidhara,​ Lavanya Subramanian,​ Onur Mutlu, Mahmut Kandemir, and Thomas Moscibroda, "​Reducing Memory Interference in Multicore Systems via Application-Aware  +
-   * Memory Channel Partitioning",​ MICRO 2011.}} +
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} +
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-scaling_memcon13.pdf|Onur Mutlu, "​Memory Scaling: A Systems Architecture Perspective"​ Technical talk at MemCon 2013 (MEMCON), Santa Clara, CA, August 2013.}} +
- +
-===== Lecture 2 (1/15 Wed.) =====+
 **Required:​** **Required:​**
-  * {{00964437.pdf|Patt, Y. (2001). Requirements, bottlenecks, and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}} +  * {{00877947.pdf|Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}}
-  * {{moscibroda.pdf|Moscibroda, T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}} +  * P&H Chapters 5.1-5.3 (cache chapters)
-  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/PP_Chap1.pdf|P&P Chapter 1 (Fundamentals)]] +  * Hamacher et al. Chapters 8.1-8.7 (cache/memory chapters)
-  * P&H Chapters 1 and 2 (Intro, Abstractions, ISA, MIPS) +  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1965). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}}
 +  * {{:liptay68.pdf|Liptay, “Structural aspects of the System/360 Model 85 II: the cache,” IBM Systems Journal, 1968.
 +}}
  
-**Mentioned during lecture:​** +===== Lecture ​18 (2/27 Fri.) =====
-  * {{gordon_moore_1965_article.pdf|Moore,​ G. E. (1965). Cramming More Components onto Integrated Circuits. Electronics,​ 38(8).}} +
-  * {{bab6286.0001.001.pdf|Burks,​ A. W., Goldstine, H. H., & Neumann, J. von. (1946). Preliminary discussion of the logical design of an electronic computing instrument.}} +
-  * {{p126-dennis.pdf|Dennis,​ J. B., & Misunas, D. P. (1975). A preliminary architecture for a basic data-flow processor. Proceedings of the 2nd annual symposium on Computer architecture.}} +
-  * {{p34-gurd.pdf|Gurd,​ J. R., Kirkham, C. C., & Watson, I. (1985). The Manchester prototype dataflow computer. Commun. ACM, 28(1).}} +
-  * Kuhn, T. S. (1962). The Structure of Scientific Revolutions. +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]] +
- +
-===== Lecture 3 (1/17 Fri.) ===== +
 **Required:​** **Required:​**
-  * Note that you should familiarize yourself with these manuals. Please briefly skim through these manuals as you will probably need to refer to them while working on labs and homework +  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1965). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}}
-  * ARM Architecture Reference Manual +  * {{A_Case_For_MLP_Aware_Cache_Replacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006.}}
-    * [[https://www.scss.tcd.ie/~waldroj/3d1/arm_arm.pdf|Manual (5MB)]] +  * P&H Chapters 5.1-5.3 (cache chapters)
-  * ARM Architecture Instruction Quick Reference +  * Hamacher et al. Chapters 8.1-8.7 (cache/memory chapters)
-    ​* {{arm-instructionset.pdf|Quick Ref (.5MB)}} +
-  * Intel® 64 and IA-32 Architectures Software Developer Manual (2013) +
-    * [[http://download.intel.com/products/processor/manual/325462.pdf|(15MB) Combined Volumes 1-3]] +
  
-**Mentioned ​during lecture:** +**Mentioned ​During Lecture:** 
-  * P&H Chapter 4, Sections 4.1-4.4. +  * {{A_Study_of_replacement_algorithms_for_a_virtual-storage_computer.pdf|Belady, “A study of replacement algorithms for a virtual-storage computer,” IBM Systems Journal, 1966.}}
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +
-  * P&P Chapter 5 (The LC3) +
-  * {{p25-patterson.pdf|Patterson, D. A., & Ditzel, D. R. (1980). The case for the reduced instruction set computer. SIGARCH Comput. Archit. News, 8(6).}} +
-  * [[http://​www.ece.cmu.edu/​~koopman/​stack_computers/​sec3_2.html | Koopman, P. (1989) Stack Computers: The New Wave.]] +
-  * {{chapter9.pdf|Levy,​ H. (1984). Capability-Based Computer ​Systems. Chapter 9. The Intel iAPX 432.}} +
-  * {{p489-wilner.pdf|Wilner, W. T. (1972). Design of the Burroughs B1700. Proceedings of the December 5-7, 1972, fall joint computer conference, part I. }}
  
- +===== Lecture 19 (3/2 Mon.) =====
-===== Lecture 4 (1/22 Wed.) ===== +
-**Required** +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]] +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixa.pdf|P&​P Appendix A (The LC-3b ISA)]] +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +
- +
-===== Lecture 5 (1/24 Fri.) ===== +
-**Required** +
-  * None +
- +
-===== Lecture 6 (1/27 Mon.) =====+
 **Required:​** **Required:​**
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes,​ M. V. (1964). Slave Memories and Dynamic Storage AllocationIEEE Transactions on Electronic Computers.}} 
-  * P&H Appendix D (Mapping Control to Hardware) +  * {{A_Case_For_MLP_Aware_Cache_Replacement.pdf|Qureshi et al.“A Case for MLP-Aware Cache Replacement,​“ ISCA 2006.}} 
-**Optional:​** +  * P&H Chapters 5.1-5.3 (cache chapters) 
-  * {{bestway.pdf|Wilkes,​ M. V. (1951). The best way to design an automatic calculating machineManchester University Computer Inaugural Conference.}} +  * Hamacher et alChapters 8.1-8.7 (cache/​memory chapters)
-**Mentioned during lecture:** +
-  * {{bestway.pdf|WilkesMV. (1951). The best way to design an automatic calculating machineManchester University Computer Inaugural Conference.}}+
  
-===== Lecture 7 (1/29 Wed.) ===== +**Mentioned ​During ​Lecture:​** 
-**Required:** +  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
-  * None +  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}}
- +  * {{seznec_a_case_for_two_way_skewed_associative_caches.pdf|Seznec, “A Case for Two-Way Skewed-Associative Caches,” ISCA 1993.}}
-**Mentioned during lecture:** +  * {{seznec_a_case_for_two_way_skewed_associative_caches.pdf|Kroft, “Lockup-Free Instruction Fetch/Prefetch Cache Organization,” ISCA 1981.}}
-  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/pp-appendixc.pdf|P&P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +  * {{qureshi_utility_based_cache_partitioning.pdf|Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006.}}
- +  * {{suh_new_memory_monitoring_scheme_for_memory_aware_scheduling_and_partitioning.pdf|Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002.}}
-===== Lecture ​8 (1/31 Fri.) ===== +
-**Required:** +
-  * None +
- +
-===== Lecture 9 (2/3 Mon.) ===== +
-**Required:​** +
-  * P&H Sections 4.9-4.11 +
-  * {{00476078.pdf|Smith,​ J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} +
- +
-**Mentioned during lecture:​** +
-  * {{p177-allen.pdf|Allen,​ J. R., Kennedy, K., Porterfield,​ C., & Warren, J. (1983). Conversion of control dependence to data dependence. Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages.}} +
-  * {{24400043.pdf|Kim,​ H., Mutlu, O., Stark, J., & Patt, Y. N. (2005). Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.}} +
-  * {{thornton_-_1964_-_parallel_operation_in_the_control_data_6600.pdf|Thornton, J. E. (1964). Parallel Operation in the Control Data 6600. Proceedings of the Fall Joint Computer Conference.}} +
-  * {{smith78_hep.pdf|Smith,​ B. J. (1978). A pipelined, shared resource MIMD computer. International Conference on Parallel Processing.}} +
-  * {{p16-pettis.pdf|Pettis,​ K., & Hansen, R. C. (1990). Profile guided code positioning. Proceedings of the ACM SIGPLAN 1990 conference on Programming language design ​and implementation.}} +
- +
-===== Lecture 10 (2/5 Wed.) ===== +
- +
-**Required:​** +
-  * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling,​ S. (1993). Combining branch predictors. WRL Technical Note TN-36.}} +
-  * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler,​ R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} +
-**Mentioned during lecture:​** +
-  * {{p300-ball.pdf|Ball,​ T., & Larus, J. R. (1993). Branch prediction for free. Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation.}} +
-  * {{p135-smith.pdf|Smith,​ J. E. (1981). A study of branch prediction strategies. Proceedings of the 8th annual symposium on Computer Architecture.}} +
-  * {{yeh_patt_-_1991_-_two-level_adaptive_training_branch_prediction.pdf|Yeh,​ T.-Y., & Patt, Y. N. (1991). Two-level adaptive training branch prediction. Proceedings of the 24th annual international symposium on Microarchitecture.}} +
-  * {{p22-chang.pdf|Chang,​ P.-Y., Hao, E., Yeh, T.-Y., & Patt, Y. (1994). Branch classification:​ a new mechanism for improving branch predictor performance. Proceedings ​of the 27th annual international symposium on Microarchitecture.}} +
-  * {{hpca01.pdf|Daniel A. Jimenez and Calvin Lin. 2001. Dynamic Branch Prediction with Perceptrons. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA '​01)}} +
-  * {{Riseman.1972.TC.pdf|E. M. Riseman and C. C. Foster. 1972. The Inhibition of Potential Parallelism by Conditional Jumps. IEEE Trans. Comput. 21, 12 (December 1972)}} +
- +
-===== Lecture 11 (2/12 Wed.) ===== +
-** Required ** +
-  * None +
- +
-** Mentioned during the lecture ** +
-  * {{p274-chang.pdf|Po-Yung Chang, Eric Hao, and Yale N. Patt. 1997. Target prediction for indirect jumps. ISCA'​97.}} +
-  * {{kim_isca07.pdf|Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, and Robert Cohn. 2007. VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization. ISCA'07}} +
- +
-===== Lecture 12 (2/14 Fri.) ===== +
-** Required ** +
-  * P&H Sections 4.9-4.11 +
-  * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} +
-  * {{00004607.pdf|Smith,​ J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}} +
- +
-===== Lecture 13 (2/17 Mon.) ===== +
-** Required ** +
-  * none +
- +
-===== Lecture 14 (2/19 Wed.) ===== +
-** Required ** +
-  * {{p18-hwu.pdf|Hwu,​ W. W., & Patt, Y. N. (1987). Checkpoint repair ​for out-of-order execution machines. Proceedings of the 14th annual international symposium on Computer architecture.}} +
-  * {{00476078.pdf|Smith,​ J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} +
-  * {{00004607.pdf|Smith,​ J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}} +
- +
- +
-===== Lecture 15 (2/21 Fri.) ===== +
-** Required ** +
-  * {{04523358.pdf|Lindholm,​ E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}} +
-  * {{p50-fatahalian.pdf|Fatahalian,​ K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}} +
- +
-**Mentioned during lecture:​** +
-  * {{30470407.pdf|Fung,​ W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/​ACM ​International Symposium on Microarchitecture.}} +
-  * {{p253-suleman.pdf |Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}} +
-  * {{01447203.pdf|Flynn,​ M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} +
-  * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher,​ J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}} +
-  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith,​ J. E. (1982). Decoupled access/​execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}} +
-  * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}} +
-  * {{p199-smith.pdf|Smith,​ J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectual support ​for programming languages and operating systems.}} +
-  * {{00030730.pdf|Smith,​ J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}} +
-  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures?​ IEEE Computer.}} +
-  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}} +
-  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone,​ M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture,​ implementation,​ and performance. IEEE Transactions on Computers.}} +
- +
-===== Lecture 16 (2/23 Mon.) ===== +
-**Mentioned during lecture:​** +
-  * {{01675827.pdf|Fisher,​ J. A. (1981). Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Trans. Comput.}} +
-  * {{:​mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability ​and Improving Fairness in Shared Main Memory Systems,” HPCA 2013}} +
-  * {{2fbf01205185.pdf|Hwu, W.-M. W., Mahlke, S. A., Chen, W. Y., Chang, P. P., Warter, N. J., Bringmann, R. A., Ouellette, R. G., et al. (1993). The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput.}} +
-  * {{p45-mahlke.pdf|Mahlke,​ S. A., Lin, D. C., Chen, W. Y., Hank, R. E., & Bringmann, R. A. (1992). Effective compiler support for predicated execution using the hyperblock. Proceedings of the 25th annual international symposium on Microarchitecture.}} +
-  * {{melvin_patt_-_1995_-_enhancing_instruction_scheduling_with_a_block-structured_isa.pdf|Melvin, S., & Patt, Y. (1995). Enhancing instruction scheduling with a block-structured ISA. Int. J. Parallel Program.}} +
-  * {{hao_et_al._-_1996_-_increasing_the_instruction_fetch_rate_via_block-structured_instruction_set_architectures.pdf|Hao, E., Chang, P.-Y., Evers, M., & Patt, Y. N. (1996). Increasing the instruction fetch rate via block-structured instruction set architectures. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture.}} +
-  * {{00877947.pdf|Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., & Zahir, R. (2000). Introducing the IA-64 architecture. IEEE Micro.}}
readings.txt · Last modified: 2015/04/13 19:31 by kevincha