User Tools

Site Tools


readings

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
readings [2014/02/12 17:58]
rachata
readings [2015/02/05 17:09]
rachata
Line 8: Line 8:
   * **P&H** stands for Patterson & Hennessy'​s //Computer Organization and Design: The Hardware/​Software Interface//   * **P&H** stands for Patterson & Hennessy'​s //Computer Organization and Design: The Hardware/​Software Interface//
  
-===== Lecture 1 (1/13 Mon.) =====+====== Guides on how to review papers critically ====== 
 +  * Lecture slides: {{onur-447-s15-how-to-do-the-paper-reviews.pdf | pdf}} {{onur-447-s15-how-to-do-the-paper-reviews.ppt | Slides ppt}} 
 +  * Example reviews on "Main Memory Scaling: Challenges and Solution Directions"​ (link to the paper) 
 +      * {{review-chapter.pdf | Review 1}} 
 +      * {{review-chapter-2.pdf | Review 2}} 
 +  * Example review on "​Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems"​ (link to the paper) 
 +      * {{review-sms.pdf | Review 1}} 
 + 
 + 
 +===== Lecture 1 (1/12 Mon.) =====
 **Required:​** **Required:​**
-  * None+  * For HW1: {{00964437.pdf|Patt,​ Y. (2001). Requirements,​ bottlenecks,​ and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}}
  
 **Mentioned during lecture:** **Mentioned during lecture:**
- 
   * {{bstj29-2-147.pdf|Hamming,​ R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2).}}   * {{bstj29-2-147.pdf|Hamming,​ R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2).}}
   * {{youandyourresearch.pdf|Hamming,​ R. W. (1986). You and Your Research. Transcription of the Bell Communications Research Colloquium Seminar.}}   * {{youandyourresearch.pdf|Hamming,​ R. W. (1986). You and Your Research. Transcription of the Bell Communications Research Colloquium Seminar.}}
     * [[http://​www.youtube.com/​watch?​v=a1zDuOPkMSw|youtube]]     * [[http://​www.youtube.com/​watch?​v=a1zDuOPkMSw|youtube]]
-  * {{05392210.pdf|Amdahl,​ G. M., Blaauw, G. A., & Brooks, F. P. (1964). Architecture of the IBM system/360. IBM J. Res. Dev., 8(2).}} 
   * {{p128-rixner.pdf|Rixner,​ S., Dally, W. J., Kapasi, U. J., Mattson, P., & Owens, J. D. (2000). Memory access scheduling. Proceedings of the 27th annual international symposium on Computer architecture.}}   * {{p128-rixner.pdf|Rixner,​ S., Dally, W. J., Kapasi, U. J., Mattson, P., & Owens, J. D. (2000). Memory access scheduling. Proceedings of the 27th annual international symposium on Computer architecture.}}
-  * {{us5630096.pdf|William K. Zuravleff, & Robinson, T. (1997). Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order.}} 
-  * {{00964437.pdf|Patt,​ Y. (2001). Requirements,​ bottlenecks,​ and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}} 
   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mph_usenix_security07.pdf|Moscibroda,​ T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}}   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​mph_usenix_security07.pdf|Moscibroda,​ T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}}
    * {{http://​research.microsoft.com/​pubs/​79625/​MICRO2007.pdf|Onur Mutlu and Thomas Moscibroda, "​Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors",​ MICRO 2007. }}    * {{http://​research.microsoft.com/​pubs/​79625/​MICRO2007.pdf|Onur Mutlu and Thomas Moscibroda, "​Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors",​ MICRO 2007. }}
-   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-channel-partitioning-micro11.pdf|Sai Prashanth Muralidhara,​ Lavanya Subramanian,​ Onur Mutlu, Mahmut Kandemir, and Thomas Moscibroda, "​Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning",​ MICRO 2011.}}+   * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-channel-partitioning-micro11.pdf|Sai Prashanth Muralidhara,​ Lavanya Subramanian,​ Onur Mutlu, Mahmut Kandemir, and Thomas Moscibroda, "​Reducing Memory Interference in Multicore Systems via Application-Aware ​ 
 +   ​* ​Memory Channel Partitioning",​ MICRO 2011.}}
    * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}}    * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}}
    * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-scaling_memcon13.pdf|Onur Mutlu, "​Memory Scaling: A Systems Architecture Perspective"​ Technical talk at MemCon 2013 (MEMCON), Santa Clara, CA, August 2013.}}    * {{http://​users.ece.cmu.edu/​~omutlu/​pub/​memory-scaling_memcon13.pdf|Onur Mutlu, "​Memory Scaling: A Systems Architecture Perspective"​ Technical talk at MemCon 2013 (MEMCON), Santa Clara, CA, August 2013.}}
 +   * {{http://​users.ece.cmu.edu/​~kevincha/​papers/​chang_hpca2014.pdf|Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu, "​Improving DRAM Performance by Parallelizing Refreshes with Accesses",​ In HPCA 2014, Orlando, Feb. 2014.}}
 +   * {{http://​users.ece.cmu.edu/​~yoonguk/​papers/​kim-isca14.pdf | Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu, "​Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors",​ In ISCA-41, 2014.}}
  
-===== Lecture 2 (1/15 Wed.) =====+===== Lecture 2 (1/14 Wed.) =====
 **Required:​** **Required:​**
   * {{00964437.pdf|Patt,​ Y. (2001). Requirements,​ bottlenecks,​ and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}}   * {{00964437.pdf|Patt,​ Y. (2001). Requirements,​ bottlenecks,​ and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}}
Line 33: Line 41:
   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap1.pdf|P&​P Chapter 1 (Fundamentals)]]   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap1.pdf|P&​P Chapter 1 (Fundamentals)]]
   * P&H Chapters 1 and 2 (Intro, Abstractions,​ ISA, MIPS)   * P&H Chapters 1 and 2 (Intro, Abstractions,​ ISA, MIPS)
 +
  
 **Mentioned during lecture:** **Mentioned during lecture:**
-  * {{gordon_moore_1965_article.pdf|Moore,​ G. E. (1965). Cramming More Components onto Integrated Circuits. Electronics,​ 38(8).}} +  
-  * {{bab6286.0001.001.pdf|Burks,​ A. W., Goldstine, H. H., & Neumann, J. von. (1946). Preliminary discussion of the logical design of an electronic computing instrument.}} +===== Lecture 3 (1/16 Fri.) =====
-  * {{p126-dennis.pdf|Dennis,​ J. B., & Misunas, D. P. (1975). A preliminary architecture for a basic data-flow processor. Proceedings of the 2nd annual symposium on Computer architecture.}} +
-  * {{p34-gurd.pdf|Gurd,​ J. R., Kirkham, C. C., & Watson, I. (1985). The Manchester prototype dataflow computer. Commun. ACM, 28(1).}} +
-  * Kuhn, T. S. (1962). The Structure of Scientific Revolutions. +
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]] +
- +
-===== Lecture 3 (1/17 Fri.) =====+
 **Required:​** **Required:​**
   * Note that you should familiarize yourself with these manuals. Please briefly skim through these manuals as you will probably need to refer to them while working on labs and homework   * Note that you should familiarize yourself with these manuals. Please briefly skim through these manuals as you will probably need to refer to them while working on labs and homework
-  * ARM Architecture Reference Manual +  * MIPS Architecture Reference Manual 
-    * [[https://​www.scss.tcd.ie/​~waldroj/​3d1/​arm_arm.pdf|Manual (5MB)]] +    * {{mips_r4000_users_manual.pdf |Manual ​(the instruction set reference starts on pg.469)}}
-  * ARM Architecture Instruction Quick Reference +
-    * {{arm-instructionset.pdf|Quick Ref (.5MB)}}+
   * Intel® 64 and IA-32 Architectures Software Developer Manual (2013)   * Intel® 64 and IA-32 Architectures Software Developer Manual (2013)
     * [[http://​download.intel.com/​products/​processor/​manual/​325462.pdf|(15MB) Combined Volumes 1-3]]3     * [[http://​download.intel.com/​products/​processor/​manual/​325462.pdf|(15MB) Combined Volumes 1-3]]3
  
 **Mentioned during lecture:** **Mentioned during lecture:**
-  ​* P&H Chapter 4, Sections 4.1-4.4. +  * {{paper_aklaiber_19jan00.pdf | Klaiber"The Technology Behind CrusoeTM Processors"​Transmeta White Paper2000}}
-  * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] +
-  * P&P Chapter 5 (The LC3) +
-  ​* {{p25-patterson.pdf|PattersonD. A., & Ditzel, D. R. (1980). ​The case for the reduced instruction set computer. SIGARCH Comput. Archit. News8(6).}} +
-  * [[http://​www.ece.cmu.edu/​~koopman/​stack_computers/​sec3_2.html | KoopmanP. (1989) Stack Computers: The New Wave.]] +
-  * {{chapter9.pdf|Levy,​ H. (1984). Capability-Based Computer Systems. Chapter 9. The Intel iAPX 432.}} +
-  * {{p489-wilner.pdf|Wilner,​ W. T. (1972). Design of the Burroughs B1700. Proceedings of the December 5-7, 1972, fall joint computer conference, part I. }}+
  
- +===== Lecture 4 (1/21 Wed.) ===== 
-===== Lecture 4 (1/22 Wed.) ===== +**Required:**
-**Required**+
   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]]   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​PP_Chap4.pdf|P&​P Chapter 4 (The von Neumann Model)]]
   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixa.pdf|P&​P Appendix A (The LC-3b ISA)]]   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixa.pdf|P&​P Appendix A (The LC-3b ISA)]]
   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]]   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]]
  
-===== Lecture 5 (1/24 Fri.) =====+**Mentioned during lecture:​** 
 + 
 +===== Lecture 5 (1/23 Fri.) =====
 **Required** **Required**
   * None   * None
  
-===== Lecture 6 (1/27 Mon.) =====+===== Lecture 6 (1/26 Mon.) =====
 **Required:​** **Required:​**
   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]]   * (CMU WebISO) [[http://​www.ece.cmu.edu/​~ece447/​cmu_only/​pp-appendixc.pdf|P&​P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]]
Line 79: Line 75:
   * {{bestway.pdf|Wilkes,​ M. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}}   * {{bestway.pdf|Wilkes,​ M. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}}
 **Mentioned during lecture:** **Mentioned during lecture:**
-  * {{bestway.pdf|Wilkes,​ M. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}} 
  
-===== Lecture 7 (1/29 Wed.) =====+===== Lecture 7 (1/28 Wed.) =====
 **Required:​** **Required:​**
   * None   * None
  
 **Mentioned during lecture:** **Mentioned during lecture:**
-  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/pp-appendixc.pdf|P&P Appendix C (The Microarchitecture of the LC-3bBasic Machine)]]+  * {{bestway.pdf|Wilkes,​ M. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}} 
 +  * [[http://research.microsoft.com/pubs/68221/acrobat.pdf |Butler W. Lampson“Hints for Computer System Design,” ACM Operating Systems Review, 1983.]]
  
-===== Lecture 8 (1/31 Fri.) ===== +===== Lecture 8 (2/Mon.) =====
-**Required:​** +
-  * None +
- +
-===== Lecture 9 (2/Mon.) =====+
 **Required:​** **Required:​**
   * P&H Sections 4.9-4.11   * P&H Sections 4.9-4.11
   * {{00476078.pdf|Smith,​ J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}}   * {{00476078.pdf|Smith,​ J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}}
 +  * {{00004607.pdf|Smith,​ J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}}
 +  * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling,​ S. (1993). Combining branch predictors. WRL Technical Note TN-36.}}
 +  * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler,​ R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}}
  
 **Mentioned during lecture:** **Mentioned during lecture:**
-  * {{p177-allen.pdf|Allen,​ J. R., Kennedy, K., Porterfield,​ C., & Warren, J. (1983). Conversion of control dependence to data dependence. Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages.}} 
-  * {{24400043.pdf|Kim,​ H., Mutlu, O., Stark, J., & Patt, Y. N. (2005). Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.}} 
-  * {{thornton_-_1964_-_parallel_operation_in_the_control_data_6600.pdf|Thornton,​ J. E. (1964). Parallel Operation in the Control Data 6600. Proceedings of the Fall Joint Computer Conference.}} 
-  * {{smith78_hep.pdf|Smith,​ B. J. (1978). A pipelined, shared resource MIMD computer. International Conference on Parallel Processing.}} 
   * {{p16-pettis.pdf|Pettis,​ K., & Hansen, R. C. (1990). Profile guided code positioning. Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.}}   * {{p16-pettis.pdf|Pettis,​ K., & Hansen, R. C. (1990). Profile guided code positioning. Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.}}
  
-===== Lecture ​10 (2/Wed.) ===== +===== Lecture ​(2/Wed.) =====
 **Required:​** **Required:​**
 +  * P&H Sections 4.9-4.11
 +  * {{00476078.pdf|Smith,​ J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}}
 +  * {{00004607.pdf|Smith,​ J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}}
   * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling,​ S. (1993). Combining branch predictors. WRL Technical Note TN-36.}}   * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling,​ S. (1993). Combining branch predictors. WRL Technical Note TN-36.}}
   * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler,​ R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}}   * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler,​ R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}}
 +
 **Mentioned during lecture:** **Mentioned during lecture:**
-  * {{p300-ball.pdf|BallT., & LarusJ. R. (1993). Branch prediction for free. Proceedings of the ACM SIGPLAN 1993 conference on Programming language design ​and implementation.}} +  * {{flash-memory-data-retention_hpca15.pdf|Yu Cai, Yixin LuoErich FHaratschKen Mai, and Onur MutluData Retention in MLC NAND Flash Memory: Characterization,​ Optimization and Recovery, HPCA 2015.}} 
-  * {{p135-smith.pdf|SmithJ. E. (1981). A study of branch prediction strategies. Proceedings of the 8th annual symposium on Computer Architecture.}} +  * {{adaptive-latency-dram_hpca15.pdf|Donghyuk LeeYoongu KimGennady PekhimenkoSamira KhanVivek SeshadriKevin Changand Onur MutluAdaptive-Latency DRAMOptimizing DRAM Timing ​for the Common-Case, HPCA 2015.}} 
-  * {{yeh_patt_-_1991_-_two-level_adaptive_training_branch_prediction.pdf|YehT.-Y.& PattY. N. (1991). Two-level adaptive training branch prediction. Proceedings of the 24th annual international symposium on Microarchitecture.}} +  * {{compression-aware-cache-management_hpca15.pdf|Gennady PekhimenkoTyler HubertyRui CaiOnur MutluPhillip PGibbonsMichael ​A. Kozuch, and Todd CMowryExploiting Compressed Block Size as an Indicator of Future Reuse, HPCA 2015.}}
-  * {{p22-chang.pdf|ChangP.-Y.HaoE.Yeh, T.-Y., & Patt, Y. (1994). Branch classificationa new mechanism ​for improving branch predictor performance. Proceedings of the 27th annual international symposium on Microarchitecture.}} +
-  * {{hpca01.pdf|Daniel A. Jimenez and Calvin Lin. 2001. Dynamic Branch Prediction with Perceptrons. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA '​01)}} +
-  * {{Riseman.1972.TC.pdf|E. M. Riseman and C. C. Foster. 1972. The Inhibition of Potential Parallelism by Conditional Jumps. IEEE Trans. Comput. 21, 12 (December 1972)}} +
- +
-===== Lecture 11 (2/12 Wed.) ===== +
-** Required ** +
-  * None +
- +
-** Mentioned during the lecture ** +
-  * {{p274-chang.pdf|Po-Yung Chang, Eric Hao, and Yale N. Patt. 1997. Target prediction for indirect jumps. ISCA'​97.}} +
- +
-===== Lecture 12 (2/14 Fri.) ===== +
-** Required ** +
-  * P&H Sections 4.9-4.11 +
-  * {{00476078.pdf|SmithJ. E.& SohiG. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} +
-  * {{00004607.pdf|SmithJE., & Pleszkun, A. R(1988). Implementing precise interrupts in pipelined processors. ComputersIEEE Transactions on.}}+
readings.txt · Last modified: 2015/04/13 19:31 by kevincha