This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
readings [2015/01/12 18:48] kevincha |
readings [2015/02/16 20:19] rachata |
||
---|---|---|---|
Line 7: | Line 7: | ||
* (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/pp-appendixc.pdf|P&P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] | * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/pp-appendixc.pdf|P&P Appendix C (The Microarchitecture of the LC-3b, Basic Machine)]] | ||
* **P&H** stands for Patterson & Hennessy's //Computer Organization and Design: The Hardware/Software Interface// | * **P&H** stands for Patterson & Hennessy's //Computer Organization and Design: The Hardware/Software Interface// | ||
+ | |||
+ | ====== Guides on how to review papers critically ====== | ||
+ | * Lecture slides: {{onur-447-s15-how-to-do-the-paper-reviews.pdf | pdf}} {{onur-447-s15-how-to-do-the-paper-reviews.ppt | Slides ppt}} | ||
+ | * Example reviews on "Main Memory Scaling: Challenges and Solution Directions" (link to the paper) | ||
+ | * {{review-chapter.pdf | Review 1}} | ||
+ | * {{review-chapter-2.pdf | Review 2}} | ||
+ | * Example review on "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems" (link to the paper) | ||
+ | * {{review-sms.pdf | Review 1}} | ||
+ | |||
===== Lecture 1 (1/12 Mon.) ===== | ===== Lecture 1 (1/12 Mon.) ===== | ||
**Required:** | **Required:** | ||
- | * None | + | * For HW1: {{00964437.pdf|Patt, Y. (2001). Requirements, bottlenecks, and good fortune: agents for microprocessor evolution. Proceedings of the IEEE.}} |
**Mentioned during lecture:** | **Mentioned during lecture:** | ||
Line 22: | Line 31: | ||
* Memory Channel Partitioning", MICRO 2011.}} | * Memory Channel Partitioning", MICRO 2011.}} | ||
* {{http://users.ece.cmu.edu/~omutlu/pub/raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} | * {{http://users.ece.cmu.edu/~omutlu/pub/raidr-dram-refresh_isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/memory-scaling_memcon13.pdf|Onur Mutlu, "Memory Scaling: A Systems Architecture Perspective" Technical talk at MemCon 2013 (MEMCON), Santa Clara, CA, August 2013.}} | ||
+ | * {{http://users.ece.cmu.edu/~kevincha/papers/chang_hpca2014.pdf|Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu, "Improving DRAM Performance by Parallelizing Refreshes with Accesses", In HPCA 2014, Orlando, Feb. 2014.}} | ||
+ | * {{http://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf | Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors", In ISCA-41, 2014.}} | ||
===== Lecture 2 (1/14 Wed.) ===== | ===== Lecture 2 (1/14 Wed.) ===== | ||
Line 42: | Line 54: | ||
**Mentioned during lecture:** | **Mentioned during lecture:** | ||
+ | * {{paper_aklaiber_19jan00.pdf | Klaiber, "The Technology Behind CrusoeTM Processors", Transmeta White Paper, 2000}} | ||
===== Lecture 4 (1/21 Wed.) ===== | ===== Lecture 4 (1/21 Wed.) ===== | ||
Line 68: | Line 81: | ||
**Mentioned during lecture:** | **Mentioned during lecture:** | ||
+ | * {{bestway.pdf|Wilkes, M. V. (1951). The best way to design an automatic calculating machine. Manchester University Computer Inaugural Conference.}} | ||
+ | * [[http://research.microsoft.com/pubs/68221/acrobat.pdf |Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.]] | ||
+ | ===== Lecture 8 (2/2 Mon.) ===== | ||
+ | **Required:** | ||
+ | * P&H Sections 4.9-4.11 | ||
+ | * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} | ||
+ | * {{00004607.pdf|Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}} | ||
+ | * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling, S. (1993). Combining branch predictors. WRL Technical Note TN-36.}} | ||
+ | * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} | ||
+ | |||
+ | **Mentioned during lecture:** | ||
+ | * {{p16-pettis.pdf|Pettis, K., & Hansen, R. C. (1990). Profile guided code positioning. Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.}} | ||
+ | |||
+ | ===== Lecture 9 (2/4 Wed.) ===== | ||
+ | **Required:** | ||
+ | * P&H Sections 4.9-4.11 | ||
+ | * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} | ||
+ | * {{00004607.pdf|Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}} | ||
+ | * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling, S. (1993). Combining branch predictors. WRL Technical Note TN-36.}} | ||
+ | * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} | ||
+ | |||
+ | **Mentioned during lecture:** | ||
+ | * {{flash-memory-data-retention_hpca15.pdf|Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, and Onur Mutlu, Data Retention in MLC NAND Flash Memory: Characterization, Optimization and Recovery, HPCA 2015.}} | ||
+ | * {{adaptive-latency-dram_hpca15.pdf|Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, and Onur Mutlu, Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case, HPCA 2015.}} | ||
+ | * {{compression-aware-cache-management_hpca15.pdf|Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip P. Gibbons, Michael A. Kozuch, and Todd C. Mowry, Exploiting Compressed Block Size as an Indicator of Future Reuse, HPCA 2015.}} | ||
+ | |||
+ | |||
+ | ===== Lecture 10 (2/6 Fri.) ===== | ||
+ | **Required:** | ||
+ | * P&H Sections 4.9-4.11 | ||
+ | * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} | ||
+ | * {{00004607.pdf|Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}} | ||
+ | * {{mcfarling_-_1993_-_combining_branch_predictors.pdf|Mcfarling, S. (1993). Combining branch predictors. WRL Technical Note TN-36.}} | ||
+ | * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} | ||
+ | |||
+ | **Mentioned in the Lecture:** | ||
+ | * {{p300-ball.pdf|Ball, T., & Larus, J. R. (1993). Branch prediction for free. Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation.}} | ||
+ | * {{p135-smith.pdf|Smith, J. E. (1981). A study of branch prediction strategies. Proceedings of the 8th annual symposium on Computer Architecture.}} | ||
+ | * {{yeh_patt_-_1991_-_two-level_adaptive_training_branch_prediction.pdf|Yeh, T.-Y., & Patt, Y. N. (1991). Two-level adaptive training branch prediction. Proceedings of the 24th annual international symposium on Microarchitecture.}} | ||
+ | * {{p22-chang.pdf|Chang, P.-Y., Hao, E., Yeh, T.-Y., & Patt, Y. (1994). Branch classification: a new mechanism for improving branch predictor performance. Proceedings of the 27th annual international symposium on Microarchitecture.}} | ||
+ | * {{hpca01.pdf|Daniel A. Jimenez and Calvin Lin. 2001. Dynamic Branch Prediction with Perceptrons. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA '01)}} | ||
+ | * {{Riseman.1972.TC.pdf|E. M. Riseman and C. C. Foster. 1972. The Inhibition of Potential Parallelism by Conditional Jumps. IEEE Trans. Comput. 21, 12 (December 1972)}} | ||
+ | * {{p274-chang.pdf|Po-Yung Chang, Eric Hao, and Yale N. Patt. 1997. Target prediction for indirect jumps. ISCA'97.}} | ||
+ | * {{kim_isca07.pdf|Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, and Robert Cohn. 2007. VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization. ISCA'07}} | ||
+ | |||
+ | ===== Lecture 11 (2/11 Wed.) ===== | ||
+ | **Required:** | ||
+ | |||
+ | **Mentioned in the Lecture:** | ||
+ | * {{p18-hwu.pdf|Hwu and Patt (1987). Checkpoint Repair for Out-of-order Execution Machines.}} | ||
+ | * {{00004607.pdf|Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}} | ||
+ | * {{ogehl.pdf | Seznec (2005). Analysis of the O-GEometric History Length Branch Predictor. ISCA}} | ||
+ | * {{hpca01.pdf|Daniel A. Jimenez and Calvin Lin. 2001. Dynamic Branch Prediction with Perceptrons. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA '01)}} | ||
+ | |||
+ | ===== Lecture 12 (2/13 Fri.) ===== | ||
+ | **Required:** | ||
+ | * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} | ||
+ | |||
+ | ===== Lecture 13 (2/16 Mon.) ===== | ||
+ | **Required:** | ||
+ | * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} | ||
+ | * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} | ||
+ | * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}} | ||
+ | * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}} | ||