
This shows you the differences between two versions of the page.

readings [2015/09/17 17:38]
nandita [Optional Readings Mentioned in Lecture]
readings [2015/09/28 16:38]
nandita [Review Set 2 (due 3 PM)]
Line 4: Line 4:
 ===== Recitation 1 =====
-==== Review Set 1 (due 3 PM)====
+==== Review Set 1====
   - Onur Mutlu and Lavanya Subramanian, [[http://users.ece.cmu.edu/~omutlu/pub/memory-systems-research_superfri14.pdf | Research Problems and Opportunities in Memory Systems]], //Invited Article in Supercomputing Frontiers and Innovations
Line 24: Line 24:
 ===== Recitation 2 =====
-==== Review Set 2 (due 3 PM)====
+==== Review Set 2====
   -  Ahn et al., [[ http://users.ece.cmu.edu/~omutlu/pub/tesseract-pim-architecture-for-graph-processing_isca15.pdf | A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing]], //ISCA 2015.// **[Review Required]**
   -  Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, David Glasco, [[ http://www.cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/ieee-micro-echelon.pdf | GPUs and the Future of Parallel Computing]], IEEE Micro 2011. **[Review Required]**
Line 69: Line 69:
 ===== Lecture 5 =====
 ==== Optional Readings Mentioned in Lecture ====
 +  * T. Yeh and Y. Patt [[http://​web.cecs.pdx.edu/​~herb/​ece587s15/​Papers/​08_yeh_patt_br_predict_1991.pdf ​ | Two-Level Adaptive Training Branch Prediction]], ​ //Intl. Symposium on Microarchitecture,​ November 1991. MICRO Test of Time Award Winner (after 24 years).//
 +  * Kessler, R. E., [[http://​cseweb.ucsd.edu/​classes/​sp00/​cse241/​alpha.pdf | The Alpha 21264 Microprocessor]],​ //IEEE Micro, March/April 1999, pp. 24-36 //
 +  * McFarling, S., [[http://​www.hpl.hp.com/​techreports/​Compaq-DEC/​WRL-TN-36.pdf | Combining Branch Predictors]],​ //DEC WRL Technical Report, TN-36, June 1993//
 +  * Smith and Sohi, [[ftp://​ftp.cs.wisc.edu/​sohi/​papers/​1995/​ieee-proc.superscalar.pdf | The Microarchitecture of Superscalar Processors]],​ //​Proceedings of the IEEE, 1995.//
 +  * Evers et al., [[http://​www.ece.cmu.edu/​~ece740/​f10/​lib/​exe/​fetch.php?​media=analysisofcorrelationandpredictability.pdf | An Analysis of Correlation and Predictability:​ What Makes Two-Level Branch Predictors Work]], //ISCA 1998//
 +  * Chang et al., [[http://​ieeexplore.ieee.org/​xpls/​abs_all.jsp?​arnumber=717404 | Branch classification:​ a new mechanism for improving branch predictor performance]],​ //MICRO 1994//
 +  * Sprangle et al., [[http://​ieeexplore.ieee.org/​xpls/​abs_all.jsp?​arnumber=604711 | The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference]],​ //ISCA 1997.//
 +  * Seznec, [[http://www.irisa.fr/caps/oldcaps/people/seznec/Optim2bcgskew.pdf | An optimized 2bcgskew branch predictor]], //IRISA Tech Report 2003.//
 +  * Michaud, [[http://​citeseerx.ist.psu.edu/​viewdoc/​download?​doi=​rep=rep1&​type=pdf | Trading conflict and capacity aliasing in conditional branch predictors]],​ //ISCA 1997//
 +  * Lee et al., [[http://​www-inst.eecs.berkeley.edu/​~cs152/​sp05/​handouts/​p4-lee.pdf | The bi-mode branch predictor]],​ //MICRO 1997.//
 +  * Eden and Mudge, [[http://​web.eecs.umich.edu/​~tnm/​papers/​yags.pdf | The YAGS branch prediction scheme]], //MICRO 1998.//
 +  * Seznec et al., [[http://​www.cs.utah.edu/​~rajeev/​cs7810/​papers/​seznec02.pdf | Design tradeoffs for the Alpha EV8 conditional branch predictor]],​ //ISCA 2002.//
 +  * Chappell et al., [[http://​www.ece.cmu.edu/​~ece740/​f13/​lib/​exe/​fetch.php?​media=chappell_ssmt99.pdf | Simultaneous Subordinate Microthreading (SSMT)]], //ISCA 1999.//
 +  * Seznec, [[https://​classes.soe.ucsc.edu/​cmpe221/​Spring06/​papers/​03trace.pdf | Analysis of the O-Geometric History Length branch predictor]],​ //ISCA 2005//
 +  * Gochman et al., [[http://​www.weblearn.hs-bremen.de/​risse/​RST/​WS04/​Centrino/​vol7iss2_art03.pdf | The Intel Pentium M Processor: Microarchitecture and Performance]],​ //Intel Technology Journal, May 2003//
 +  * Jimenez and Lin, [[https://​www.cs.utexas.edu/​~lin/​papers/​hpca01.pdf | Dynamic Branch Prediction with Perceptrons]],​ //HPCA 2001//
 +  * Rosenblatt, [[http://​catalog.hathitrust.org/​Record/​000203591 | Principles of Neurodynamics:​ Perceptrons and the Theory of Brain Mechanisms]],​ //1962//
 +  * Seznec and Michaud, ​ [[http://​www.jilp.org/​vol8/​v8paper1.pdf | A case for (partially) tagged Geometric History Length Branch Prediction]],​ //JILP 2006.//
 +  * Andre Seznec, [[http://​www.jilp.org/​cbp2014/​paper/​AndreSeznec.pdf | TAGE-SC-L branch predictors]],​ //CBP 2014//
 +  * Chappell et al., [[http://​hps.ece.utexas.edu/​pub/​ssmt_isca_29.pdf | Difficult-Path Branch Prediction Using Subordinate Microthreads]],​ //ISCA 2002.//
 +  * Jacobsen et al., [[http://​people.engr.ncsu.edu/​ericro/​publications/​conference_MICRO-29_jrs.pdf | Assigning Confidence to Conditional Branch Predictions]],​ //MICRO 1996.//
 +  * Manne et al., [[http://​www.cs.utah.edu/​~rajeev/​cs7810/​papers/​manne98.pdf | Pipeline Gating: Speculation Control for Energy Reduction]],​ //ISCA 1998//
 +  * Pettis and Hansen, [[http://​perso.ensta-paristech.fr/​~bmonsuez/​Cours/​B6-4/​Articles/​papers15.pdf | Profile Guided Code Positioning]],​ //PLDI 1990.//
 +  * Hwu et al., [[http://impact.crhc.illinois.edu/shared/papers/hwu_jsuper93.pdf | The Superblock: An effective technique for VLIW and superscalar compilation]], //Journal of Supercomputing, 1993.//
 +  * Rotenberg et al., [[http://​people.engr.ncsu.edu/​ericro/​publications/​conference_MICRO-29_rbs.pdf | Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching]], //MICRO 1996.//
 +  * Patel et al., [[https://​www.eecs.umich.edu/​techreports/​cse/​97/​CSE-TR-335-97.pdf | Critical Issues Regarding the Trace Cache Fetch Mechanism]],​ //Umich TR, 1997.//
 +  * A. Peleg, U. Weiser, [[http://​patft1.uspto.gov/​netacgi/​nph-Parser?​Sect1=PTO1&​Sect2=HITOFF&​d=PALL&​p=1&​u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&​r=1&​f=G&​l=50&​s1=5381533.PN.&​OS=PN/​5381533&​RS=PN/​5381533 | Dynamic Flow Instruction Cache Memory Organized Around Trace Segments Independent of Virtual Address Line]], //United States Patent No. 5,381,533, Jan 10, 1995.// ​
 +===== Recitation 4 =====
 +==== Review Set 4 ====
 +  * Eiman Ebrahimi et al., [[ https://users.ece.cmu.edu/~omutlu/pub/fst_asplos10.pdf | Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems]], //ASPLOS 2010// **[Review Required]**
 +  * Rachata Ausavarungnirun et al., [[http://users.ece.cmu.edu/~omutlu/pub/MeDiC-for-GPGPUs_pact15.pdf | Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance]], //PACT 2015// **[Review Required]**
 +  * Donghyuk Lee et al., [[https://users.ece.cmu.edu/~omutlu/pub/tldram_hpca13.pdf | Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture]], //HPCA 2013// **[Review Required]**
 +  * Justin Meza et al., [[ http://users.ece.cmu.edu/~omutlu/pub/flash-memory-failures-in-the-field-at-facebook_sigmetrics15.pdf | A Large-Scale Study of Flash Memory Errors in the Field]], //SIGMETRICS 2015// **[Optional]**
 +==== Optional Readings Mentioned in the Lecture ====
 +  * Kevin Chang et al., [[ http://www.pdl.cmu.edu/ftp/associated/sbacpad2012_hat.pdf | HAT: Heterogeneous Adaptive Throttling for On-Chip Networks]], //SBAC-PAD 2012//
 +  * Wilson W. L. Fung et al., [[ https://www.ece.ubc.ca/~aamodt/papers/wwlfung.micro2007.pdf | Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow]], //MICRO 2007//
 +  * Donghyuk Lee et al., [[https://users.ece.cmu.edu/~omutlu/pub/adaptive-latency-dram_hpca15.pdf | Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case]], //HPCA 2015//
 +  * Justin Meza et al., [[ https://users.ece.cmu.edu/~omutlu/pub/memory-errors-at-facebook_dsn15.pdf | Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field]], //DSN 2015//
 +  * Junwhan Ahn et al., [[ http://users.ece.cmu.edu/~omutlu/pub/tesseract-pim-architecture-for-graph-processing_isca15.pdf | A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing]], //ISCA 2015.//
 +  * Vivek Seshadri et al., [[http://​users.ece.cmu.edu/​~omutlu/​pub/​in-DRAM-bulk-AND-OR-ieee_cal15.pdf | Fast Bulk Bitwise AND and OR in DRAM]], //IEEE Computer Architecture Letters (CAL), April 2015.//
 +  * Seshadri et al., [[http://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ieee_cal15.pdf | RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization]], //MICRO 2013//
 +  * Junwhan Ahn et al., [[ https://users.ece.cmu.edu/~omutlu/pub/pim-enabled-instructons-for-low-overhead-pim_isca15.pdf | PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture]], //ISCA 2015.//
 +  * Liu et al., [[http://www.pdl.cmu.edu/PDL-FTP/NVM/dram-retention_isca13.pdf | An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms]], //ISCA 2013.//
 +  * Khan et al., [[https://users.ece.cmu.edu/~omutlu/pub/error-mitigation-for-intermittent-dram-failures_sigmetrics14.pdf | The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study]], //SIGMETRICS 2014.//
 +  * Luo et al., [[http://users.ece.cmu.edu/~omutlu/pub/heterogeneous-reliability-memory-for-data-centers_dsn14.pdf | Characterizing Application Memory Error Vulnerability to Optimize Data Center Cost]], //DSN 2014//
 +  * Kim et al., [[ http://users.ece.cmu.edu/~omutlu/pub/dram-row-hammer_isca14.pdf | Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors]], //ISCA 2014.//
 +  * Cai et al., [[http://www.istc-cc.cmu.edu/publications/papers/2013/flash-programming-interference_iccd13.pdf | Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation]], //ICCD 2013//
 +  * Cai et al., [[https://users.ece.cmu.edu/~omutlu/pub/flash-error-analysis-and-management_itj13.pdf | Error Analysis and Retention-Aware Error Management for NAND Flash Memory]], //Intel Technology Journal 2013//
 +  * Cai et al., [[https://users.ece.cmu.edu/~omutlu/pub/neighbor-assisted-error-correction-in-flash_sigmetrics14.pdf | Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories]], //SIGMETRICS 2014//
 +  * Lee et al., [[https://users.ece.cmu.edu/~omutlu/pub/adaptive-latency-dram_hpca15.pdf | Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case]], //HPCA 2015//
 +  * Qureshi et al., [[ https://users.ece.cmu.edu/~omutlu/pub/avatar-dram-refresh_dsn15.pdf | AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems]], //DSN 2015//
 +  * Lee et al., [[ http://​users.ece.cmu.edu/​~omutlu/​pub/​pcm_isca09.pdf | Architecting Phase Change Memory as a Scalable DRAM Alternative]],​ //ISCA 2009//
 +  * Yoon, Meza et al., [[https://​users.ece.cmu.edu/​~omutlu/​pub/​rowbuffer-aware-caching_iccd12.pdf | Row Buffer Locality Aware Caching Policies for Hybrid Memories]], //ICCD 2012//
 +  * Meza et al., [[https://www.ece.cmu.edu/~safari/pubs/timber_cal12.pdf | Enabling Efficient and Scalable Hybrid Memories]], //IEEE Comp. Arch. Letters, 2012//
 +==== Papers Mentioned in the Lecture (Not in Slides) ====
 +  * Yoon et al., [[http://users.ece.utexas.edu/~merez/vecc_asplos_2010.pdf | Virtualized and Flexible ECC for Main Memory]], //ASPLOS 2010//
 +  * Cai et al., [[https://​users.ece.cmu.edu/​~omutlu/​pub/​flash-memory-data-retention_hpca15.pdf | Data Retention in MLC NAND Flash Memory: Characterization,​ Optimization and Recovery]], //HPCA 2015// ​
 +  * Raoux et al., [[ http://researcher.watson.ibm.com/researcher/files/us-gwburr/PCM_IBMJRD.pdf | Phase-change random access memory: A scalable technology]], //IBM JRD 2008//
 +  * Qureshi et al., [[http://www.cs.ucsb.edu/~chong/290N-W10/pcm.pdf | Scalable High Performance Main Memory System Using Phase-Change Memory Technology]], //ISCA 2009//