readings

Differences

This shows you the differences between two versions of the page.

readings [2014/04/16 14:12]
rachata
readings [2014/05/20 01:42]
rachata
Line 281: Line 281:
  
 ===== Lecture 29 (4/16 Wed.) =====
 +** Required: **
 +  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
 +  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}}
 +  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]]
 +  * P&H, Chapter 5.8
 +
 +===== Lecture 30 (4/18 Fri.) =====
 +** Required: **
 +  * {{LCP.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency,” MICRO 2013.}}
 +  * {{bdi-compression_pact12.pdf|Pekhimenko et al., "Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches," PACT 2012.}}
 +  * {{mise-predictable_memory_performance-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013.}}
 +
 +===== Lecture 31 (4/28 Mon.) =====
 ** Required: **
   * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
Line 292: Line 305:
   * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
 ** Mentioned during lecture: **
 -  * {{p176-baer.pdf|Baer, J.-L., & Chen, T.-F. (1991). An effective on-chip preloading scheme to reduce data access penalty. Proceedings of the 1991 ACM/IEEE conference on Supercomputing.}}
 -  * {{04147648.pdf|Srinath, S., Mutlu, O., Kim, H., & Patt, Y. N. (2007). Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.}}
 -  * {{joseph_grunwald_-_1997_-_prefetching_using_markov_predictors.pdf|Joseph, D., & Grunwald, D. (1997). Prefetching using Markov predictors. Proceedings of the 24th annual international symposium on Computer architecture.}}
 -  * {{p279-cooksey.pdf|Cooksey, R., Jourdan, S., & Grunwald, D. (2002). A stateless, content-directed data prefetching mechanism. Proceedings of the 10th international conference on Architectural support for programming languages and operating systems.}}
 -  * {{04798232.pdf|Ebrahimi, E., Mutlu, O., & Patt, Y. N. (2009). Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. High Performance Computer Architecture, 2009.}}
 -  * {{p186-chappell.pdf|Chappell, R. S., Stark, J., Kim, S. P., Reinhardt, S. K., & Patt, Y. N. (1999). Simultaneous subordinate microthreading (SSMT). Proceedings of the 26th annual international symposium on Computer architecture.}}
 -  * {{p2-zilles.pdf|Zilles, C., & Sohi, G. (2001). Execution-based prediction using speculative slices. Proceedings of the 28th annual international symposium on Computer architecture.}}
 -  * {{p40-luk.pdf|Luk, C.-K. (2001). Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. Proceedings of the 28th annual international symposium on Computer architecture.}}
 -  * {{p172-zilles.pdf|Zilles, C. B., & Sohi, G. S. (2000). Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture.}}
 -  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}}
 -  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
 +  * {{p168-patel.pdf|Patel, J. H. (1979). Processor-memory interconnections for multiprocessors. Proceedings of the 6th annual symposium on Computer architecture.}}
 +  * {{p196-moscibroda.pdf|Moscibroda, T., & Mutlu, O. (2009). A case for bufferless routing in on-chip networks. Proceedings of the 36th annual international symposium on Computer architecture.}}
 +  * {{p27-gottlieb.pdf|Gottlieb, A., Grishman, R., Kruskal, C. P., McAuliffe, K. P., Rudolph, L., & Snir, M. (1982). The NYU Ultracomputer -- designing a MIMD, shared-memory parallel machine (Extended Abstract). Proceedings of the 9th annual symposium on Computer Architecture.}}
 +  * {{p22-seitz.pdf|Seitz, C. L. (1985). The cosmic cube. Commun. ACM.}}
 +  * {{p278-glass.pdf|Glass, C. J., & Ni, L. M. (1992). The turn model for adaptive routing. Proceedings of the 19th annual international symposium on Computer architecture.}}
 +
 +===== Lecture 32 (4/30 Wed.) =====
 +** Required: **
 +  * None
 +
 +** Mentioned during lecture: **
 +  * {{amdahl_-_1967_-_validity_of_the_single_processor_approach_to_achieving_large_scale_computing_capabilities.pdf|Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the April 18-20, 1967, spring joint computer conference.}}
 +  * {{grochowski_et_al._-_2004_-_best_of_both_latency_and_throughput.pdf|Grochowski, E., Ronen, R., Shen, J., & Wang, H. (2004). Best of Both Latency and Throughput. Proceedings of the IEEE International Conference on Computer Design (pp. 236–243).}}
 +  * {{tendler_et_al._-_2002_-_power4_system_microarchitecture.pdf|Tendler, J. M., Dodson, J. S., Fields, J. S., Le, H., & Sinharoy, B. (2002). POWER4 system microarchitecture. IBM J. Res. Dev.}}
 +  * {{01289290.pdf|Kalla, R., Sinharoy, B., & Tendler, J. M. (2004). IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro.}}
 +  * {{kongetira_aingaran_olukotun_-_2005_-_niagara_a_32-way_multithreaded_sparc_processor.pdf|Kongetira, P., Aingaran, K., & Olukotun, K. (2005). Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro.}}
 +  * {{p253-suleman.pdf|Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}}
 +  * {{p441-suleman.pdf|Suleman, M. A., Mutlu, O., Joao, J. A., Khubaib, & Patt, Y. N. (2010). Data marshaling for multi-core architectures. Proceedings of the 37th annual international symposium on Computer architecture.}}
 +  * {{p223-joao.pdf|Joao, J. A., Suleman, M. A., Mutlu, O., & Patt, Y. N. (2012). Bottleneck identification and scheduling in multithreaded applications. Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems.}}
readings.txt · Last modified: 2015/04/13 19:31 by kevincha