readings

Differences

This shows you the differences between two versions of the page.

readings [2015/04/06 19:13]
clement
readings [2015/04/13 15:31] (current)
kevincha
Line 359: Line 359:
   * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
 ** Mentioned during lecture: **
-  * {{p176-baer.pdf|Baer, J.-L., & Chen, T.-F. (1991). An effective on-chip preloading scheme to reduce data access penalty. Proceedings of the 1991 ACM/IEEE conference on Supercomputing.}} +  * {{horner-1819.pdf|Horner (1819). A new method of solving numerical equations of all orders, by continuous approximation. Philosophical Transactions of the Royal Society.}}
-  * {{04147648.pdf|Srinath, S., Mutlu, O., Kim, H., & Patt, Y. N. (2007). Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.}} +
-  * {{joseph_grunwald_-_1997_-_prefetching_using_markov_predictors.pdf|Joseph, D., & Grunwald, D. (1997). Prefetching using Markov predictors. Proceedings of the 24th annual international symposium on Computer architecture.}} +===== Lecture 28 (4/8 Wed.) =====
-  * {{p279-cooksey.pdf|Cooksey, R., Jourdan, S., & Grunwald, D. (2002). A stateless, content-directed data prefetching mechanism. Proceedings of the 10th international conference on Architectural support for programming languages and operating systems.}} +** Required: **
-  * {{04798232.pdf|Ebrahimi, E., Mutlu, O., & Patt, Y. N. (2009). Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. High Performance Computer Architecture, 2009.}} +  * {{lamport_-_1979_-_how_to_make_a_multiprocessor_computer_that_correctly_executes_multiprocess_programs.pdf|Lamport, L. (1979). How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs.}}
-  * {{p186-chappell.pdf|Chappell, R. S., Stark, J., Kim, S. P., Reinhardt, S. K., & Patt, Y. N. (1999). Simultaneous subordinate microthreading (SSMT). Proceedings of the 26th annual international symposium on Computer architecture.}} +  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
-  * {{p2-zilles.pdf|Zilles, C., & Sohi, G. (2001). Execution-based prediction using speculative slices. Proceedings of the 28th annual international symposium on Computer architecture.}} +  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]]
-  * {{p40-luk.pdf|Luk, C.-K. (2001). Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. Proceedings of the 28th annual international symposium on Computer architecture.}} +  * P&H, Chapter 5.8
-  * {{p172-zilles.pdf|Zilles, C. B., & Sohi, G. S. (2000). Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture.}} +** Recommended: **
-  * {{mutlu_et_al._-_2003_-_runahead_execution_an_alternative_to_very_large_instruction_windows_for_out-of-order_processors.pdf|Mutlu, O., Stark, J., Wilkerson, C., & Patt, Y. N. (2003). Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. Proceedings of the 9th International Symposium on High-Performance Computer Architecture.}} +  * {{10.1.1.17.8112.pdf|Gharachorloo et al. (1990). Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors.}}
-  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}} +  * {{10.1.1.89.3693.pdf|Gharachorloo et al. (1991). Two Techniques to Enhance the Performance of Memory Consistency Models.}}
 +  * {{isca07_bulksc.pdf|Ceze et al. (2007). BulkSC: Bulk Enforcement of Sequential Consistency.}}
 +  * {{censier.pdf|Censier et al. (1978). A new solution to coherence problems in multicache systems.}}
 +  * {{goodman-snoopyprotocol.pdf|Goodman (1983). Using cache memory to reduce processor-memory traffic.}}
 +  * {{isca123.pdf|Laudon et al. (1997). The SGI Origin: a ccNUMA highly scalable server.}}
 +  * {{isca03_token_coherence.pdf|Martin et al. (2003). Token coherence: decoupling performance and correctness.}} 
 +  * {{p73-baer.pdf|Baer et al. (1988). On the inclusion properties for multi-level cache hierarchies.}} 
 +** Mentioned during lecture: ** 
 +  * (HTML) [[http://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html|Dijkstra (1965). Cooperating Sequential Processes.]]
 + 
 +===== Lecture 29 (4/10 Fri.) ===== 
 +** Required: ** 
 +  * (CMU WebISO) [[http://www.ece.cmu.edu/~ece447/cmu_only/culler-mesi.pdf|C&S, Chapters 5.1 & 5.3]]
 +  * P&H, Chapter 5.8
 +  * {{papamarcos_patel_-_1984_-_a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|Papamarcos, M. S., & Patel, J. H. (1984). A low-overhead coherence solution for multiprocessors with private cache memories. Proceedings of the 11th annual international symposium on Computer architecture.}}
 +** Recommended: **
 +  * {{censier.pdf|Censier et al. (1978). A new solution to coherence problems in multicache systems.}}
 +  * {{goodman-snoopyprotocol.pdf|Goodman (1983). Using cache memory to reduce processor-memory traffic.}}
 +  * {{isca123.pdf|Laudon et al. (1997). The SGI Origin: a ccNUMA highly scalable server.}}
 +  * {{isca03_token_coherence.pdf|Martin et al. (2003). Token coherence: decoupling performance and correctness.}}
 +  * {{p73-baer.pdf|Baer et al. (1988). On the inclusion properties for multi-level cache hierarchies.}} 
 + 
 +===== Lecture 30 (4/13 Mon.) ===== 
 +** Required: ** 
 +  * {{rowclone_micro13.pdf|Seshadri et al., “RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” MICRO 2013.}}
readings.1428362008.txt.gz · Last modified: 2015/04/06 19:13 by clement