Differences
This shows you the differences between two versions of the page.
readings [2010/10/14 15:01] vseshadr |
readings [2010/12/04 06:00] (current) vseshadr |
||
---|---|---|---|
Line 109: | Line 109: | ||
== Recommended Readings == | == Recommended Readings == | ||
* {{datacachesforsuperscalar.pdf|Juan et al., "Data Caches for Superscalar Processors," ICS 1997}} | * {{datacachesforsuperscalar.pdf|Juan et al., "Data Caches for Superscalar Processors," ICS 1997}} | ||
+ | |||
+ | ===== For Lecture 14 ===== | ||
+ | == Required Readings == | ||
+ | * {{markovpredictors.pdf|Joseph and Grunwald, "Prefetching using Markov Predictors," ISCA 1997}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{compileralgorithmforprefetching.pdf|Mowry et al., "Design and Evaluation of a Compiler Algorithm for Prefetching," ASPLOS 1992}} | ||
+ | * {{feedbackdirectedprefetching.pdf|Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers," HPCA 2007}} | ||
+ | * {{runaheadexecution.pdf|Mutlu et al., "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}} | ||
+ | |||
+ | ===== For Lecture 15 ===== | ||
+ | Same as previous lecture | ||
+ | |||
+ | ===== For Lecture 16 ===== | ||
+ | == Recommended Readings == | ||
+ | * {{statelesscontentdirectedprefetching.pdf|Cooksey et al., "A stateless, content-directed data prefetching mechanism," ASPLOS 2002}} | ||
+ | * {{bandwidthefficientprefetching.pdf|Ebrahimi et al., "Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems," HPCA 2009}} | ||
+ | * {{softwarecontrolledpreexecution.pdf|Luk, "Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors," ISCA 2001}} | ||
+ | |||
+ | |||
+ | ===== Guest Lecture by Thomas Moscibroda ===== | ||
+ | == Recommended Readings == | ||
+ | * {{bless.pdf|Moscibroda and Mutlu, "A Case for Bufferless Routing in On-Chip Networks," ISCA 2009}} | ||
+ | * {{appawareprioritizationmechanismfornocs.pdf|Das et al., "Application-Aware Prioritization Mechanisms for On-Chip Networks," MICRO 2009}} | ||
+ | * {{aergia.pdf|Das et al., "Aergia: Exploiting Packet-Latency Slack in On-Chip Networks," ISCA 2010}} | ||
+ | * {{nextgenerationnoc.pdf|Nychis et al., "Next Generation On-Chip Networks: What Kind of Congestion Control do we Need?" HotNets 2010}} | ||
+ | |||
+ | |||
+ | ===== For Lecture 17 ===== | ||
+ | == Recommended Readings == | ||
+ | * {{coordinatedprefetchermanagement.pdf|Ebrahimi et al., "Coordinated Management of Multiple Prefetchers in Multi-Core Systems," MICRO 2009}} | ||
+ | |||
+ | ===== For Lecture 18 ===== | ||
+ | == Required Readings == | ||
+ | * {{utilitybasedcachepartitioning.pdf|Qureshi and Patt, "Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches," MICRO 2006}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{gaininginsightsintocachepartitioning.pdf|Lin et al., "Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems," HPCA 2008}} | ||
+ | * {{adaptiveinsertionpolicies.pdf|Qureshi et al., "Adaptive Insertion Policies for High-Performance Caching," ISCA 2007}} | ||
+ | |||
+ | |||
+ | ===== For Lecture 19 ===== | ||
+ | == Required Readings == | ||
+ | * {{parbs.pdf|Mutlu and Moscibroda, "Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Memory Controllers," IEEE Micro Top Picks 2009}} | ||
+ | * {{stalltimefairmemoryscheduling.pdf|Mutlu and Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors," MICRO 2007}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{permutationbasedpageinterleaving.pdf|Zhang et al., "A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality," MICRO 2000}} | ||
+ | * {{prefetchawaredramcontrollers.pdf|Lee et al., "Prefetch-Aware DRAM Controllers," MICRO 2008}} | ||
+ | * {{memoryaccessscheduling.pdf|Rixner et al., "Memory Access Scheduling," ISCA 2000}} | ||
+ | |||
+ | ===== For Lecture 20 ===== | ||
+ | Same as previous lecture | ||
+ | |||
+ | ===== For Lecture 21 ===== | ||
+ | == Required Readings == | ||
+ | * {{evaluationoftracecachefetchmechanisms.pdf|Patel et al., "Evaluation of design options for the trace cache fetch mechanism," IEEE TC 1999}} | ||
+ | * {{complexityeffectivesuperscalar.pdf|Palacharla et al., "Complexity Effective Superscalar Processors," ISCA 1997}} | ||
+ | |||
+ | == Required Readings (old) == | ||
+ | * {{microarchitectureofsuperscalar.pdf|Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proc IEEE 1995}} | ||
+ | * {{onpipeliningdynamicinstructionschedulinglogic.pdf|Stark, Brown, Patt, "On pipelining dynamic instruction scheduling logic," MICRO 2000}} | ||
+ | * {{Themicroarchitectureofthepentium4processor.pdf|Hinton et al., "The microarchitecture of the Pentium 4 processor," Intel Technology Journal, 2001}} | ||
+ | * {{21264microprocessor.pdf|Kessler, "The Alpha 21264 microprocessor," IEEE Micro, March-April 1999}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{tracecache.pdf|Rotenberg et al., "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching," MICRO 1996}} | ||
+ | |||
+ | ===== For Lecture 21 ===== | ||
+ | Same as previous lecture | ||
+ | |||
+ | ===== For Lecture 22 ===== | ||
+ | Same as previous lecture | ||
+ | |||
+ | ===== For Lecture 23 ===== | ||
+ | Same as previous lecture | ||
+ | |||
+ | ===== For Lecture 24 ===== | ||
+ | == Required Readings == | ||
+ | * {{conbiningbranchpredictors.pdf|McFarling, "Combining Branch Predictors," DEC WRL TR, 1993}} | ||
+ | * {{increasingprocessorperformance.pdf|Sprangle and Carmean, "Increasing Processor Performance by Implementing Deeper Pipelines," ISCA 2002}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{analysisofcorrelationandpredictability.pdf|Evers et al., "An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work," ISCA 1998}} | ||
+ | * {{alternativeimplementationoftwolevelbp.pdf|Yeh and Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction," ISCA 1992}} | ||
+ | * {{availableilpforsuperscalar.pdf|Jouppi and Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," ASPLOS 1989}} | ||
+ | * {{divergemergeprocessors.pdf|Kim et al., "Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths," MICRO 2006}} | ||
+ | * {{dynamicbranchpredictionwithperceptrons.pdf|Jimenez and Lin, "Dynamic Branch Prediction with Perceptrons," HPCA 2001}} | ||
+ | |||
+ | ===== For Lecture 25 ===== | ||
+ | Same as previous lecture | ||
+ | |||
+ | ===== For Lecture 26 ===== | ||
+ | |||
+ | === Control Flow III === | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{wishbranches.pdf|Kim et al., "Wish Branches: Enabling Adaptive and Aggressive Predicated Execution," IEEE Micro Top Picks, Jan/Feb 2006}} | ||
+ | * {{divergemergeprocessors.pdf|Kim et al., "Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication," IEEE Micro Top Picks, Jan/Feb 2007}} | ||
+ | |||
+ | === Alternative Approaches to Concurrency === | ||
+ | == Required Readings == | ||
+ | * {{vliweli.pdf|Fisher, "Very Long Instruction Word architectures and the ELI-512," ISCA 1983}} | ||
+ | * {{introducingia64.pdf|Huck et al., "Introducing the IA-64 Architecture," IEEE Micro 2000}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{cray1computersystem.pdf|Russell, "The CRAY-1 computer system," CACM 1978}} | ||
+ | * {{ilpprocessing.pdf|Rau and Fisher, "Instruction-level parallel processing: history, overview, and perspective," Journal of Supercomputing, 1993}} | ||
+ | * {{instructionschedulingforilpprocessors.pdf|Faraboschi et al., "Instruction Scheduling for Instruction Level Parallel Processors," Proc. IEEE, Nov. 2001}} | ||
+ | |||
+ | ===== For Lecture 26 ===== | ||
+ | Same as previous lecture (Alternative Approaches to Concurrency) | ||
+ | |||
+ | ===== For Lecture 27 ===== | ||
+ | == Required Readings == | ||
+ | * {{nvidiatesla.pdf|Lindholm et al., "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro 2008}} | ||
+ | * {{cray1computersystem.pdf|Russell, "The CRAY-1 computer system," CACM 1978}} | ||
+ | |||
+ | == Recommended Readings == | ||
+ | * {{dynamicwarpformation.pdf|Fung et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO 2007}} | ||
+ | * {{qilin.pdf|Luk et al., "Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping," MICRO 2009}} |