  * Can contain false positives
    * Better/more hash functions help eliminate this (see the sketch below)
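If the false-positive discussion above refers to a Bloom filter (as in lecture), here is a minimal sketch of why inserted keys always test positive while absent keys sometimes do too, and why more/better hash functions reduce (up to a point, but never fully eliminate) the false-positive rate. The filter size and hash mixing below are illustrative assumptions:

<code cpp>
// Minimal Bloom filter sketch: inserted keys always test positive, but other
// keys may also test positive (false positives). There are no false negatives.
// More (good) hash functions lower the false-positive rate, up to a point.
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>

constexpr size_t kNumBits   = 1024;  // illustrative filter size
constexpr int    kNumHashes = 3;     // illustrative number of hash functions

struct BloomFilter {
    std::bitset<kNumBits> bits;

    // Derive the i-th hash from two base hashes (double hashing).
    static size_t hash_i(uint64_t key, int i) {
        uint64_t h1 = std::hash<uint64_t>{}(key);
        uint64_t h2 = std::hash<uint64_t>{}(key ^ 0x9e3779b97f4a7c15ULL);
        return (h1 + i * h2) % kNumBits;
    }

    void insert(uint64_t key) {
        for (int i = 0; i < kNumHashes; i++) bits.set(hash_i(key, i));
    }

    // true = "possibly present" (may be a false positive); false = definitely absent.
    bool maybe_contains(uint64_t key) const {
        for (int i = 0; i < kNumHashes; i++)
            if (!bits.test(hash_i(key, i))) return false;
        return true;
    }
};

int main() {
    BloomFilter bf;
    for (uint64_t k = 0; k < 100; k++) bf.insert(k);
    int fp = 0;
    for (uint64_t k = 1000; k < 2000; k++) fp += bf.maybe_contains(k);
    std::printf("false positives in 1000 absent-key probes: %d\n", fp);
}
</code>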

===== Lecture 24 (03/30 Mon.) =====

  * Simulation
    * Drawbacks of RTL simulation
      * Time consuming
      * Complex to develop
      * Hard to perform design exploration
    * Goals: explore the design space quickly; match the behavior of existing systems
    * Tradeoffs: speed, accuracy, flexibility
    * High-level simulation vs. detailed simulation
      * High-level simulation is faster, but less accurate
  * Memory controllers that work with multiple types of cores
    * Design problem: how can the controller find a good scheduling policy on its own?
    * Self-optimizing memory controller: use machine learning (reinforcement learning; a toy sketch follows)
      * Can adapt to the applications
      * The complexity is very high
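The self-optimizing controller above can be pictured as a small reinforcement-learning loop. A toy Q-learning sketch, assuming an invented state encoding (queue-occupancy bucket plus whether a row-hit request is waiting) and a two-action command set; this is a simplification for illustration, not the published design:

<code cpp>
// Toy Q-learning memory-scheduler sketch (NOT the published self-optimizing
// controller): the scheduler learns from reward feedback which command to issue.
// State encoding and actions are invented for illustration.
#include <algorithm>
#include <array>
#include <cstdio>
#include <random>

enum Action { kIssueRowHit = 0, kIssueOldest = 1, kNumActions = 2 };

struct QScheduler {
    // State: (queue-occupancy bucket 0..3) x (row-hit request waiting? 0/1) = 8 states.
    std::array<std::array<double, kNumActions>, 8> q{};  // Q-table, zero-initialized
    double alpha = 0.1, gamma = 0.9, epsilon = 0.05;     // learning/discount/explore rates
    std::mt19937 rng{42};

    static int state(int queue_bucket, bool row_hit_waiting) {
        return queue_bucket * 2 + (row_hit_waiting ? 1 : 0);
    }

    int choose(int s) {  // epsilon-greedy action selection
        std::uniform_real_distribution<double> u(0.0, 1.0);
        if (u(rng) < epsilon) return (int)(rng() % kNumActions);  // explore
        return q[s][0] >= q[s][1] ? 0 : 1;                        // exploit
    }

    // Reward example: +1 if the issued command kept the data bus busy.
    void update(int s, int a, double reward, int s_next) {
        double best_next = std::max(q[s_next][0], q[s_next][1]);
        q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
    }
};

int main() {
    QScheduler sched;
    int s = QScheduler::state(/*queue_bucket=*/2, /*row_hit_waiting=*/true);
    int a = sched.choose(s);
    sched.update(s, a, /*reward=*/1.0, /*s_next=*/s);
    std::printf("Q[%d][%d] after one update: %f\n", s, a, sched.q[s][a]);
}
</code>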
  * Tolerating latency can be costly
    * The instruction window is complex
      * The benefit also diminishes as it grows
      * Designing the buffers can be complex
    * A simpler way to tolerate latency out of order is desirable
  * Different sources can cause an out-of-order core to stall
    * Cache misses
      * Note that the stall happens only once the instruction window is full
  * Scaling the instruction window size is hard
    * It is better (less complex) to make the window more efficient
  * Runahead execution
    * Tries to obtain MLP without increasing the instruction window
    * Runs ahead (i.e., executes ahead) when there is a long-latency memory instruction
      * Long-latency memory instructions stall the processor for a while anyway, so it is better to make use of that time
    * Executes future instructions to generate accurate prefetches
      * Allows future data to already be in the cache
    * How to support runahead execution? (a toy model of the control flow follows this list)
      * Need a way to checkpoint the state when entering runahead mode
      * How to make executing on the speculative path useful?
      * Need a runahead cache to handle loads/stores in runahead mode (since they are speculative)
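A toy model of the runahead control flow, under heavy simplification: the cache never evicts, checkpointing is implicit, and the runahead cache and invalid-result tracking mentioned above are omitted; the trace and depth are invented:

<code cpp>
// Toy model of the runahead control flow over a load-address trace: on a miss
// the core would stall, so it "runs ahead" through upcoming loads purely to
// generate prefetches, then restores and resumes.
#include <cstdio>
#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> trace = {1, 2, 3, 1, 4, 5, 2, 6};  // invented load addresses
    std::unordered_set<int> cache;                      // idealized cache contents
    const int kRunaheadDepth = 3;  // how many future loads we pre-execute per miss
    int stall_misses = 0;

    for (size_t i = 0; i < trace.size(); i++) {
        if (cache.count(trace[i])) continue;  // hit: no stall
        stall_misses++;                       // miss: this is where the core stalls...
        cache.insert(trace[i]);
        // ...so checkpoint and enter runahead mode: pre-execute the next few
        // loads without committing anything, turning future misses into prefetches.
        for (size_t j = i + 1; j < trace.size() && j <= i + kRunaheadDepth; j++)
            cache.insert(trace[j]);
        // Miss returns: restore the checkpoint and resume normal mode at i+1.
    }
    std::printf("demand misses that stalled the core: %d\n", stall_misses);
}
</code>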
+ | |||
+ | |||
+ | ===== Lecture 25 (4/1 Wed.) ===== | ||
+ | |||
  * More on runahead execution
    * How to support runahead execution? (recap)
      * Need a way to checkpoint the state when entering runahead mode
      * How to make executing on the speculative path useful?
      * Need a runahead cache to handle loads/stores in runahead mode (since they are speculative)
    * Cost and benefit of runahead execution (slide 27)
    * Runahead can be inefficient
      * Some runahead periods are useless (they generate no useful prefetches)
      * Getting rid of useless/inefficient periods improves efficiency
    * What if there is a dependent cache miss?
      * It cannot be parallelized in vanilla runahead
      * Can predict the value of the dependent load
      * How to predict the address of the load (see the AVD sketch below)
        * Delta value information
        * Stride predictor
        * AVD prediction
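A sketch of the AVD (address-value delta) idea, assuming a simplified table and confidence scheme: for some pointer loads, (loaded value minus load address) is a stable delta, so the value of an outstanding miss can be predicted and the dependent load issued early:

<code cpp>
// AVD prediction sketch: predict the value of a missing pointer load as
// (load address + learned delta), so a dependent load need not wait.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct AvdEntry {
    int64_t delta = 0;
    int confidence = 0;  // saturating confidence counter
};

struct AvdPredictor {
    std::unordered_map<uint64_t, AvdEntry> table;  // indexed by load PC

    // Train on a completed load: the address it accessed and the value it returned.
    void train(uint64_t pc, uint64_t addr, uint64_t value) {
        int64_t delta = (int64_t)(value - addr);
        AvdEntry& e = table[pc];
        if (e.delta == delta) { if (e.confidence < 3) e.confidence++; }
        else { e.delta = delta; e.confidence = 0; }
    }

    // Predict the value of an in-flight miss at this PC, if confident enough.
    bool predict(uint64_t pc, uint64_t addr, uint64_t* value_out) const {
        auto it = table.find(pc);
        if (it == table.end() || it->second.confidence < 2) return false;
        *value_out = addr + (uint64_t)it->second.delta;
        return true;
    }
};

int main() {
    AvdPredictor p;
    // Invented pattern: a load at PC 0x400 always returns "its own address + 8".
    for (uint64_t a = 0x1000; a < 0x1040; a += 0x10) p.train(0x400, a, a + 8);
    uint64_t v;
    if (p.predict(0x400, 0x2000, &v))
        std::printf("predicted value: 0x%llx\n", (unsigned long long)v);
}
</code>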
  * Questions regarding prefetching
    * What to prefetch
    * When to prefetch
    * How to prefetch
    * Where to prefetch from
  * Prefetching can cause thrashing (evicting a useful block)
  * Prefetching can also be useless (the prefetched data never gets used)
    * Prefetching needs to be efficient
    * Useless prefetches can cause memory bandwidth problems, e.g. in GPUs
  * Prefetch a whole block, more than one block, or a sub-block?
    * Each of these has pros and cons
    * Bigger prefetches are more likely to waste bandwidth
    * Commonly done at cache-block granularity
  * Prefetch accuracy: the fraction of useful prefetches out of all prefetches
  * A prefetcher usually predicts based on
    * Past knowledge
    * Compiler hints
  * A prefetcher has to prefetch at the right time
    * A prefetch that is too early might get evicted before it is used
      * It might also evict other useful data
    * A prefetch that is too late does not hide the whole memory latency
  * Previous prefetches at the same PC can be used as history
  * Previous demand requests are also good information to use for prefetching
  * Prefetch buffer
    * A separate place for prefetched data, to avoid thrashing the cache
    * Demand and prefetch requests can be treated separately
      * More complex
    * Generally, demand blocks are more important
      * This means eviction should prefer prefetched blocks over demand blocks
  * Tradeoffs in where to place the prefetcher
    * Look at L1 hits and misses
    * Look at L1 misses only
    * Look at L2 misses only
    * The different access patterns seen at each level affect accuracy
    * Tradeoff between handling more requests (seeing L1 hits and misses) and having less visibility (seeing only L2 misses)
  * Software vs. hardware vs. execution-based prefetching
    * Software: the ISA provides prefetch instructions and software utilizes them (see the sketch below)
      * What information is useful?
      * How to make sure the prefetch is timely?
      * What if you have a pointer-based structure?
        * Pointer chasing is not easy to prefetch (in many cases the work between loads is short, so the next address cannot be predicted timely enough)
        * Can be mitigated by hinting the next-next and/or next-next-next address
    * Hardware: identify the access pattern and prefetch accordingly
    * Execution driven: opportunistically pre-execute to prefetch (runahead, dual-core execution)
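A sketch of the software approach using the GCC/Clang ''%%__builtin_prefetch%%'' intrinsic on a pointer-chasing loop; the list layout is invented, and the hinted address is two nodes ahead, per the next-next idea above:

<code cpp>
// Software prefetching for linked-list traversal, using the GCC/Clang
// __builtin_prefetch intrinsic. Because the work per node is short, we hint
// the block two nodes ahead rather than one.
#include <vector>

struct Node {
    long payload;
    Node* next;
};

long sum_list(Node* head) {
    long sum = 0;
    for (Node* n = head; n != nullptr; n = n->next) {
        if (n->next != nullptr)
            __builtin_prefetch(n->next->next);  // hint: two nodes ahead
        sum += n->payload;                      // the (short) work per node
    }
    return sum;
}

int main() {
    std::vector<Node> nodes(1 << 16);  // contiguous backing store, for simplicity
    for (size_t i = 0; i + 1 < nodes.size(); i++) nodes[i] = {(long)i, &nodes[i + 1]};
    nodes.back() = {(long)nodes.size() - 1, nullptr};
    return (int)(sum_list(nodes.data()) & 1);
}
</code>

Note that computing ''n->next->next'' still serializes on ''n->next'' being cached; that serialization is exactly why pointer chasing is hard to prefetch in a timely manner, and why precomputed jump pointers or software hints help in practice.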
  * Stride prefetcher
    * Predicts strided accesses, which are common in many programs
    * Can be cache-block based or instruction (PC) based (see the sketch below)
  * Stream buffer design
    * Buffers the stream of accesses (the next addresses)
    * Uses that information to prefetch
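A minimal PC-indexed stride prefetcher sketch; the table organization, confidence thresholds, and prefetch distance are illustrative choices, not a specific published design:

<code cpp>
// Minimal instruction-(PC-)based stride prefetcher: learn the stride per load
// PC and, once confident, prefetch `distance` strides ahead of the access.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct StrideEntry {
    uint64_t last_addr = 0;
    int64_t stride = 0;
    int confidence = 0;  // saturating confidence counter
};

struct StridePrefetcher {
    std::unordered_map<uint64_t, StrideEntry> table;  // indexed by load PC
    int distance = 4;  // prefetch distance: how far ahead to prefetch

    // Returns a prefetch address, or 0 if not confident yet.
    uint64_t access(uint64_t pc, uint64_t addr) {
        StrideEntry& e = table[pc];
        int64_t stride = (int64_t)(addr - e.last_addr);
        if (e.last_addr != 0 && stride == e.stride) {
            if (e.confidence < 3) e.confidence++;   // same stride again: more confident
        } else {
            e.stride = stride;                      // new stride: retrain
            e.confidence = 0;
        }
        e.last_addr = addr;
        return (e.confidence >= 2) ? addr + e.stride * distance : 0;
    }
};

int main() {
    StridePrefetcher pf;
    for (uint64_t a = 0x1000; a < 0x1100; a += 64) {  // stride-64 access stream
        uint64_t p = pf.access(/*pc=*/0x400, a);
        if (p) std::printf("access 0x%llx -> prefetch 0x%llx\n",
                           (unsigned long long)a, (unsigned long long)p);
    }
}
</code>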
  * What affects prefetcher performance
    * Prefetch distance
      * How far ahead of the demand stream to prefetch
    * Prefetch degree
      * How many prefetches to issue per trigger
  * Prefetcher performance (the sketch below computes these metrics from counters)
    * Coverage
      * Out of the demand misses the program would otherwise have, how many are eliminated by prefetches?
    * Accuracy
      * Out of all the prefetch requests, how many actually get used?
    * Timeliness
      * How much of the memory latency do the prefetch requests hide?
    * Cache pollution
      * How many additional demand misses does the prefetcher cause by evicting useful blocks?
      * Hard to quantify
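The same metrics written as ratios over event counters; the counter names and the late-prefetch proxy for timeliness are assumptions for illustration:

<code cpp>
// Prefetcher performance metrics as ratios over simple event counters.
struct PrefetchStats {
    long prefetches_issued = 0;  // all prefetch requests sent to memory
    long prefetches_used = 0;    // prefetched blocks later hit by a demand access
    long baseline_misses = 0;    // demand misses the program would have without prefetching
    long late_prefetches = 0;    // useful prefetches still in flight when demanded

    // Accuracy: out of all prefetches, how many were actually used?
    double accuracy() const {
        return prefetches_issued ? (double)prefetches_used / prefetches_issued : 0.0;
    }
    // Coverage: out of the baseline demand misses, how many did prefetching remove?
    double coverage() const {
        return baseline_misses ? (double)prefetches_used / baseline_misses : 0.0;
    }
    // Timeliness (proxy): how many useful prefetches hid only part of the latency?
    // Pollution (extra misses caused by prefetches) is hard to quantify, as noted.
    double lateness() const {
        return prefetches_used ? (double)late_prefetches / prefetches_used : 0.0;
    }
};
</code>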
+ | |||
+ | |||
+ | ===== Lecture 26 (4/3 Fri.) ===== | ||
+ | |||
  * Feedback-directed prefetching
    * Uses the outcome of past prefetches as feedback to the prefetcher
      * Accuracy, timeliness, and pollution information (a control-loop sketch follows)
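A sketch of the feedback loop: each interval, measured accuracy, lateness, and pollution move the prefetcher's aggressiveness up or down. The thresholds are invented, and the lecture's actual mechanism is more detailed:

<code cpp>
// Feedback-directed prefetching sketch: adjust aggressiveness (here just the
// prefetch degree) from measured accuracy, lateness, and pollution.
struct FeedbackControl {
    int degree = 2;  // current aggressiveness (prefetch degree)

    void end_of_interval(double accuracy, double lateness, double pollution) {
        if (accuracy > 0.75 && lateness > 0.10) {
            if (degree < 8) degree++;   // accurate but late: prefetch more aggressively
        } else if (accuracy < 0.40 || pollution > 0.25) {
            if (degree > 1) degree--;   // wasteful or polluting: back off
        }
        // Otherwise: keep the current degree.
    }
};
</code>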
  * Markov prefetcher
    * Prefetches based on previous miss history
    * Uses a Markov model to predict the next miss (see the sketch below)
    * Pros: can cover arbitrary patterns (easy wins for linked-list or tree traversals)
    * Downsides: high cost; cannot help with compulsory misses (no history)
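A minimal Markov (correlation) prefetcher sketch; the table is left unbounded here for clarity, whereas the real cost concern comes from having to bound it:

<code cpp>
// Markov prefetcher sketch: remember which miss addresses have followed a
// given miss address, and prefetch the most frequent successors. Covers
// arbitrary patterns (e.g. linked lists), but cannot help on a first-ever miss.
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

struct MarkovPrefetcher {
    // miss address -> (successor miss address -> observed count)
    std::unordered_map<uint64_t, std::map<uint64_t, int>> table;
    uint64_t last_miss = 0;  // assumes 0 is not a valid miss address

    // Called on every demand miss; returns addresses to prefetch.
    std::vector<uint64_t> on_miss(uint64_t addr, int degree = 2) {
        if (last_miss) table[last_miss][addr]++;  // train: addr followed last_miss
        last_miss = addr;
        std::vector<uint64_t> prefetches;
        auto it = table.find(addr);
        if (it == table.end()) return prefetches;  // no history: compulsory miss
        // Select up to `degree` most frequent successors (simple selection for clarity).
        std::map<uint64_t, int> succ = it->second;
        for (int k = 0; k < degree && !succ.empty(); k++) {
            auto best = succ.begin();
            for (auto s = succ.begin(); s != succ.end(); ++s)
                if (s->second > best->second) best = s;
            prefetches.push_back(best->first);
            succ.erase(best);
        }
        return prefetches;
    }
};

int main() {
    MarkovPrefetcher mp;
    for (int rep = 0; rep < 3; rep++)  // miss stream repeats 0x10, 0x50, 0x90
        for (uint64_t a : {0x10ULL, 0x50ULL, 0x90ULL}) mp.on_miss(a);
    return mp.on_miss(0x10).front() == 0x50 ? 0 : 1;  // history predicts 0x50 next
}
</code>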
  * Content-directed prefetching
    * Identify pointers within fetched memory content (values that can be used as prefetch addresses)
    * Not very efficient by itself (hard to figure out which words are pointers)
      * Software can give hints
  * Correlation table
    * Address correlation
  * Execution-based prefetchers
    * Helper thread / speculative thread
      * Use another thread to pre-execute the program
    * Can be software based or hardware based
    * Discovers misses before the main program does (to prefetch data in a timely manner)
    * How to construct the helper thread (a software sketch follows this list)
      * Pre-execute instructions (one example of how to initialize a speculative thread; slide 9)
      * Thread-based pre-execution
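A software flavor of thread-based pre-execution, assuming a simple linked-list workload: a helper thread runs the same traversal ahead of the main thread purely to warm the cache, and its results are discarded. The synchronization is deliberately minimal; build with ''-pthread'':

<code cpp>
// Thread-based pre-execution sketch: the helper (speculative) thread chases
// the list ahead of the main (demand) thread to discover misses early.
#include <atomic>
#include <thread>
#include <vector>

struct Node { long payload; Node* next; };

std::atomic<bool> done{false};

void helper(Node* head) {
    // Pre-execute the pointer chase, touching each next node early.
    for (Node* n = head; n && !done.load(std::memory_order_relaxed); n = n->next)
        __builtin_prefetch(n->next);
}

long sum_list(Node* head) {
    std::thread t(helper, head);  // speculative thread: discovers misses first
    long sum = 0;
    for (Node* n = head; n; n = n->next) sum += n->payload;  // main (demand) thread
    done.store(true, std::memory_order_relaxed);
    t.join();
    return sum;
}

int main() {
    std::vector<Node> nodes(1 << 16);
    for (size_t i = 0; i + 1 < nodes.size(); i++) nodes[i] = {(long)i, &nodes[i + 1]};
    nodes.back() = {(long)nodes.size() - 1, nullptr};
    return (int)(sum_list(nodes.data()) & 1);
}
</code>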
  * Error tolerance
    * Solutions to errors
      * Tolerate errors
        * New interfaces, new designs
      * Eliminate or minimize errors
        * New technology, system-wide rethinking
      * Embrace errors
        * Map data that can tolerate errors to error-prone areas
  * Hybrid memory systems
    * Combine multiple memory technologies in one system
  * What can emerging technologies help with?
    * Scalability
    * Lowering cost
    * Energy efficiency
  * Possible solutions to the scaling problem
    * Lower-leakage DRAM
    * Heterogeneous DRAM (TL-DRAM, etc.)
    * Adding more functionality to DRAM
    * Denser designs (3D stacking)
    * Different technologies
      * NVM
  * Charge-based vs. resistive memory
    * How is data written?
    * How is data read?
  * Non-volatile memory
    * Resistive memory
      * PCM
        * Injects current to change the phase of the material
        * Scales better than DRAM
        * Multiple bits per cell
          * Wider resistance range
        * No refresh is needed
        * Downsides: latency and write endurance
      * STT-MRAM
        * Injects current to change the magnetic polarity
      * Memristor
        * Injects current to change the structure
    * Pros and cons vary between the different technologies
    * Persistence: data stays there even without power
      * Unified memory and storage management (persistent data structures): a single-level store
        * Improves energy and performance
        * Simplifies the programming model
  * Different design options for DRAM + NVM
    * DRAM as a cache in front of NVM
    * Place some data in DRAM and other data in PCM
      * Based on the data's characteristics (a toy placement sketch follows)
        * Frequently accessed data, and data that needs low write latency, go in DRAM
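A toy placement policy for such a hybrid, assuming invented thresholds and 4 KB pages: write-heavy or frequently accessed pages migrate to DRAM, the rest stay in PCM:

<code cpp>
// Toy DRAM+PCM placement sketch: hot or write-heavy pages move to DRAM (lower
// write latency, no endurance concern); cold pages stay in PCM (denser,
// non-volatile). Page size and thresholds are invented for illustration.
#include <cstdint>
#include <unordered_map>

enum class Tier { DRAM, PCM };

struct PageInfo {
    int reads = 0, writes = 0;
    Tier tier = Tier::PCM;  // new pages start in the large PCM tier
};

struct HybridMemory {
    std::unordered_map<uint64_t, PageInfo> pages;  // indexed by page number

    Tier access(uint64_t addr, bool is_write) {
        PageInfo& p = pages[addr >> 12];  // 4 KB pages (illustrative)
        (is_write ? p.writes : p.reads)++;
        // Invented policy: write-heavy or frequently accessed pages move to DRAM.
        if (p.writes >= 4 || p.reads + p.writes >= 16) p.tier = Tier::DRAM;
        return p.tier;  // the tier this access is served from
    }
};

int main() {
    HybridMemory hm;
    for (int i = 0; i < 5; i++) hm.access(0x1000, /*is_write=*/true);
    return hm.access(0x1000, /*is_write=*/false) == Tier::DRAM ? 0 : 1;  // now hot: DRAM
}
</code>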