===== Lecture 22 (03/25 Wed.) =====

  * Flash controller
  * Flash memory
  * Garbage collection in flash
  * Overhead in flash memory
    * Erase (off the critical path, but takes a long time)
  * Different types of DRAM
  * DRAM design choices
    * Cost/density/latency/bandwidth/yield
  * Sense amplifiers
    * How do they work
  * Double data rate (DDR)
  * Subarray
  * Rowclone
    * Moving bulk data from one row to another
    * Lower latency and less bandwidth consumed when copying or zeroing out data
  * TL-DRAM
    * Far segment
    * Near segment
    * What causes the long latency
    * Benefits of TL-DRAM
      * TL-DRAM vs. DRAM cache (adding a small cache in DRAM)
  * List of commands to read/write data in DRAM (see the bank sketch after this list)
    * Activate -> read/write -> precharge
    * Activate moves data into the row buffer
    * Precharge prepares the bank for the next access
  * Row buffer hit
  * Row buffer conflict
  * Scheduling memory requests to lower row conflicts
  * Burst mode of DRAM
    * Internally prefetch 32 bits and burst them over an 8-bit interface if DRAM needs to read 32 bits
  * Address mapping (see the bit-slicing sketch after this list)
    * Row interleaved
    * Cache block interleaved
  * Memory controller
    * Sends DRAM commands
    * Periodically sends commands to refresh DRAM cells
    * Ensures correctness and data integrity
    * Where to place the memory controller
      * On the CPU chip vs. at the main memory
        * Higher bandwidth on-chip
    * Determines the order in which requests are serviced by DRAM
      * Request queues that hold requests
      * Sends a request whenever that request can be sent to the bank
      * Determines which command (across banks) should be sent to DRAM
  * Priority of demand vs. prefetch requests
  * Memory scheduling policies
    * FCFS
    * FR-FCFS (see the scheduler sketch after this list)
      * Tries to maximize the row buffer hit rate
      * Capped FR-FCFS: FR-FCFS with a timeout
      * Usually done at the command level (read/write commands and precharge/activate commands)
    * PAR-BS
      * Key benefits
      * Stall time
      * Shortest job first
    * STFM
    * ATLAS
    * TCM
      * Key benefits
      * Configurability
      * Fairness + performance at the same time
      * Robustness issues
  * Open row policy
  * Closed row policy
  * QoS
    * QoS issues in memory scheduling
    * Fairness
    * Performance guarantees
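
A minimal sketch of the activate -> read/write -> precharge sequence and of row buffer hits vs. conflicts, modeling one bank with an open-row policy; the class and its names are illustrative, not any real controller's interface.

<code python>
# Toy model of one DRAM bank with an open-row policy.
class Bank:
    def __init__(self):
        self.open_row = None              # row currently latched in the row buffer

    def access(self, row):
        """Return the command sequence needed to read from `row`."""
        if self.open_row == row:          # row buffer hit
            return ["READ"]
        if self.open_row is None:         # row buffer empty
            self.open_row = row
            return ["ACTIVATE", "READ"]
        self.open_row = row               # row buffer conflict: close, then open
        return ["PRECHARGE", "ACTIVATE", "READ"]

bank = Bank()
for row in [3, 3, 7]:
    print(row, bank.access(row))
# 3 ['ACTIVATE', 'READ']                 (empty row buffer)
# 3 ['READ']                             (row buffer hit)
# 7 ['PRECHARGE', 'ACTIVATE', 'READ']    (row buffer conflict)
</code>

A closed-row policy would instead precharge right after each access, trading away row buffer hits to avoid conflicts.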
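A toy bit-slicing view of row interleaving vs. cache block interleaving, assuming hypothetical sizes (4 banks, 2 KB rows, 64 B cache blocks); real mappings vary across controllers.

<code python>
# Toy address mappings. BLOCK/BANK/COL widths are made-up example values.
BLOCK_BITS, BANK_BITS, COL_BITS = 6, 2, 11

def row_interleaved(addr):
    """row : bank : column -- an entire row sits in one bank."""
    col  = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row  = addr >> (COL_BITS + BANK_BITS)
    return row, bank, col

def block_interleaved(addr):
    """row : column-high : bank : block offset -- consecutive blocks go to consecutive banks."""
    offset   = addr & ((1 << BLOCK_BITS) - 1)
    bank     = (addr >> BLOCK_BITS) & ((1 << BANK_BITS) - 1)
    col_high = (addr >> (BLOCK_BITS + BANK_BITS)) & ((1 << (COL_BITS - BLOCK_BITS)) - 1)
    row      = addr >> (COL_BITS + BANK_BITS)
    return row, bank, (col_high << BLOCK_BITS) | offset

# Two consecutive cache blocks: same bank under row interleaving (good row
# buffer locality), different banks under block interleaving (more
# bank-level parallelism).
for addr in (0x0000, 0x0040):
    print(hex(addr), row_interleaved(addr), block_interleaved(addr))
</code>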
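A sketch of FR-FCFS at the request level, under the simplifying assumption that every queued request is otherwise ready to issue; as noted above, real schedulers apply this at the command level.

<code python>
# Toy FR-FCFS: among queued requests, prefer row buffer hits
# ("first-ready"), then the oldest. The Request format is made up.
from collections import namedtuple

Request = namedtuple("Request", "arrival bank row")

def fr_fcfs(queue, open_rows):
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    pool = hits if hits else queue        # no row hit anywhere: fall back to FCFS
    return min(pool, key=lambda r: r.arrival)

queue = [Request(0, bank=0, row=5), Request(1, bank=0, row=9)]
print(fr_fcfs(queue, open_rows={0: 9}))   # picks the newer request: it is a row hit
# Capped FR-FCFS would bound how long row hits may bypass older requests.
</code>
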
===== Lecture 23 (03/27 Fri.) =====

  * Different ways to control interference in DRAM
    * Partitioning of resources
      * Channel partitioning: map applications that interfere with each other to different channels
        * Keep track of each application's characteristics
        * Dedicating a channel might waste bandwidth
        * Need OS support to determine the channel bits
    * Source throttling
      * A controller throttles the cores depending on the performance target
      * Example: Fairness via source throttling
        * Detect unfairness and throttle the application that is interfering
        * How do you estimate slowdown?
        * Threshold-based solution: hard to configure
    * App/thread scheduling
      * Critical threads usually stall overall progress
    * Designing a DRAM controller
      * Has to handle the normal DRAM operations
        * Read/write/refresh and all the timing constraints
      * Keep track of resources
      * Assign priorities to different requests
      * Manage requests to banks
    * Self-optimizing controller
      * Uses machine learning to improve the DRAM controller
    * A-DRM
      * Architecture-aware DRAM
  * Multithreading
    * Synchronization
    * Pipelined programs
      * Producer-consumer model
    * Critical path
    * Limiter threads
    * Prioritization between threads
  * Different power modes in DRAM
  * DRAM refresh
    * Why DRAM has to be refreshed every 64 ms
    * Banks are unavailable during refresh
      * LPDDR mitigates this by using per-bank refresh
    * Takes longer with bigger DRAM (see the back-of-envelope sketch after this list)
    * Distributed refresh: stagger refreshes across the 64 ms window in a distributed manner
      * As opposed to burst refresh (long pause time)
  * RAIDR: reduce DRAM refreshes by profiling and binning
    * Some rows do not have to be refreshed very frequently
      * Profile the rows
        * High temperature changes the retention time: need online profiling
  * Bloom filter (see the sketch after this list)
    * Represents set membership
    * Approximate
    * Can contain false positives
      * Better/more hash functions help reduce them
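
A back-of-envelope sketch of the refresh overhead mentioned above, using illustrative DDR-like timing numbers (8192 refresh commands per 64 ms window; the tRFC values are approximate):

<code python>
# Every cell must be refreshed once per 64 ms window. With distributed
# refresh, the controller issues one refresh command every tREFI instead
# of a long burst. The tRFC values below are rough DDR-like numbers.
tREFW = 64e-3                  # refresh window (s)
tREFI = tREFW / 8192           # refresh command interval: ~7.8 us

for density, tRFC in [("2 Gb", 160e-9), ("8 Gb", 350e-9), ("32 Gb", 640e-9)]:
    print(f"{density}: rank busy refreshing {tRFC / tREFI:.1%} of the time")
# Denser chips refresh more rows per command, so tRFC grows and a larger
# fraction of time is lost to refresh.
</code>

And a minimal Bloom filter sketch (e.g., for remembering which rows RAIDR binned as needing frequent refresh); the sizes and hash construction here are arbitrary:

<code python>
# Approximate set membership: possible false positives, no false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.k):                      # k independent hashes
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def maybe_contains(self, item):
        # False means definitely absent; True means *probably* present.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for row in ["row42", "row137"]:
    bf.add(row)
print(bf.maybe_contains("row42"))    # True
print(bf.maybe_contains("row999"))   # almost certainly False
</code>
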
===== Lecture 24 (03/30 Mon.) =====

  * Simulation
    * Drawbacks of RTL simulation
      * Time consuming
      * Complex to develop
      * Hard to perform design exploration
    * Explore the design space quickly
    * Match the behavior of existing systems
    * Tradeoffs: speed, accuracy, flexibility
    * High-level simulation vs. detailed simulation
      * High-level simulation is faster, but less accurate
  * Controllers that work with multiple types of cores
    * Design problem: how to find a good scheduling policy on its own?
    * Self-optimizing memory controller: using machine learning
      * Can adapt to the applications
      * The complexity is very high
  * Tolerating latency can be costly
    * The instruction window is complex
      * Its benefit also diminishes
    * Designing the buffers can be complex
    * A simpler alternative to out-of-order execution for tolerating latency is desirable
  * Different sources that cause the core to stall in OoO
    * Cache misses
    * Note that the stall happens when the instruction window is full
  * Scaling the instruction window size is hard
    * It is better (less complex) to make the window more efficient
  * Runahead execution (see the sketch after this list)
    * Tries to obtain MLP without increasing the instruction window
    * Run ahead (i.e., execute ahead) when there is a long-latency memory instruction
      * A long-latency memory instruction stalls the processor for a while anyway, so it is better to make use of that time
      * Execute future instructions to generate accurate prefetches
      * Allow future data to be in the cache
    * How to support runahead execution?
      * Need a way to checkpoint the state when entering runahead mode
      * How to make execution on the wrong path useful?
      * Need a runahead cache to handle loads/stores in runahead mode (since they are speculative)
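
A toy model of runahead execution over an address trace, under heavy simplifications: it models only the prefetch benefit, and the function, trace format, and latencies are made up (a real design also needs a runahead cache for speculative stores, as noted above).

<code python>
# One access per "instruction"; hits cost 1 cycle, misses cost mem_latency.
def run(trace, mem_latency=100, runahead_window=8):
    cache, cycles, i = set(), 0, 0
    while i < len(trace):
        addr = trace[i]
        if addr in cache:
            cycles += 1
            i += 1
            continue
        # Miss: checkpoint state, enter runahead mode, and "execute ahead"
        # so future accesses become prefetches. Speculative results are
        # thrown away; only the prefetched cache contents survive.
        checkpoint = i
        for future_addr in trace[i + 1:i + 1 + runahead_window]:
            cache.add(future_addr)        # prefetch discovered by runahead
        cycles += mem_latency             # wait out the original miss
        cache.add(addr)
        i = checkpoint                    # exit runahead: restore and resume
    return cycles

trace = list(range(16))
print(run(trace, runahead_window=0))      # no runahead: 16 x (100 + 1) = 1616 cycles
print(run(trace, runahead_window=8))      # runahead turns most misses into hits
</code>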
===== Lecture 25 (4/1 Wed.) =====

  * More on runahead execution
    * How to support runahead execution?
      * Need a way to checkpoint the state when entering runahead mode
      * How to make execution on the wrong path useful?
      * Need a runahead cache to handle loads/stores in runahead mode (since they are speculative)
    * Costs and benefits of runahead execution (slide 27)
    * Runahead can be inefficient
      * Runahead periods that are useless
        * Get rid of useless, inefficient periods
    * What if there is a dependent cache miss?
      * Cannot be parallelized in vanilla runahead
      * Can predict the value of the dependent load
        * How to predict the address of the load
          * Delta value information
          * Stride predictor
          * AVD prediction
  * Questions regarding prefetching
    * What to prefetch
    * When to prefetch
    * How to prefetch
    * Where to prefetch from
  * Prefetching can cause thrashing (evicting a useful block)
  * Prefetches can also be useless (never used)
    * Prefetching needs to be efficient
  * Can cause memory bandwidth problems in GPUs
  * Prefetch the whole block, more than one block, or a subblock?
    * Each of these has pros and cons
    * Bigger prefetches are more likely to waste bandwidth
    * Commonly done at cache block granularity
  * Prefetch accuracy: fraction of useful prefetches out of all prefetches
  * Prefetchers usually predict based on
    * Past knowledge
    * Compiler hints
  * A prefetcher has to prefetch at the right time
    * A prefetch that is too early might get evicted before use
      * It might also evict other useful data
    * A prefetch that is too late does not hide the whole memory latency
  * Previous prefetches at the same PC can be used as history
  * Previous demand requests are also good information to use for prefetching
  * Prefetch buffer
    * Place prefetched data there to avoid thrashing
      * Can treat demand/prefetch requests separately
      * More complex
  * Generally, demand blocks are more important
    * This means eviction should prefer prefetched blocks as opposed to demand blocks
  * Tradeoffs in where to place the prefetcher
    * Look at L1 hits and misses
    * Look at L1 misses only
    * Look at L2 misses
    * Different access patterns affect accuracy
      * Tradeoff between handling more requests (seeing L1 hits and misses) and less visibility (only seeing L2 misses)
  * Software vs. hardware vs. execution-based prefetching
    * Software: the ISA provides prefetch instructions; software utilizes them
      * What information is useful
      * How to make sure the prefetch is timely
      * What if you have a pointer-based structure
        * Not easy to prefetch pointer chasing (because in many cases the work between prefetches is short, so the next address cannot be predicted early enough)
          * Can be solved by hinting at the next-next and/or next-next-next address
    * Hardware: identify the pattern and prefetch
    * Execution-driven: opportunistically try to prefetch (runahead, dual-core execution)
  * Stride prefetcher (see the sketch after this list)
    * Predicts strides, which are common in many programs
    * Cache-block-based or instruction-based
  * Stream buffer design
    * Buffers the stream of accesses (next addresses)
    * Uses that information to prefetch
  * What affects prefetcher performance
    * Prefetch distance
      * How far ahead should we prefetch
    * Prefetch degree
      * How many prefetches should we issue
  * Prefetcher performance
    * Coverage
      * Out of all demand requests, how many are served by prefetched data
    * Accuracy
      * Out of all prefetch requests, how many are actually used
    * Timeliness
      * How much of the memory latency the prefetches hide
    * Cache pollution
      * How many additional demand misses did the prefetcher cause?
        * Hard to quantify
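
A sketch of a PC-indexed stride prefetcher with a configurable prefetch degree, together with the accuracy and coverage bookkeeping defined above; all structures, names, and numbers are illustrative.

<code python>
# PC-indexed stride prefetcher: train on demand accesses, and once a
# stride repeats, prefetch `degree` blocks ahead of the current address.
class StridePrefetcher:
    def __init__(self, degree=2):
        self.degree = degree           # prefetch degree: blocks issued per trigger
        self.table = {}                # PC -> (last_addr, last_stride)
        self.issued = set()            # all prefetch addresses issued so far

    def observe(self, pc, addr):
        """Train on a demand access; return prefetch addresses to issue."""
        prefetches = []
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride == last_stride and stride != 0:    # confirmed stride
                prefetches = [addr + stride * (d + 1)    # distance grows with d
                              for d in range(self.degree)]
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        self.issued.update(prefetches)
        return prefetches

pf = StridePrefetcher(degree=2)
demands = [(0x400, 64 * i) for i in range(10)]    # one load PC, 64 B stride
for pc, addr in demands:
    pf.observe(pc, addr)

# Toy metrics: treat every demand access as a would-be miss.
useful   = {addr for _, addr in demands} & pf.issued
accuracy = len(useful) / len(pf.issued)           # useful / all prefetches
coverage = len(useful) / len(demands)             # demands covered by prefetches
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
</code>

Raising the degree fetches more blocks further ahead, which can improve timeliness but tends to hurt accuracy and increase cache pollution, matching the tradeoffs listed above.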