buzzword [2014/03/26 18:17] rachata
buzzword [2014/04/02 18:13] rachata
===== Lecture 23 (3/28 Fri.) =====

  * DRAM design choices
    * Cost/density/latency/bandwidth/yield
  * Sense amplifier
    * How do sense amplifiers work?
  * Dual data rate
  * Subarray
  * RowClone
    * Moving bulk data from one row to another
    * Lower latency and less bandwidth consumed when copying or zeroing out data
  * TL-DRAM
    * Far segment
    * Near segment
    * What causes the long latency?
    * Benefits of TL-DRAM
    * TL-DRAM vs. a DRAM cache (adding a small cache in DRAM)
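The long-bitline latency problem and the near/far-segment split can be illustrated with a toy latency model (all timing numbers below are made-up illustrations, not figures from the lecture or the TL-DRAM paper):

```python
# Back-of-the-envelope TL-DRAM latency model. All timing numbers are
# assumptions for illustration only.
NEAR_LATENCY = 30.0  # ns: short bitline segment next to the sense amps
FAR_LATENCY = 60.0   # ns: segment behind the isolation transistor
BASELINE = 50.0      # ns: conventional DRAM with one long bitline

def avg_latency(near_hit_fraction):
    """Average access latency for a given fraction of near-segment hits."""
    return (near_hit_fraction * NEAR_LATENCY
            + (1 - near_hit_fraction) * FAR_LATENCY)

# With good data placement most accesses hit the near segment,
# so the average latency beats the conventional long-bitline baseline.
print(round(avg_latency(0.9), 1))   # 33.0
print(avg_latency(0.9) < BASELINE)  # True
```

The model also shows the downside: with poor placement (a low near-hit fraction), the far segment's extra latency makes TL-DRAM slower than the baseline.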
+ | |||
+ | | ||
+ | | ||
+ | | ||

===== Lecture 24 (3/31 Mon.) =====
  * Memory controller
    * Different commands
  * Memory scheduler
    * Determines the order in which requests are issued to DRAM
    * Based on age, row-buffer hit/miss status, request type (load/store/prefetch, from the GPU/from the CPU), and criticality
  * Row buffer
    * Hit/conflict
    * Open/closed row
    * Open-row policy
    * Closed-row policy
    * Tradeoffs between the open- and closed-row policies
      * If the program has high row-buffer locality, the open-row policy benefits more
      * The closed-row policy services miss requests faster
  * Bank conflict
  * Interference from different applications/threads
    * Different programs/processes/threads interfere with each other
      * Introduces more row-buffer and bank conflicts
    * The memory scheduler has to manage this interference
    * Memory hog problem
    * Interference on the data/command bus
  * FR-FCFS
    * Why does FR-FCFS make sense?
      * A row-buffer hit has lower latency
    * Issues with FR-FCFS
      * Unfairness
  * STFM
    * Fairness issues in memory scheduling
    * How does STFM calculate fairness and slowdown?
      * How to estimate a thread's execution time if it were running alone
    * Definition of fairness (based on STFM; different papers/areas define fairness differently)
  * PAR-BS
    * Parallelism in programs
    * Interference across banks
    * How to form a batch
    * How to determine ranking between batches and within a batch
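The FR-FCFS policy above can be sketched in a few lines (the `Request` fields and the single-bank, single-open-row model are simplifying assumptions, not the full controller state):

```python
from collections import namedtuple

# Sketch of FR-FCFS (First-Ready, First-Come-First-Served) for one bank.
Request = namedtuple("Request", ["arrival", "row"])

def frfcfs_pick(queue, open_row):
    """First Ready: prefer row-buffer hits; FCFS: then take the oldest."""
    hits = [r for r in queue if r.row == open_row]
    candidates = hits if hits else queue
    return min(candidates, key=lambda r: r.arrival)

queue = [Request(arrival=0, row=5),
         Request(arrival=1, row=9),
         Request(arrival=2, row=9)]

# Row 9 is open, so the oldest row-buffer hit is served before the even
# older request to row 5 -- exactly the behavior that favors threads with
# high row-buffer locality and can starve the others (the unfairness issue).
print(frfcfs_pick(queue, open_row=9))  # Request(arrival=1, row=9)
print(frfcfs_pick(queue, open_row=7))  # Request(arrival=0, row=5)
```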

===== Lecture 25 (4/2 Wed.) =====
  * Latency sensitivity
    * Performance drops a lot when memory request latency is long
  * TCM
    * Tradeoff between throughput and fairness
    * Latency-sensitive (non-intensive) cluster
      * Ranking based on memory intensity
    * Bandwidth-intensive cluster
      * Round-robin within the cluster
    * Generally the latency-sensitive cluster has higher priority
    * Provides robust fairness and throughput
    * Complexity of TCM?
  * Different ways to control interference in DRAM
    * Partitioning of resources
      * Channel partitioning: map applications that interfere with each other to different channels
        * Keep track of each application's characteristics
        * Dedicating a channel might waste bandwidth
        * Needs OS support to determine the channel bits
    * Source throttling
      * A controller throttles the cores depending on the performance target
      * Example: Fairness via Source Throttling
        * Detect unfairness and throttle the application that is interfering
        * How do you estimate slowdown?
        * Threshold-based solutions are hard to configure
    * App/thread scheduling
      * Critical threads usually stall the progress of others
  * Designing a DRAM controller
    * Has to handle the normal DRAM operations
      * Reads/writes/refreshes and all the timing constraints
    * Keeps track of resources
    * Assigns priorities to different requests
    * Manages requests to banks
  * Self-optimizing controller
    * Uses machine learning to improve the DRAM controller
  * DRAM refresh
    * Why does DRAM have to refresh every 64 ms?
    * Banks are unavailable during refresh
      * LPDDR mitigates this by using per-bank refresh
    * Refresh takes longer with bigger DRAM
    * Distributed refresh: stagger refreshes across the 64 ms window
      * As opposed to burst refresh (long pause time)
    * RAIDR: reduce DRAM refreshes by profiling and binning
      * Some rows do not have to be refreshed very frequently
      * Profile the rows
      * High temperature changes the retention time: needs online profiling
  * Bloom filter
    * Represents set membership
    * Approximate
    * Can contain false positives
      * Better/more hash functions help reduce false positives
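A minimal Bloom filter sketch, of the kind RAIDR uses to record which rows belong to which refresh bin (the bit-array size, hash count, and SHA-256-based hashing here are illustrative choices, not RAIDR's actual parameters):

```python
import hashlib

# Minimal Bloom filter: k hash functions map an item to k bit positions.
class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # All k bits set -> "probably in the set" (maybe a false positive).
        # Any bit clear -> definitely not in the set (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("row 42")
print("row 42" in bf)  # True: an added item is always found
print("row 99" in bf)  # almost certainly False, but a false positive is possible
```

A larger bit array or a better-tuned number of hash functions lowers the false-positive rate but never eliminates it; that is the "approximate set membership" tradeoff in the list above.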