Differences

This shows you the differences between two versions of the page.

--- buzzword [2015/03/17 02:29]
kevincha
+++ buzzword [2015/03/30 23:15]
rachata
@@ Line 799: / Line 799: @@
   * Page coloring
   * Interaction between cache and TLB
+===== Lecture 21 (03/23 Mon.) =====
+  * DRAM scaling problem
+  * Demands/trends affecting the main memory
+    * More capacity
+    * Low energy
+    * More bandwidth
+    * QoS
+  * ECC in DRAM
+  * Multi-porting
+    * Virtual multi-porting
+      * Time-share the port: cheap, but does not scale well
+    * True multi-porting
+  * Multiple cache copies
+  * Alignment
+  * Banking
+    * Can have bank conflict
+    * Extra interconnects across banks
+    * Address mapping can mitigate bank conflict
+    * Common in main memory (note that the register file in a GPU is also banked, but mainly for the purpose of reducing complexity)
+  * Bank mapping
+    * How to avoid bank conflicts?
+  * Channel mapping
+    * Address mapping to minimize bank conflict
+    * Page coloring
+      * Virtual-to-physical mapping that can help reduce conflicts
+  * Accessing DRAM
+    * Row bits
+    * Column bits
+    * Addressability
+    * DRAM has its own clock
+    * Sense amplifier
+    * Bit lines
+    * Word lines
+  * DRAM (1T-1C) vs. SRAM (6T)
+    * Cost
+    * Latency
+  * Interleaving in DRAM
+    * Effects from address mapping on memory interleaving
+    * Effects from memory access patterns from the program on interleaving
+  * DRAM Bank
+    * Minimizes the cost of interleaving (banks share the data bus and the command bus)
+  * DRAM Rank
+    * Minimizes the cost of the chip (a rank is a bundle of chips operated together)
+  * DRAM Channel
+    * An interface to DRAM, each with its own ranks/banks
+  * DRAM Chip
+  * DIMM
+    * More DIMMs add interconnect complexity
+  * List of commands to read/write data into DRAM
+    * Activate -> read/write -> precharge
+    * Activate moves data into the row buffer
+    * Precharge prepares the bank for the next access
+  * Row buffer hit
+  * Row buffer conflict
+  * Scheduling memory requests to lower row conflicts
+  * Burst mode of DRAM
+    * Prefetch 32 bits over an 8-bit interface if DRAM needs to read 32 bits
+  * Address mapping
+    * Row interleaved
+    * Cache block interleaved
+  * Memory controller
+    * Sending DRAM commands
+    * Periodically send commands to refresh DRAM cells
+    * Ensure correctness and data integrity
+    * Where to place the memory controller
+      * On CPU chip vs. at the main memory
+        * Higher BW on-chip
+    * Determine the order of requests that will be serviced in DRAM
+      * Request queues that hold requests
+      * Send requests whenever the request can be sent to the bank
+      * Determine which command (across banks) should be sent to DRAM
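The two address mappings listed for Lecture 21 (row interleaved vs. cache block interleaved) can be illustrated with a minimal sketch. The sizes below (8 KB rows, 64-byte cache blocks, 8 banks, a single channel and rank) and the function names are assumptions chosen for illustration, not values from the lecture.

```python
# Hypothetical geometry, assumed for illustration only.
ROW_BYTES = 8192     # bytes held in one DRAM row (one row buffer)
BLOCK_BYTES = 64     # cache block size
NUM_BANKS = 8

def row_interleaved(addr):
    """Consecutive rows map to consecutive banks: [row | bank | column]."""
    column = addr % ROW_BYTES
    bank = (addr // ROW_BYTES) % NUM_BANKS
    row = addr // (ROW_BYTES * NUM_BANKS)
    return row, bank, column

def block_interleaved(addr):
    """Consecutive cache blocks map to consecutive banks:
    [row | high column | bank | block offset]."""
    offset = addr % BLOCK_BYTES
    bank = (addr // BLOCK_BYTES) % NUM_BANKS
    rest = addr // (BLOCK_BYTES * NUM_BANKS)
    blocks_per_row = ROW_BYTES // BLOCK_BYTES
    column = (rest % blocks_per_row) * BLOCK_BYTES + offset
    row = rest // blocks_per_row
    return row, bank, column

# Banks touched by 16 sequential cache blocks under each mapping.
addrs = [i * BLOCK_BYTES for i in range(16)]
row_banks = [row_interleaved(a)[1] for a in addrs]
block_banks = [block_interleaved(a)[1] for a in addrs]
```

Under these assumptions a sequential stream stays in one bank (and one row) with the row-interleaved mapping, which favors row buffer hits, while the cache-block-interleaved mapping spreads the same stream across all banks, which favors bank-level parallelism.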
+===== Lecture 22 (03/25 Wed.) =====
+  * Flash controller
+  * Flash memory
+  * Garbage collection in flash
+  * Overhead in flash memory
+    * Erase (off the critical path, but takes a long time)
+  * Different types of DRAM
+  * DRAM design choices
+    * Cost/density/latency/BW/Yield
+  * Sense Amplifier
+    * How does it work?
+  * Double data rate
+  * Subarray
+  * RowClone
+    * Move bulk data from one row to another
+    * Lower latency and bandwidth consumption when copying or zeroing data
+  * TL-DRAM
+    * Far segment
+    * Near segment
+    * What causes the long latency
+    * Benefit of TL-DRAM
+      * TL-DRAM vs. DRAM cache (adding a small cache in DRAM)
+  * List of commands to read/write data into DRAM
+    * Activate -> read/write -> precharge
+    * Activate moves data into the row buffer
+    * Precharge prepares the bank for the next access
+  * Row buffer hit
+  * Row buffer conflict
+  * Scheduling memory requests to lower row conflicts
+  * Burst mode of DRAM
+    * Prefetch 32 bits over an 8-bit interface if DRAM needs to read 32 bits
+  * Address mapping
+    * Row interleaved
+    * Cache block interleaved
+  * Memory controller
+    * Sending DRAM commands
+    * Periodically send commands to refresh DRAM cells
+    * Ensure correctness and data integrity
+    * Where to place the memory controller
+      * On CPU chip vs. at the main memory
+        * Higher BW on-chip
+    * Determine the order of requests that will be serviced in DRAM
+      * Request queues that hold requests
+      * Send requests whenever the request can be sent to the bank
+      * Determine which command (across banks) should be sent to DRAM
+  * Priority of demand vs. prefetch requests
+  * Memory scheduling policies
+    * FCFS
+    * FR-FCFS
+      * Try to maximize row buffer hit rate
+      * Capped FR-FCFS: FR-FCFS with a timeout
+      * Usually this is done at the command level (read/write commands and precharge/activate commands)
+    * PAR-BS
+      * Key benefits
+      * Stall time
+      * Shortest job first
+    * STFM
+    * ATLAS
+    * TCM
+      * Key benefits
+      * Configurability
+      * Fairness + performance at the same time
+      * Robustness issues
+  * Open row policy
+  * Closed row policy
+  * QoS
+    * QoS issues in memory scheduling
+    * Fairness
+    * Performance guarantee
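The difference between FCFS and FR-FCFS from the Lecture 22 list can be sketched as follows. This is a simplified model under stated assumptions: a single bank, all requests already queued, an open-row policy, and request-level rather than command-level scheduling. The `Request` type and the function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical request record: arrival order and target DRAM row.
Request = namedtuple("Request", ["arrival", "row"])

def fcfs(queue, open_row):
    """Oldest request first, ignoring the row buffer state."""
    return min(queue, key=lambda r: r.arrival)

def fr_fcfs(queue, open_row):
    """Row buffer hits first (False sorts before True), then oldest first."""
    return min(queue, key=lambda r: (r.row != open_row, r.arrival))

def count_row_hits(requests, policy):
    """Service every request in one bank and count row buffer hits."""
    queue = list(requests)
    open_row = None          # no row is open initially
    hits = 0
    while queue:
        req = policy(queue, open_row)
        queue.remove(req)
        if req.row == open_row:
            hits += 1
        open_row = req.row   # open-row policy: the row stays in the buffer
    return hits

# Two streams alternating between rows 5 and 9.
requests = [Request(0, 5), Request(1, 9), Request(2, 5), Request(3, 9)]
fcfs_hits = count_row_hits(requests, fcfs)        # 0 hits
fr_fcfs_hits = count_row_hits(requests, fr_fcfs)  # 2 hits
```

In this example FR-FCFS reorders the queue to service both row-5 requests before switching rows. The same reordering is what lets a stream with high row locality starve others, which is the motivation for the capped variant listed above.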
+===== Lecture 23 (03/27 Fri.) =====
+  * Different ways to control interference in DRAM
+    * Partitioning of resource
+      * Channel partitioning: map applications that interfere with each other to different channels
+        * Keep track of application's characteristics
+        * Dedicating a channel might waste bandwidth
+        * Need OS support to determine the channel bits
+    * Source throttling
+      * A controller throttles the cores depending on the performance target
+      * Example: Fairness via source throttling
+        * Detect unfairness and throttle the application that is interfering
+        * How do you estimate slowdown?
+        * Threshold based solution: hard to configure
+    * App/thread scheduling
+      * Critical threads usually stall the progress
+    * Designing DRAM controller
+      * Has to handle the normal DRAM operations
+        * Read/write/refresh/all the timing constraints
+      * Keep track of resources
+      * Assign priorities to different requests
+      * Manage requests to banks
+    * Self-optimizing controller
+      * Use machine learning to improve DRAM controller
+    * A-DRM
+      * Architecture-aware distributed resource management
+  * Multithreading
+    * Synchronization
+    * Pipelined programs
+      * Producer-consumer model
+    * Critical path
+    * Limiter threads
+    * Prioritization between threads
+  * Different power modes in DRAM
+  * DRAM Refresh
+    * Why does DRAM have to refresh every 64 ms?
+    * Banks are unavailable during refresh
+      * LPDDR mitigates this by using per-bank refresh
+    * Refresh takes longer with larger DRAM
+    * Distributed refresh: stagger row refreshes across each 64 ms window
+      * As opposed to burst refresh (long pause time)
+  * RAIDR: Reduce DRAM refresh by profiling and binning
+    * Some rows do not have to be refreshed very frequently
+      * Profile the rows
+        * High temperature changes the retention time: need online profiling
+  * Bloom filter
+    * Represent set membership
+    * Approximate
+    * Can return false positives
+      * Better/more hash functions help reduce this
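The Bloom filter properties listed for Lecture 23 (set membership, approximate, false positives but no false negatives) can be shown with a minimal sketch. The class name, the default sizes, and the use of salted SHA-256 as the hash family are assumptions for illustration; a hardware implementation such as the one RAIDR relies on would use much cheaper hash functions.

```python
import hashlib

class BloomFilter:
    """Approximate set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

# Example: record hypothetical row numbers that need frequent refresh.
weak_rows = BloomFilter()
for row in (12, 407, 9001):
    weak_rows.add(row)
needs_fast_refresh = weak_rows.might_contain(407)   # True
```

For refresh binning a false positive is safe: a row that is wrongly reported as a member is simply refreshed more often than it strictly needs to be.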
+===== Lecture 24 (03/30 Mon.) =====
+  * Simulation
+    * Drawbacks of RTL simulations
+      * Time consuming
+      * Complex to develop
+      * Hard to perform design explorations
+    * Explore the design space quickly
+    * Match the behavior of existing systems
+    * Tradeoffs: speed, accuracy, flexibility
+    * High-level simulation vs. detailed simulation
+      * High-level simulation is faster but less accurate
+  * Controllers that work with multiple types of cores
+    * Design problems: how to find a good scheduling policy on its own?
+    * Self-optimizing memory controller: using machine learning
+      * Can adapt to the applications
+      * The complexity is very high
+  * Tolerating latency can be costly
+    * Instruction window is complex
+      * Benefit also diminishes
+    * Designing the buffers can be complex
+    * A simpler way to tolerate latency than out-of-order execution is desirable
+  * Different sources that cause an OoO core to stall
+    * Cache miss
+    * Note that a stall happens if the instruction window is full
+  * Scaling instruction window size is hard
+    * It is better (less complex) to make the window more efficient
+  * Runahead execution
+    * Try to obtain MLP (memory-level parallelism) without increasing the instruction window size
+    * Runahead (i.e. execute ahead) when there is a long-latency memory instruction
+      * A long-latency memory instruction stalls the processor for a while anyway, so it is better to make use of that time
+      * Execute future instructions to generate accurate prefetches
+      * Allow future data to be in the cache
+    * How to support runahead execution?
+      * Need a way to checkpoint the state when entering runahead mode
+      * How to make execution on the wrong path useful?
+      * Need a runahead cache to handle loads/stores in runahead mode (since they are speculative)

18-447 Introduction to Computer Architecture – Spring 2015
