Differences

This shows you the differences between two versions of the page.

--- buzzword [2014/03/03 18:16]
rachata
+++ buzzword [2014/03/26 18:17]
rachata
@@ Line 672: / Line 672: @@
   * Intel IA-64
     * Static instruction scheduling and VLIW
+===== Lecture 19 (3/19 Wed.) =====
+  * Ideal cache
+    * More capacity
+    * Fast
+    * Cheap
+    * High bandwidth
+  * DRAM cell
+    * Cheap
+    * Senses the perturbation through a sense amplifier
+    * Slow and leaky
+  * SRAM cell (cross-coupled inverters)
+    * Expensive
+    * Fast (easier to sense the value in the cell)
+  * Memory bank
+    * Read access sequence
+      * DRAM: Activate -> Read -> Precharge (if needed)
+    * What dominates the access latency for DRAM and SRAM?
+  * Scaling issue
+    * Hard to scale the cells down to smaller sizes
+  * Memory hierarchy
+    * Prefetching
+    * Caching
+  * Spatial and temporal locality
+    * Caches can exploit both
+    * Temporal: recently used data is likely to be accessed again soon
+    * Spatial: data near recently used data is likely to be accessed soon
+  * Caching in a pipeline design
+  * Cache management
+    * Manual
+      * Data movement is managed manually
+        * Embedded processor
+        * GPU scratchpad
+    * Automatic
+      * Hardware manages data movement
+  * Latency analysis
+    * Based on the hit and miss status, next level access time (if miss), and the current level access time
+  * Cache basics
+    * Set/block (line)/Placement/replacement/direct mapped vs. associative cache/etc.
+  * Cache access
+    * How to access tag and data (in parallel vs serially)
+    * How do tag and index get used?
+    * Modern processors perform serial access for higher level cache (L3 for example) to save power
+  * Cost and benefit of having more associativity
+    * Given the associativity, which block should be replaced when the set is full?
+    * Replacement policy
+      * Random
+      * Least recently used (LRU)
+      * Least frequently used
+      * Least costly to refetch
+      * etc.
+  * How to implement LRU
+    * How to keep track of access ordering
+      * Complexity increases rapidly
+    * Approximate LRU
+      * Victim and next-victim policy
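A minimal Python sketch of the cache basics and latency analysis above, under stated assumptions: byte addresses are plain integers, the set count and block size are powers of two, and only tags are tracked (no data, dirty bits, or timing). `SetAssociativeCache` and `amat` are hypothetical names. True LRU is modeled with one `OrderedDict` per set, which is cheap in software but is exactly the per-set ordering state that becomes expensive in hardware as associativity grows.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical model of a set-associative cache with true LRU replacement.

    Addresses are split as | tag | index | offset |. Only tags are tracked;
    data, dirty bits, and timing are not modeled.
    """

    def __init__(self, num_sets: int, ways: int, block_size: int) -> None:
        # Power-of-two sizes let index and offset be plain bit fields.
        assert num_sets & (num_sets - 1) == 0, "num_sets must be a power of two"
        assert block_size & (block_size - 1) == 0, "block_size must be a power of two"
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        self.offset_bits = block_size.bit_length() - 1
        self.index_bits = num_sets.bit_length() - 1
        # One OrderedDict per set; keys are tags ordered from LRU (first) to MRU (last).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, index, offset) for a byte address."""
        offset = addr & (self.block_size - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def access(self, addr: int) -> bool:
        """Look up addr and return True on a hit; on a miss, fill the block."""
        tag, index, _ = self.split(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # hit: this tag becomes MRU
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) == self.ways:
            lines.popitem(last=False)  # set is full: evict the LRU tag
        lines[tag] = None  # insert the new block as MRU
        return False


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average access time: every access pays hit_time; misses also pay miss_penalty."""
    return hit_time + miss_rate * miss_penalty


cache = SetAssociativeCache(num_sets=4, ways=2, block_size=16)
print(cache.split(0x1234))  # (72, 3, 4)
# Addresses 0, 64, and 128 all map to set 0; with 2 ways, a third block evicts the LRU one.
print([cache.access(a) for a in (0, 64, 0, 128, 64, 128)])  # [False, False, True, False, False, True]
print(amat(hit_time=1, miss_rate=0.25, miss_penalty=40))  # 11.0
```

The `amat` helper encodes the latency analysis for a single level: hit time plus miss rate times the next level's access time (the miss penalty). In a multi-level hierarchy, the miss penalty of one level is itself the average access time of the next level.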
+===== Lecture 20 (3/21 Fri.) =====
+  * Set thrashing
+    * Working set is bigger than the associativity
+  * Belady's OPT
+    * Is this optimal?
+    * Complexity?
+  * Similarity between cache and page table
+    * Number of blocks vs pages
+    * Time to find the block/page to replace
+  * Handling writes
+    * Write through
+      * Simpler, no consistency issues (all levels stay up to date)
+    * Write back
+      * Needs a modified (dirty) bit to track which blocks must be written back on eviction
+  * Sectored cache
+    * Use subblocks
+      * Lower bandwidth (only the needed subblocks are transferred)
+      * More complex
+  * Instruction vs data cache
+    * Where to place instructions
+      * Unified vs. separated
+    * In the first level cache
+  * Cache access
+    * First level access
+    * Second level access
+      * When to start the second level access
+        * Performance vs. energy
+  * Address translation
+  * Homonyms and synonyms
+    * Homonym: same VA maps to different PAs
+      * With multiple processes
+    * Synonyms: Multiple VAs map to the same PA
+      * Shared libraries, shared data, copy-on-write
+      * I/O
+    * Can these create problems for the cache?
+    * How to eliminate these problems?
+      * Page coloring
+  * Interaction between cache and TLB
+    * Virtually indexed vs. physically indexed
+    * Virtually tagged vs. physically tagged
+    * Virtually indexed physically tagged
+  * Virtual memory in DRAM
+    * Control where data is mapped to in channel/rank/bank
+      * More parallelism
+      * Reduce interference
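The sketch below contrasts LRU with Belady's OPT on a fully associative cache of `capacity` blocks, again in Python with hypothetical function names. OPT evicts the block whose next reference is furthest in the future, so it needs the whole reference string in advance; this version rescans the future references on every eviction, which makes each eviction linear in the trace length. The example trace cycles through three blocks with room for only two, the set-thrashing case where LRU misses on every access.

```python
import math


def lru_misses(refs: list[int], capacity: int) -> int:
    """Count misses of a fully associative LRU cache holding `capacity` blocks."""
    order: list[int] = []  # least recently used block first
    misses = 0
    for block in refs:
        if block in order:
            order.remove(block)  # hit: re-appended below as MRU
        else:
            misses += 1
            if len(order) == capacity:
                order.pop(0)  # evict the LRU block
        order.append(block)
    return misses


def next_use(refs: list[int], start: int, block: int) -> float:
    """Index of the next reference to `block` at or after `start` (inf if none)."""
    for j in range(start, len(refs)):
        if refs[j] == block:
            return j
    return math.inf


def opt_misses(refs: list[int], capacity: int) -> int:
    """Count misses under Belady's OPT: evict the block reused furthest in the future."""
    cache: set[int] = set()
    misses = 0
    for i, block in enumerate(refs):
        if block in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            victim = max(cache, key=lambda b: next_use(refs, i + 1, b))
            cache.remove(victim)
        cache.add(block)
    return misses


refs = [0, 1, 2] * 3  # working set of 3 blocks cycling through a 2-block cache
print(lru_misses(refs, 2), opt_misses(refs, 2))  # 9 6
```

OPT minimizes the number of misses, but that is not necessarily the same as minimizing execution time, since misses can differ in cost (for example, when some of them overlap with other misses).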
+===== Lecture 21 (3/24 Mon.) =====
+  * Parameters that affect cache misses
+  * Thrashing
+  * Different types of cache misses
+    * Compulsory misses
+      * Can be mitigated with prefetching
+    * Capacity misses
+      * Larger cache, better use of cache space
+    * Conflict misses
+      * More assoc
+      * Victim cache
+      * Hashing
+  * Large block vs. small block
+  * Subblocks
+  * Victim cache
+    * Small, fully associative cache behind the actual cache
+    * Caches blocks evicted from the actual cache (victims)
+    * Prevents ping-ponging
+  * Pseudo associativity
+    * Simpler way to implement associative cache
+  * Skewed assoc. cache
+    * Different hashing functions for each way
+  * Restructure data access pattern
+    * Order of loop traversal
+    * Blocking
+  * Memory-level parallelism
+    * Parallel misses cost less per miss than serial misses because their latencies overlap
+  * MSHR
+    * Keeps track of pending cache misses
+      * Think of it as a load/store buffer for the cache
+    * What information goes into the MSHR?
+    * When do you access the MSHR?
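One way to make the miss taxonomy concrete, sketched in Python with a hypothetical `classify_misses` helper and the common textbook definitions: a miss is compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same total size would also miss, and conflict otherwise. The input is assumed to be block addresses, with the set index taken as `block % num_sets`.

```python
from collections import OrderedDict


def classify_misses(refs: list[int], num_sets: int, ways: int) -> dict[str, int]:
    """Classify the misses of a set-associative LRU cache into the three C's.

    refs are block addresses (byte offset already stripped); set index = block % num_sets.
    """
    capacity = num_sets * ways
    sets = [OrderedDict() for _ in range(num_sets)]  # the cache being studied
    fully_assoc = OrderedDict()  # reference: fully associative LRU, same capacity
    seen = set()  # blocks referenced at least once so far
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in refs:
        # Update the fully associative reference cache and remember whether it hit.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == capacity:
                fully_assoc.popitem(last=False)
            fully_assoc[block] = None

        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)
        else:
            if block not in seen:
                counts["compulsory"] += 1  # first reference ever
            elif not fa_hit:
                counts["capacity"] += 1  # would miss even with full associativity
            else:
                counts["conflict"] += 1  # misses only because of limited associativity
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the LRU block of this set
            lines[block] = None
        seen.add(block)
    return counts


print(classify_misses([0, 4, 0, 4], num_sets=4, ways=1))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

In the example, blocks 0 and 4 map to the same set of a direct-mapped cache and ping-pong; after the two compulsory misses, every miss is a conflict miss, the kind that more associativity or a victim cache targets.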
+===== Lecture 22 (3/26 Wed.) =====
+  * Multi-porting
+    * Virtual multi-porting
+      * Time-share the port: cheap but not very scalable
+    * True multi-porting
+  * Multiple cache copies
+  * Banking
+    * Can have bank conflict
+    * Extra interconnects across banks
+    * Address mapping can mitigate bank conflict
+    * Common in main memory (note that the register file in a GPU is also banked, but mainly for the purpose of reducing complexity)
+  * Accessing DRAM
+    * Row bits
+    * Column bits
+    * Addressability
+    * DRAM has its own clock
+  * DRAM (1T1C) vs. SRAM (6T)
+    * Cost
+    * Latency
+  * Interleaving in DRAM
+    * Effects from address mapping on memory interleaving
+    * Effects from memory access patterns from the program on interleaving
+  * DRAM Bank
+    * Minimizes the cost of interleaving (banks share the data bus and the command bus)
+  * DRAM Rank
+    * Minimizes the cost per chip (a set of chips operated together to form a wide interface)
+  * DRAM Channel
+    * An interface to DRAM, each with its own ranks/banks
+  * DIMM
+    * More DIMMs add interconnect complexity
+  * List of commands to read/write data into DRAM
+    * Activate -> read/write -> precharge
+    * Activate moves data into the row buffer
+    * Precharge prepares the bank for the next access
+  * Row buffer hit
+  * Row buffer conflict
+  * Scheduling memory requests to lower row conflicts
+  * Burst mode of DRAM
+    * Prefetch 32 bits internally and transfer them over an 8-bit interface (a burst of 4 transfers) when 32 bits are needed
+  * Address mapping
+    * Row interleaved
+    * Cache block interleaved
+  * Memory controller
+    * Sending DRAM commands
+    * Periodically send commands to refresh DRAM cells
+    * Ensure correctness and data integrity
+    * Where to place the memory controller
+      * On CPU chip vs. at the main memory
+        * Higher BW on-chip
+    * Determine the order of requests that will be serviced in DRAM
+      * Request queues that hold requests
+      * Send a request whenever it can be issued to its bank
+      * Determine which command (across banks) should be sent to DRAM
+  * Priority of demand vs. prefetch requests
+  * Memory scheduling policies
+    * FCFS
+    * FR-FCFS
+      * Capped FR-FCFS: FR-FCFS with a timeout
+      * Usually done at the command level (read/write commands and precharge/activate commands)
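A simplified Python sketch of FCFS versus FR-FCFS for a single bank. It assumes every request is already queued, treats each request as one scheduling unit, and ignores DRAM timing constraints; real controllers apply FR-FCFS at the command level, as noted above. `Request` and `schedule` are hypothetical names. FR-FCFS first picks the oldest request that hits the open row (first ready), and falls back to the oldest request overall only when none does.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # arrival order; smaller means older
    row: int  # DRAM row the request targets


def schedule(queue: list[Request], policy: str) -> tuple[list[int], int]:
    """Return (service order as arrival ids, row-buffer hit count) for one bank.

    Simplified model: all requests are already queued, each request is serviced
    as one unit, and the row buffer starts closed.
    """
    pending = sorted(queue, key=lambda r: r.arrival)  # oldest first
    open_row = None
    order: list[int] = []
    hits = 0
    while pending:
        if policy == "FR-FCFS":
            # First ready: the oldest request that hits the open row, else the oldest.
            ready = [r for r in pending if r.row == open_row]
            chosen = ready[0] if ready else pending[0]
        elif policy == "FCFS":
            chosen = pending[0]
        else:
            raise ValueError(f"unknown policy: {policy}")
        if chosen.row == open_row:
            hits += 1  # row-buffer hit: only a read/write command is needed
        # Otherwise the row is closed or conflicts: precharge (if needed), then activate.
        open_row = chosen.row
        pending.remove(chosen)
        order.append(chosen.arrival)
    return order, hits


queue = [Request(0, 1), Request(1, 2), Request(2, 1), Request(3, 2), Request(4, 1)]
print(schedule(queue, "FCFS"))  # ([0, 1, 2, 3, 4], 0)
print(schedule(queue, "FR-FCFS"))  # ([0, 2, 4, 1, 3], 3)
```

Reordering turns the alternating row pattern from zero row-buffer hits into three. The same reordering can keep postponing an old request to a different row while newer row hits keep arriving, which is what the cap (timeout) in capped FR-FCFS bounds.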

18-447 Introduction to Computer Architecture – Spring 2015
