buzzword [2015/02/25 21:01] kevincha
buzzword [2015/02/27 19:19] kevincha
  * Tradeoffs between trace cache/Hyperblock/Superblock/BS-ISA
===== Lecture 17 (2/25 Wed.) =====
  * IA-64
  * EPIC
  * IA-64 instruction bundle
  * Multiple instructions in the bundle along with the template bits
  * Template bits
  * Stop bits
  * Non-faulting loads and exception propagation
  * Aggressive ST-LD reordering
  * Physical memory system
  * Ideal pipelines
  * Ideal cache
  * More capacity
  * Fast
  * Cheap
  * High bandwidth
  * DRAM cell
  * Cheap
  * Sense the perturbation through the sense amplifier
  * Slow and leaky
  * SRAM cell (cross-coupled inverters)
  * Expensive
  * Fast (easier to sense the value in the cell)
  * Memory bank
  * Read access sequence
  * DRAM: Activate -> Read -> Precharge (if needed)
  * What dominates the access latency for DRAM and SRAM
  * Scaling issues
  * Hard to scale the cell to be small
  * Memory hierarchy
  * Prefetching
  * Caching
  * Spatial and temporal locality
  * Cache can exploit these
  * Recently used data is likely to be accessed again
  * Nearby data is likely to be accessed
  * Caching in a pipelined design
  * Cache management
  * Manual
  * Data movement is managed manually
  * Embedded processors
  * GPU scratchpad
  * Automatic
  * HW manages data movement
  * Latency analysis
  * Based on the hit/miss status, the current level's access time, and the next level's access time (on a miss)
  * Cache basics
  * Set/block (line)/placement/replacement/direct-mapped vs. associative cache/etc.
  * Cache access
  * How to access tag and data (in parallel vs. serially)
  * How do tag and index get used?
  * Modern processors access higher-level caches (L3, for example) serially to save power
  * Cost and benefit of having more associativity
  * Given the associativity, which block should be replaced if the set is full
  * Replacement policy
  * Random
  * Least recently used (LRU)
  * Least frequently used
  * Least costly to refetch
  * etc.
  * How to implement LRU
  * How to keep track of access ordering
  * Complexity increases rapidly
  * Approximate LRU
  * Victim and next-victim policy
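The tag/index bullets above can be made concrete with a small sketch of how a byte address splits into tag, index, and offset fields. The cache geometry below (64-byte blocks, 128 sets) is an illustrative choice, not a parameter from the lecture:

```python
# Hypothetical direct-mapped cache: 64-byte blocks, 128 sets (8 KB data).
# offset = low bits within a block, index = set selector, tag = the rest.
BLOCK_BITS = 6    # 2^6 = 64-byte block
INDEX_BITS = 7    # 2^7 = 128 sets

def split_address(addr):
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, offset
```

Two addresses in the same block share tag and index and differ only in the offset, which is why the index picks the set and the tag disambiguates which block currently lives there.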
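One way to "keep track of access ordering" for true LRU can be modeled in software with an ordered map standing in for per-way order bits (a behavioral sketch of one set, not a hardware design):

```python
from collections import OrderedDict

class LRUSet:
    """Model of one set of a set-associative cache with true-LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # ordered oldest (LRU) -> newest (MRU)

    def access(self, tag):
        """Access a block; return True on hit, False on miss."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)        # hit: promote to MRU
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)     # miss, set full: evict LRU
        self.blocks[tag] = True                 # insert new block as MRU
        return False
```

In hardware this ordering costs extra state and update logic per set, and the cost grows rapidly with associativity, which is what motivates approximate LRU and victim/next-victim style policies.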
+ | |||
+ | ===== Lecture 18 (2/27 Fri.) ===== | ||
  * Tag store and data store
  * Cache hit rate
  * Average memory access time (AMAT)
  * AMAT vs. stall time
  * Cache basics
  * Direct-mapped vs. associative cache
  * Set/block (line)/placement/replacement
  * How do tag and index get used?
  * Full associativity
  * Set-associative cache
  * Insertion, promotion, eviction (replacement)
  * Various replacement policies
  * How to implement LRU
  * How to keep track of access ordering
  * Complexity increases rapidly
  * Approximate LRU
  * Victim and next-victim policy
  * Set thrashing
  * Working set is bigger than the associativity
  * Belady's OPT
  * Is this optimal?
  * Complexity?
  * DRAM as a cache for disk
  * Handling writes
  * Write through
  * Simpler: memory always has the up-to-date data, no consistency issues
  * Write back
  * Need a modified (dirty) bit to make sure accesses get the updated data
  * Sectored cache
  * Uses subblocks
  * Lower bandwidth
  * More complex
  * Instruction vs. data cache
  * Where to place instructions
  * Unified vs. separate
  * In the first-level cache
  * Cache access
  * First-level access
  * Second-level access
  * When to start the second-level access
  * Cache performance
  * Capacity
  * Block size
  * Associativity
  * Classification of cache misses
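The AMAT bullet above follows the standard formula AMAT = hit time + miss rate x miss penalty. A one-line sketch, with illustrative numbers rather than figures from the lecture:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative example (not lecture data): 1-cycle L1 hit,
# 5% miss rate, 100-cycle miss penalty -> 1 + 0.05 * 100 = 6.0 cycles.
```

Note AMAT is not the same as stall time: with out-of-order execution some of the miss latency can be overlapped with useful work, so a lower AMAT does not always translate one-for-one into fewer stall cycles.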
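Belady's OPT above evicts the block whose next use lies farthest in the future. A software sketch of a fully associative cache makes both of the outline's questions concrete: it is optimal for miss count, but it needs the entire future access trace, which is why it serves as a comparison bound rather than an implementable hardware policy (this is an illustrative model, not code from the course):

```python
def opt_misses(trace, capacity):
    """Count misses under Belady's OPT for a fully associative cache."""
    cache, misses = set(), 0
    for i, block in enumerate(trace):
        if block in cache:
            continue                       # hit
        misses += 1
        if len(cache) == capacity:
            # Evict the resident block whose next use is farthest away
            # (blocks never used again sort as infinitely far).
            def next_use(b):
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses
```

The linear scan in `next_use` also answers "Complexity?": each eviction looks arbitrarily far ahead in the trace, which no real cache can do at access time.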