This shows you the differences between two versions of the page.
buzzword [2015/02/23 19:19] rachata |
buzzword [2015/02/25 21:05] kevincha [Lecture 17 (2/25 Wed.)] |
||
---|---|---|---|
Line 630: | Line 630: | ||
* BS-ISA | * BS-ISA | ||
* Tradeoffs between trace cache/Hyperblock/Superblock/BS-ISA | * Tradeoffs between trace cache/Hyperblock/Superblock/BS-ISA | ||
- | | ||
| | ||
+ | * IA-64 | ||
+ | * EPIC | ||
+ | * IA-64 instruction bundle | ||
+ | * Multiple instructions in the bundle along with the template bit | ||
+ | * Template bits | ||
+ | * Stop bits | ||
+ | * Non-faulting loads and exception propagation | ||
+ | * Aggressive ST-LD reordering | ||
+ | * Physical memory system | ||
+ | * Ideal pipelines | ||
+ | * Ideal cache | ||
+ | * More capacity | ||
+ | * Fast | ||
+ | * Cheap | ||
+ | * High bandwidth | ||
+ | * DRAM cell | ||
+ | * Cheap | ||
+ | * Sense the perturbation through a sense amplifier | ||
+ | * Slow and leaky | ||
+ | * SRAM cell (Cross coupled inverter) | ||
+ | * Expensive | ||
+ | * Fast (easier to sense the value in the cell) | ||
+ | * Memory bank | ||
+ | * Read access sequence | ||
+ | * DRAM: Activate -> Read -> Precharge (if needed) | ||
+ | * What dominates the access latency for DRAM and SRAM | ||
+ | * Scaling issue | ||
+ | * Hard to scale the cell to be small | ||
+ | * Memory hierarchy | ||
+ | * Prefetching | ||
+ | * Caching | ||
+ | * Spatial and temporal locality | ||
+ | * Cache can exploit these | ||
+ | * Recently used data is likely to be accessed | ||
+ | * Nearby data is likely to be accessed | ||
+ | * Caching in a pipeline design | ||
+ | * Cache management | ||
+ | * Manual | ||
+ | * Data movement is managed manually | ||
+ | * Embedded processor | ||
+ | * GPU scratchpad | ||
+ | * Automatic | ||
+ | * HW manages data movement | ||
+ | * Latency analysis | ||
+ | * Based on the hit and miss status, next level access time (if miss), and the current level access time | ||
+ | * Cache basics | ||
+ | * Set/block (line)/Placement/replacement/direct mapped vs. associative cache/etc. | ||
+ | * Cache access | ||
+ | * How to access tag and data (in parallel vs serially) | ||
+ | * How do tag and index get used? | ||
+ | * Modern processors perform serial access for higher level cache (L3 for example) to save power | ||
+ | * Cost and benefit of having more associativity | ||
+ | * Given the associativity, which block should be replaced if the set is full | ||
+ | * Replacement policy | ||
+ | * Random | ||
+ | * Least recently used (LRU) | ||
+ | * Least frequently used | ||
+ | * Least costly to refetch | ||
+ | * etc. | ||
+ | * How to implement LRU | ||
+ | * How to keep track of access ordering | ||
+ | * Complexity increases rapidly | ||
+ | * Approximate LRU | ||
+ | * Victim and next Victim policy |
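
The cache-access, LRU, and latency-analysis items above can be tied together in a small sketch. This is an illustration only, not the lecture's implementation: the class names `LRUSet` and `Cache`, the function `average_latency`, and all sizes and latencies are assumptions chosen for the example. It tracks exact access ordering per set with an ordered dictionary, which is simple in software but is precisely the bookkeeping that becomes expensive in hardware as associativity grows.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set holding up to `ways` tags, ordered from least to most recently used."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()

    def access(self, tag):
        """Return True on a hit, False on a miss (filling the block on a miss)."""
        if tag in self.tags:
            # Hit: mark this block as the most recently used.
            self.tags.move_to_end(tag)
            return True
        if len(self.tags) >= self.ways:
            # Set is full: evict the least recently used block.
            self.tags.popitem(last=False)
        self.tags[tag] = None
        return False


class Cache:
    """Set-associative cache model (hypothetical parameters, tags only, no data)."""

    def __init__(self, num_sets, ways, block_bytes):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.sets = [LRUSet(ways) for _ in range(num_sets)]

    def access(self, addr):
        block = addr // self.block_bytes   # drop the block-offset bits
        index = block % self.num_sets      # index selects the set
        tag = block // self.num_sets       # remaining bits form the tag
        return self.sets[index].access(tag)


def average_latency(hit_rate, hit_time, next_level_time):
    """Current-level access time plus the miss fraction times the next-level access time."""
    return hit_time + (1.0 - hit_rate) * next_level_time


# Example: 2 sets, 2 ways, 64-byte blocks; every address below maps to set 0.
cache = Cache(num_sets=2, ways=2, block_bytes=64)
results = [cache.access(a) for a in [0, 128, 0, 256, 128]]
hit_rate = sum(results) / len(results)
```

In this trace the third access hits, the fourth evicts the block at address 128 (the least recently used one), and the fifth therefore misses again. An approximate-LRU scheme would replace the exact ordering kept in `LRUSet` with cheaper state.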