* Memory wall (a part of the scaling issue)
* Scaling issue
* Transistors are getting smaller
* Key components of a computer
* Design points
* Reliability problems that cause errors
* Analogies from Kuhn's "The Structure of Scientific Revolutions" (Recommended book)
* Pre-paradigm science
* Normal science
* Revolutionary science
* Components of a computer
* Computation
* Operands
* Live-outs/Live-ins
* Different types of data flow nodes (conditional/relational/barrier)
* How to do transactions in dataflow?
* Example: bank transactions
* Tradeoffs between 0,1,2,3 address machines
* Postfix notation (see the sketch below)
* Instructions/Opcode/Operand specifiers (i.e. addressing modes)
* Simple vs. complex data types (and their tradeoffs)
* Semantic gap and level
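A quick way to see why a 0-address (stack) machine pairs naturally with postfix notation is to evaluate an expression with an explicit operand stack. This is only an illustrative sketch; the operator set and example expression are made up, not taken from the lecture:

<code python>
# Hedged sketch: how a 0-address (stack) machine evaluates postfix notation.
def eval_postfix(tokens):
    stack = []
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    for tok in tokens:
        if tok in ops:
            b = stack.pop()                 # top of stack = second operand
            a = stack.pop()
            stack.append(ops[tok](a, b))    # "0-address": operands are implicit
        else:
            stack.append(int(tok))          # push an operand
    return stack.pop()

# (A + B) * C written in postfix: A B + C *
print(eval_postfix("2 3 + 4 *".split()))    # 20
</code>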
* Variable latency memory
* Handling interrupts
* Difference between interrupts and exceptions
* Emulator (i.e. uCode allots minimal datapath to emulate the ISA)
* Updating machine behavior
* Idle resources
* Throughput of a pipelined design
* What dictates the throughput of a pipelined design? (see the worked numbers below)
* Latency of the pipelined design
* Dependency
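As a reminder of what dictates pipelined throughput and latency, here is a small back-of-the-envelope calculation. The stage delays and latch overhead below are invented numbers, just to show the relationships:

<code python>
# Hedged sketch with made-up numbers: throughput is set by the slowest stage
# (plus latch/register overhead); latency is the sum over all stages.
stage_delays_ns = [0.8, 1.0, 0.6, 0.9, 0.7]   # five hypothetical stage delays
latch_overhead_ns = 0.1

cycle_time = max(stage_delays_ns) + latch_overhead_ns   # 1.1 ns
throughput = 1.0 / cycle_time                           # instructions per ns, if no stalls
latency_one_inst = len(stage_delays_ns) * cycle_time    # 5.5 ns through the pipe

print(f"cycle time = {cycle_time:.2f} ns")
print(f"throughput = {throughput:.2f} inst/ns (ideal, no stalls)")
print(f"latency    = {latency_one_inst:.2f} ns per instruction")
</code>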
* Instructions from different threads can fill in the bubbles
* Cost?
* Simultaneous multithreading
* Branch prediction
* Guess what to fetch next.
* Given the program/number of instructions, percent of branches, branch prediction accuracy, and penalty cost, how to compute the cost coming from branch mispredictions (see the worked example below)
* How many extra instructions are being fetched?
* What is the performance degradation?
* How to reduce the miss penalty?
* Predicting the next address (non PC+4 address)
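A worked example of that misprediction-cost computation. The instruction count, branch fraction, accuracy, and penalty are all assumptions for illustration, not values from the lecture:

<code python>
# Hedged sketch: cost of branch mispredictions for a hypothetical program.
n_inst      = 1_000_000_000   # total dynamic instructions (assumed)
branch_frac = 0.20            # 20% of instructions are branches (assumed)
accuracy    = 0.98            # predictor accuracy (assumed)
penalty_cyc = 20              # flush penalty per misprediction, in cycles (assumed)
base_cpi    = 1.0             # CPI if every branch were predicted correctly (assumed)

n_branches    = n_inst * branch_frac
n_mispredicts = n_branches * (1 - accuracy)
extra_cycles  = n_mispredicts * penalty_cyc

cpi = base_cpi + extra_cycles / n_inst
print(f"mispredictions: {n_mispredicts:,.0f}")
print(f"extra cycles:   {extra_cycles:,.0f}")
print(f"effective CPI:  {cpi:.2f}")   # 1.08 with these numbers
# Note: going from 98% to 99% accuracy halves the extra cycles, which is why
# small accuracy differences matter, especially with deep pipelines.
</code>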
* Global branch history - for directions
* Can use the compiler to profile and get more info
* Input set dictates the accuracy
* Adds time to compilation
* Heuristics that are common and don't require profiling.
* Does not require profiling
* Static branch prediction
* Programmer provides pragmas, hinting at the likelihood of a taken/not-taken branch
* For example, x86 has the hint bit
* Dynamic branch prediction (a two-bit-counter sketch follows below)
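To make dynamic branch prediction concrete, here is a minimal sketch of the classic two-bit saturating counter scheme (one counter per branch, indexed by PC). The table size and the example outcome stream are assumptions for illustration:

<code python>
# Hedged sketch: 2-bit saturating counter branch predictor.
# States 0-1 predict not-taken, states 2-3 predict taken.
TABLE_SIZE = 1024                      # assumed number of entries
counters = [2] * TABLE_SIZE            # start weakly taken (a common choice)

def predict(pc):
    return counters[pc % TABLE_SIZE] >= 2      # True = predict taken

def update(pc, taken):
    i = pc % TABLE_SIZE
    if taken:
        counters[i] = min(3, counters[i] + 1)  # saturate at strongly taken
    else:
        counters[i] = max(0, counters[i] - 1)  # saturate at strongly not-taken

# Hypothetical loop branch at PC 0x40: taken 9 times, then falls through once.
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    correct += (predict(0x40) == taken)
    update(0x40, taken)
print(f"{correct}/{len(outcomes)} predicted correctly")   # 9/10 here
</code>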
* Why are they very important?
* Differences between 99% accuracy and 98% accuracy
* Cost of a misprediction when the pipeline is very deep
* Global branch correlation
* Some branches are correlated
* Use branch prediction on easy-to-predict code
* Use predicated execution on hard-to-predict code
* Compiler can be more aggressive in optimizing the code
* What are the tradeoffs (slide #47)
* Multi-path execution
* Can lead to wasted work
* VLIW
* Superscalar
* Reorder buffer
* Reorders results before they are visible to the arch. state
* Need to preserve the sequential semantics and data
* What information is in an ROB entry (a sketch of typical fields follows below)
* Where to get the value from (forwarding path? reorder buffer?)
* Extra logic to check where the youngest instruction/value is
* Content addressable search (CAM)
* A lot of comparators
* Different ways to simplify the reorder buffer
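A hedged sketch of what a reorder buffer entry might hold. The exact field list varies by design; the names below are illustrative assumptions, not the lecture's definition:

<code python>
# Hedged sketch: plausible fields of one reorder buffer (ROB) entry.
from dataclasses import dataclass

@dataclass
class ROBEntry:
    valid: bool = False        # entry in use
    done: bool = False         # has the instruction finished executing?
    dest_reg: int = -1         # architectural destination register
    value: int = 0             # result, buffered until retirement
    exception: bool = False    # deferred exception flag
    pc: int = 0                # needed to restart after exceptions/mispredictions

# The ROB itself is typically a circular buffer: allocate at the tail in
# program order at dispatch, retire from the head in program order.
rob = [ROBEntry() for _ in range(8)]   # tiny 8-entry ROB for illustration
head, tail = 0, 0
</code>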
* An updated value (speculative), called the future file
* A backup value (to restore the state quickly)
* Doubles the cost of the regfile, but reduces the area as you don't have to use a content addressable memory (compared to ROB alone)
* Branch misprediction resembles an exception
* The difference is that branch misprediction is not visible to the software
* Slides 28 --> register renaming
* Slides 30-35 --> Exercise (also on the board)
* This will be useful for the midterm
* Register aliasing table (a small renaming sketch follows below)
* Broadcasting tags
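To recall how a register alias table drives renaming, here is a minimal sketch. The instruction sequence and table sizes are made up; it only shows the idea of mapping architectural to physical registers and breaking false (WAR/WAW) dependences:

<code python>
# Hedged sketch: register renaming with a register alias table (RAT).
# Architectural regs r0..r7 are mapped to physical regs from a free list.
NUM_ARCH, NUM_PHYS = 8, 16
rat = list(range(NUM_ARCH))                  # r_i initially maps to p_i
free_list = list(range(NUM_ARCH, NUM_PHYS))  # remaining physical regs are free

def rename(dst, src1, src2):
    ps1, ps2 = rat[src1], rat[src2]   # read current mappings for the sources
    pd = free_list.pop(0)             # new physical reg for the destination
    rat[dst] = pd                     # later readers of dst now see pd
    return pd, ps1, ps2

# Hypothetical sequence with a false dependence on r1:
#   add r1, r2, r3
#   mul r4, r1, r5
#   sub r1, r6, r7    <- reuses r1; renaming removes the WAW/WAR hazard
for dst, s1, s2 in [(1, 2, 3), (4, 1, 5), (1, 6, 7)]:
    pd, p1, p2 = rename(dst, s1, s2)
    print(f"r{dst} <- p{pd}   (sources p{p1}, p{p2})")
</code>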
* Still visible as a Von Neumann model
* Where does the efficiency come from?
* Size of the scheduling window/reorder buffer. Tradeoffs? What makes sense?
* Load/store handling
* Would like to schedule them out of order, but make them visible in-order
* But the program needs to be very parallel
* Memory can be the bottleneck (due to very high MLP)
* What do the functional units look like? Deep pipeline and simpler control.
* CRAY-1 is one example of a vector processor
* Memory access pattern in a vector processor
* What if there are multiple memory ports?
* Gather/Scatter allows vector processors to be a lot more programmable (i.e. gather data for parallelism)
* Helps handle sparse matrices (see the gather/scatter sketch below)
* Conditional operation
* Structure of vector units
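A small sketch of why gather/scatter helps with sparse matrices: the vector unit loads elements through an index vector instead of a contiguous stride. Written here in plain Python with invented data, purely to illustrate the access pattern:

<code python>
# Hedged sketch: gather/scatter access pattern for sparse data (made-up example).
values = [0.0] * 16                         # "memory": a dense array
for i in (1, 4, 6, 11, 13):                 # a few nonzero locations
    values[i] = float(i)

index_vector = [1, 4, 6, 11, 13]            # indices of the nonzeros

# Gather: vreg[k] = memory[index_vector[k]]
vreg = [values[idx] for idx in index_vector]

# ... vector arithmetic happens on the packed register ...
vreg = [x * 2.0 for x in vreg]

# Scatter: memory[index_vector[k]] = vreg[k]
for k, idx in enumerate(index_vector):
    values[idx] = vreg[k]

print(values)
</code>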
* Lower SIMD efficiency (see the utilization arithmetic below)
* What if you have layers of branches?
* Dynamic warp formation
* Combining threads from different warps to increase SIMD utilization
* This can cause memory divergence
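A quick, made-up calculation of how branch divergence lowers SIMD utilization, and how merging threads from two diverged warps (the dynamic warp formation idea) can recover some of it. The thread counts are assumptions for illustration:

<code python>
# Hedged sketch: SIMD utilization with branch divergence (invented numbers).
warp_width = 32

# Suppose warp A has 20 threads on the if-path and 12 on the else-path,
# and warp B has 8 on the if-path and 24 on the else-path.
# With lockstep execution each path occupies a full-width issue slot:
lanes_used   = 20 + 12 + 8 + 24          # active lanes over 4 issue slots
lanes_issued = 4 * warp_width
print("baseline utilization:", lanes_used / lanes_issued)          # 0.5

# Dynamic warp formation: pack if-threads together (20+8=28 -> 1 slot)
# and else-threads together (12+24=36 -> 2 slots of 32 and 4).
slots = 1 + 2
print("after warp merging:  ", lanes_used / (slots * warp_width))   # ~0.67
</code>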
* How to steer the instruction (determine dependency/stalling)?
* Instruction scheduling techniques (static vs. dynamic)
* Systolic arrays
* Processing elements transform data in chains
* Developed for image processing (for example, convolution)

===== Lecture 16 (2/23 Mon.) =====
* Systolic arrays (a chained-PE sketch follows below)
* Processing elements transform data in chains
* Can be arrays of multi-dimensional processing elements
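A minimal sketch of the systolic idea: data flows through a chain of processing elements, each holding one weight and passing a partial sum to its neighbor. The weights and input stream are invented, just to show the chained-PE structure rather than any specific design from the lecture:

<code python>
# Hedged sketch: a 1-D weight-stationary systolic chain (made-up weights/inputs).
# Each PE holds one weight; the input sample is broadcast each cycle and
# partial sums flow through the chain, one PE per cycle.
weights = [1, 2, 3]                  # w[0..2], held stationary in PE0..PE2
xs      = [4, 5, 6, 7, 8]            # input stream

K = len(weights)
psum_regs = [0] * K                  # registered partial sums between PEs
outputs = []

for t, x in enumerate(xs):
    new_psums = [0] * K
    for k in range(K):
        incoming = psum_regs[k - 1] if k > 0 else 0  # from the left neighbor (last cycle)
        new_psums[k] = incoming + weights[k] * x     # multiply-accumulate in PE k
    psum_regs = new_psums
    if t >= K - 1:                   # pipeline fill takes K-1 cycles
        outputs.append(psum_regs[-1])

# Each output is the sliding dot-product of the weights with the last K inputs
# (reverse the weights if you want true convolution).
print(outputs)   # [32, 38, 44] == [1*4+2*5+3*6, 1*5+2*6+3*7, 1*6+2*7+3*8]
</code>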
* Side entrance
* Fix-up code
* How scheduling is done
* Instruction scheduling
* Prioritization heuristics
* Hyperblock
* BS-ISA
* Tradeoffs between trace cache/Hyperblock/Superblock/BS-ISA

===== Lecture 17 (2/25 Wed.) =====
* Non-faulting loads and exception propagation
* Aggressive ST-LD reordering
* Physical memory system
* Ideal pipelines
* Ideal cache
* DRAM cell
* Cheap
* Senses the perturbation through a sense amplifier
* Slow and leaky
* SRAM cell (cross-coupled inverters)
* Read access sequence
* DRAM: Activate -> Read -> Precharge (if needed)
* What dominates the access latency for DRAM and SRAM?
* Scaling issue
* Hard to scale the cell size down
* Cost and benefit of having more associativity
* Given the associativity, which block should be replaced if the set is full
* Replacement policy (an LRU sketch follows below)
* Random
* Least recently used (LRU)
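A minimal sketch of LRU replacement within one set, kept deliberately simple; the set size and access stream are invented:

<code python>
# Hedged sketch: LRU replacement for a single 4-way cache set (made-up accesses).
from collections import OrderedDict

WAYS = 4
cache_set = OrderedDict()          # tag -> data; order = recency (last = MRU)

def access(tag):
    if tag in cache_set:
        cache_set.move_to_end(tag) # hit: mark as most recently used
        return "hit"
    if len(cache_set) >= WAYS:
        cache_set.popitem(last=False)   # evict the least recently used block
    cache_set[tag] = "data"        # fill the block
    return "miss"

for tag in [0x1, 0x2, 0x3, 0x4, 0x1, 0x5, 0x2]:
    print(hex(tag), access(tag))
# 0x5 evicts 0x2 (the LRU block at that point), so the final 0x2 access misses.
</code>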
* Caches missed cache blocks
* Prevents ping-ponging
* Pseudo associativity
* Simpler way to implement an associative cache
* Skewed assoc. cache
* Memory banks
* Shared caches in multi-core processors

===== Lecture 20 (03/04 Wed.) =====
* Virtual vs. physical memory
* System's management of memory
* Benefits
* Problem: physical memory has limited size
* Mechanisms: indirection, virtual addresses, and translation
* Demand paging
* Physical memory as a cache
* Tasks of system SW for VM
* Serving a page fault
* Address translation
* Page table
* PTE (page table entry)
* Page replacement algorithm
* CLOCK algo. (a small sketch follows this lecture's list)
* Inverted page table
* Page size trade-offs
* Protection
* Multi-level page tables
* x86 implementation of the page table
* TLB
* Handling misses
* When to do address translation?
* Homonyms and synonyms
* Homonym: the same VA maps to different PAs in different processes
* Synonym: multiple VAs map to the same PA
* Shared libraries, shared data, copy-on-write
* Virtually indexed vs. physically indexed
* Virtually tagged vs. physically tagged
* Virtually indexed, physically tagged
* Can these create problems when we have a cache?
* How to eliminate these problems?
* Page coloring
* Interaction between the cache and the TLB
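A hedged sketch of the CLOCK page replacement approximation of LRU: a circular set of frames with one reference bit each; the hand clears reference bits until it finds a page whose bit is already 0. The frame count and access pattern are invented:

<code python>
# Hedged sketch: CLOCK (second-chance) page replacement with made-up accesses.
NUM_FRAMES = 3
frames   = [None] * NUM_FRAMES   # which virtual page occupies each frame
ref_bits = [0] * NUM_FRAMES      # reference bit per frame
hand = 0

def access(page):
    global hand
    if page in frames:                    # page hit: set its reference bit
        ref_bits[frames.index(page)] = 1
        return "hit"
    while True:                           # miss: sweep the hand
        if ref_bits[hand] == 0:           # found a victim (no second chance left)
            frames[hand] = page
            ref_bits[hand] = 1
            hand = (hand + 1) % NUM_FRAMES
            return "miss"
        ref_bits[hand] = 0                # give a second chance, clear the bit
        hand = (hand + 1) % NUM_FRAMES

for p in ["A", "B", "C", "A", "D", "B"]:
    print(p, access(p))
</code>
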
+ | |||
+ | |||

===== Lecture 21 (03/23 Mon.) =====
* DRAM scaling problem
* Demands/trends affecting the main memory
* More capacity
* Low energy
* More bandwidth
* QoS
* ECC in DRAM
* Multi-porting
* Virtual multi-porting
* Time-share the port; not too scalable, but cheap
* True multiporting
* Multiple cache copies
* Alignment
* Banking
* Can have bank conflicts
* Extra interconnects across banks
* Address mapping can mitigate bank conflicts
* Common in main memory (note that the regFile in GPUs is also banked, but mainly for the purpose of reducing complexity)
* Bank mapping
* How to avoid bank conflicts?
* Channel mapping
* Address mapping to minimize bank conflicts
* Page coloring
* Virtual-to-physical mapping that can help reduce conflicts
* Accessing DRAM
* Row bits
* Column bits
* Addressability
* DRAM has its own clock
* Sense amplifier
* Bit lines
* Word lines
* DRAM (1T-1C) vs. SRAM (6T)
* Cost
* Latency
* Interleaving in DRAM
* Effects of address mapping on memory interleaving
* Effects of the program's memory access patterns on interleaving
* DRAM Bank
* To minimize the cost of interleaving (banks share the data bus and the command bus)
* DRAM Rank
* Minimizes the cost of the chips (a bundle of chips operated together)
* DRAM Channel
* An interface to DRAM, each with its own ranks/banks
* DRAM Chip
* DIMM
* More DIMMs add interconnect complexity
* List of commands to read/write data in DRAM
* Activate -> read/write -> precharge
* Activate moves data into the row buffer
* Precharge prepares the bank for the next access
* Row buffer hit
* Row buffer conflict
* Scheduling memory requests to lower row conflicts
* Burst mode of DRAM
* Prefetch 32 bits from an 8-bit interface if DRAM needs to read 32 bits
* Address mapping (see the row- vs. cache-block-interleaving sketch after this list)
* Row interleaved
* Cache block interleaved
* Memory controller
* Sending DRAM commands
* Periodically sends commands to refresh DRAM cells
* Ensures correctness and data integrity
* Where to place the memory controller
* On the CPU chip vs. at the main memory
* Higher BW on-chip
* Determines the order of requests that will be serviced in DRAM
* Request queues that hold requests
* Send requests whenever a request can be sent to its bank
* Determine which command (across banks) should be sent to DRAM
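To make the row-interleaved vs. cache-block-interleaved address mappings concrete, here is a small bit-slicing sketch. The field widths (banks, columns, block size) are invented and much smaller than real DRAMs, just to show how the mapping changes which bank consecutive blocks fall into:

<code python>
# Hedged sketch: two ways to map a physical address onto DRAM (toy field widths).
# Assumed geometry: 8 banks, 64-byte cache blocks, 32 cache blocks per row.
BLOCK_BITS = 6       # 64 B cache block
BANK_BITS  = 3       # 8 banks
COL_BITS   = 5       # 32 cache blocks per row (per bank)

def row_interleaved(addr):
    # [ row | bank | column | block offset ]
    blk  = addr >> BLOCK_BITS
    col  = blk & ((1 << COL_BITS) - 1)
    bank = (blk >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row  = blk >> (COL_BITS + BANK_BITS)
    return row, bank, col

def block_interleaved(addr):
    # [ row | column | bank | block offset ] -> consecutive blocks hit different banks
    blk  = addr >> BLOCK_BITS
    bank = blk & ((1 << BANK_BITS) - 1)
    col  = (blk >> BANK_BITS) & ((1 << COL_BITS) - 1)
    row  = blk >> (BANK_BITS + COL_BITS)
    return row, bank, col

for addr in range(0, 4 * 64, 64):        # four consecutive cache blocks
    print(hex(addr), "row-interleaved:", row_interleaved(addr),
          " block-interleaved:", block_interleaved(addr))
# Row interleaving keeps a streaming access in one row buffer (row hits);
# cache block interleaving spreads consecutive blocks across banks (parallelism).
</code>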
+ | |||
+ | ===== Lecture 22 (03/25 Wed.) ===== | ||
+ | |||
+ | |||
+ | |||
* Flash controller
* Flash memory
* Garbage collection in flash
* Overhead in flash memory
* Erase (off the critical path, but takes a long time)
* Different types of DRAM
* DRAM design choices
* Cost/density/latency/BW/yield
* Sense amplifier
* How do they work
* Dual data rate
* Subarray
* RowClone
* Moving bulk data from one row to another
* Lower latency and BW when copying or zeroing out data
* TL-DRAM
* Far segment
* Near segment
* What causes the long latency
* Benefit of TL-DRAM
* TL-DRAM vs. DRAM cache (adding a small cache in DRAM)
* List of commands to read/write data in DRAM
* Activate -> read/write -> precharge
* Activate moves data into the row buffer
* Precharge prepares the bank for the next access
* Row buffer hit
* Row buffer conflict
* Scheduling memory requests to lower row conflicts
* Burst mode of DRAM
* Prefetch 32 bits from an 8-bit interface if DRAM needs to read 32 bits
* Address mapping
* Row interleaved
* Cache block interleaved
* Memory controller
* Sending DRAM commands
* Periodically sends commands to refresh DRAM cells
* Ensures correctness and data integrity
* Where to place the memory controller
* On the CPU chip vs. at the main memory
* Higher BW on-chip
* Determines the order of requests that will be serviced in DRAM
* Request queues that hold requests
* Send requests whenever a request can be sent to its bank
* Determine which command (across banks) should be sent to DRAM
* Priority of demand vs. prefetch requests
* Memory scheduling policies (see the FR-FCFS sketch below)
* FCFS
* FR-FCFS
* Tries to maximize the row buffer hit rate
* Capped FR-FCFS: FR-FCFS with a timeout
* Usually this is done at the command level (read/write commands and precharge/activate commands)
* PAR-BS
* Key benefits
* Stall time
* Shortest job first
* STFM
* ATLAS
* TCM
* Key benefits
* Configurability
* Fairness + performance at the same time
* Robustness issues
* Open row policy
* Closed row policy
* QoS
* QoS issues in memory scheduling
* Fairness
* Performance guarantee
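A hedged sketch of the FR-FCFS idea: among queued requests, prefer ones that hit the currently open row (first-ready), and break ties by age (FCFS). The request format and open-row bookkeeping are simplified assumptions, not the exact policy implementation from the lecture:

<code python>
# Hedged sketch: FR-FCFS request selection for one memory controller (simplified).
# A request is (arrival_time, bank, row); open_rows tracks the open row per bank.
def pick_fr_fcfs(queue, open_rows):
    # 1) First-Ready: oldest request that is a row buffer hit
    hits = [r for r in queue if open_rows.get(r[1]) == r[2]]
    if hits:
        return min(hits)               # min over (arrival_time, ...) = oldest hit
    # 2) Otherwise plain FCFS: oldest request overall
    return min(queue)

queue = [(0, 0, 5),    # oldest, but bank 0 has row 7 open (a row conflict)
         (1, 0, 7),    # row buffer hit in bank 0
         (2, 1, 3)]    # bank 1, some other row
open_rows = {0: 7}

req = pick_fr_fcfs(queue, open_rows)
print("scheduled:", req)   # (1, 0, 7): the row hit is served before the older conflict
# This maximizes row buffer hit rate but can starve requests to other rows;
# "capped" FR-FCFS bounds how long an older request may wait (a timeout).
</code>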
+ | |||
+ | |||
+ | |