This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
buzzword [2014/02/19 19:20] rachata |
buzzword [2014/02/24 19:17] rachata |
||
---|---|---|---|
Line 526: | Line 526: | ||
* Register aliasing table | * Register aliasing table | ||
+ | ===== Lecture 15 (2/21 Fri.) ===== | ||
+ | |||
+ | * OoO --> Restricted Dataflow | ||
+ | * Extracting parallelism | ||
+ | * What are the bottlenecks? | ||
+ | * Issue width | ||
+ | * Dispatch width | ||
+ | * Parallelism in the program | ||
+ | * More example on slide #10 | ||
+ | * What does it mean to be restricted data flow | ||
+ | * Still visible as a Von Neumann model | ||
+ | * Where does the efficiency come from? | ||
+ | * Size of the scheduling windors/reorder buffer. Tradeoffs? What make sense? | ||
+ | * Load/store handling | ||
+ | * Would like to schedule them out of order, but make them visible in-order | ||
+ | * When do you schedule the load/store instructions? | ||
+ | * Can we predict if load/store are dependent? | ||
+ | * This is one of the most complex structure of the load/store handling | ||
+ | * What information can be used to predict these load/store optimization? | ||
+ | * Note: IPC = 1/CPI | ||
+ | * Centralized vs. distributed? What are the tradeoffs? | ||
+ | * How to handle when there is a misprediction/recovery | ||
+ | * Token dataflow arch. | ||
+ | * What are tokens? | ||
+ | * How to match tokens | ||
+ | * Tagged token dataflow arch. | ||
+ | * What are the tradeoffs? | ||
+ | * Difficulties? | ||
+ | |||
+ | ===== Lecture 16 (2/24 Mon.) ===== | ||
+ | |||
+ | * SISD/SIMD/MISD/MIMD | ||
+ | * Array processor | ||
+ | * Vector processor | ||
+ | * Data parallelism | ||
+ | * Where does the concurrency arise? | ||
+ | * Differences between array processor vs. vector processor | ||
+ | * VLIW | ||
+ | * Compactness of an array processor | ||
+ | * Vector operates on a vector of data (rather than a single datum (scalar)) | ||
+ | * Vector length (also applies to array processor) | ||
+ | * No dependency within a vector --> can have a deep pipeline | ||
+ | * Highly parallel (both instruction level (ILP) and memory level (MLP)) | ||
+ | * But the program needs to be very parallel | ||
+ | * Memory can be the bottleneck (due to very high MLP) | ||
+ | * What does the functional units look like? Deep pipelin and simpler control. | ||
+ | * CRAY-I is one of the examples of vector processor | ||
+ | * Memory access pattern in a vector processor | ||
+ | * How do the memory accesses benefit the memory bandwidth? | ||
+ | * Please refer to slides 73-74 in http://www.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447-spring13-lecture25-mainmemory-afterlecture.pdf for a breif explanation of memory level parallelism | ||
+ | * Stride length vs. the number of banks | ||
+ | * stride length should be relatively prime to the number of banks | ||
+ | * Tradeoffs between row major and column major --> How can the vector processor deals with the two | ||
+ | * How to calculate the efficiency and performance of vector processors | ||
+ | * What if there are multiple memory ports? | ||
+ | * Gather/Scatter allows vector processor to be a lot more programmable (i.e. gather data for parallelism) | ||
+ | * Helps handling sparse metrices | ||
+ | * Conditional operation | ||
+ | * Structure of vector units | ||
+ | * How to automatically parallelize code through the compiler? | ||
+ | * This is a hard problem. Compiler does not know the memory address. | ||
+ | * What do we need to ensure for both vector and array processor? | ||
+ | * Sequential bottleneck | ||
+ | * Amdahl's law | ||
+ | * Intel MMX --> An example of Intel's approach to SIMD | ||
+ | * No VLEN, use OpCode to define the length | ||
+ | * Stride is one in MMX | ||
+ | * Intel SSE --> Modern version of MMX | ||