buzzword [2015/04/03 18:20] rachata
buzzword [2015/04/13 16:08] kevincha
  * Based on the characteristics
    * Frequently accessed data that need lower write latency in DRAM

===== Lecture 27 (4/6 Mon.) =====

  * Flynn's taxonomy
  * Parallelism
    * Reduces power consumption (P ~ CV^2F)
    * Better cost efficiency and easier to scale
    * Improves dependability (in case the other core is faulty)
  * Different types of parallelism
    * Instruction level parallelism
    * Data level parallelism
    * Task level parallelism
  * Task level parallelism
    * Partition a single, potentially big, task into multiple parallel sub-tasks
      * Can be done explicitly (parallel programming by the programmer)
      * Or implicitly (hardware partitions a single thread speculatively)
    * Or, run multiple independent tasks (still improves throughput, but the speedup of any single task is no better; also simpler to implement)
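As a small sketch of the explicit case (an illustration, not from the lecture; the function names are made up), one big task can be partitioned into sub-tasks that run on a pool of workers:

```python
# Explicit task-level parallelism: partition one big task (summing a
# range of integers) into sub-tasks and run them on a worker pool.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(lo, hi):
    """One sub-task: sum the integers in [lo, hi)."""
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Partition [0, n) into one chunk per worker (last chunk takes the remainder).
    chunk = n // workers
    bounds = [(i * chunk, (i + 1) * chunk if i < workers - 1 else n)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: partial_sum(*b), bounds))

print(parallel_sum(1_000_000))  # 499999500000, same as sum(range(1_000_000))
```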
  * Loosely coupled multiprocessor
    * No shared global address space
    * Message passing to communicate between different processors
    * Simple to manage memory
  * Tightly coupled multiprocessor
    * Shared global address space
    * Need to ensure consistency of data
    * Programming issues
  * Hardware-based multithreading
    * Coarse grained
    * Fine grained
    * Simultaneous: dispatch instructions from multiple threads at the same time
  * Parallel speedup
    * Superlinear speedup
    * Utilization, Redundancy, Efficiency
  * Amdahl's law
    * Maximum speedup
    * Parallel portion is not perfect
      * Serial bottleneck
      * Synchronization cost
      * Load balance
        * Some threads have more work, requiring more time to reach the sync. point
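The maximum speedup can be made concrete with a small calculation (a sketch, where p is the parallel fraction of the work and n the number of processors):

```python
# Amdahl's law: with parallel fraction p and n processors,
# speedup = 1 / ((1 - p) + p / n); the serial fraction (1 - p)
# bounds the speedup no matter how large n grows.

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel code, 1024 processors give less than 20x:
print(round(amdahl_speedup(0.95, 1024), 1))   # 19.6
print(round(amdahl_speedup(0.95, 10**9), 1))  # 20.0 -- the 1/(1-p) limit
```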
  * Critical sections
    * Enforce mutually exclusive access to shared data
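A minimal sketch of a critical section (illustrative, not from the lecture): a lock makes the shared-counter update mutually exclusive, so no increments are lost even with several threads updating it:

```python
# A lock enforces mutually exclusive access to shared data: only one
# thread at a time executes the critical section below.
import threading

counter = 0
lock = threading.Lock()

def worker(n_increments):
    global counter
    for _ in range(n_increments):
        with lock:          # enter critical section
            counter += 1    # shared data touched by one thread at a time
        # lock released on block exit

threads = [threading.Thread(target=worker, args=(50_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 200000 -- no updates lost
```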
  * Issues in parallel programming
    * Correctness
    * Synchronization
    * Consistency
+ | |||
+ | |||

===== Lecture 28 (4/8 Wed.) =====

  * Ordering of instructions
    * Maintaining memory consistency when there are multiple threads and shared memory
    * Need to ensure the semantics are not changed
    * Making sure shared data is properly locked when used
      * Support mutual exclusion
    * Ordering depends on when each processor executes
    * Debugging is also difficult (non-deterministic behavior)
  * Dekker's algorithm
    * Inconsistency -- the two processors did NOT see the same order of operations to memory
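A sketch of Dekker's algorithm for two threads (an illustration; the lecture's point is exactly that this is only correct under sequential consistency -- if the two processors do not see the same order of the flag writes and reads, both threads can enter the critical section at once). CPython happens to execute this sequentially consistently, so the sketch works here:

```python
# Dekker's algorithm for two threads (ids 0 and 1). Correct only under
# sequential consistency: if the writes to flag[] could be reordered
# past the reads, mutual exclusion would break.
import sys
import threading

sys.setswitchinterval(1e-4)  # switch threads often so busy-waits stay short

flag = [False, False]        # flag[i]: thread i wants to enter
turn = 0                     # which thread may insist on entering
counter = 0

def lock(i):
    other = 1 - i
    flag[i] = True
    while flag[other]:           # the other thread also wants in
        if turn == other:        # not our turn: back off and wait
            flag[i] = False
            while turn == other:
                pass
            flag[i] = True

def unlock(i):
    global turn
    turn = 1 - i                 # hand the turn to the other thread
    flag[i] = False

def worker(i, n):
    global counter
    for _ in range(n):
        lock(i)
        counter += 1             # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i, 500)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 1000 -- mutual exclusion held
```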
  * Sequential consistency
    * Multiple correct global orders
    * Two issues:
      * Too conservative/strict
      * Performance limiting
  * Weak consistency: global ordering only at synchronization points
    * Programmer hints where the synchronizations are
      * Memory fence
    * More burden on the programmers
  * Cache coherence
    * Can be done at the software level or the hardware level
    * Snoop-based coherence
      * A simple protocol with two states, broadcasting reads/writes on a bus
    * Maintaining coherence
      * Needs to provide 1) write propagation and 2) write serialization
      * Update vs. Invalidate
    * Two cache coherence methods
      * Snoopy bus
        * Bus based, single point of serialization
        * More efficient with a small number of processors
        * Processors snoop other caches' read/write requests to keep the cache block coherent
      * Directory
        * Single point of serialization per block
        * The directory coordinates the coherence
        * More scalable
        * The directory keeps track of where the copies of each block reside
          * Supplies data on a read
          * Invalidates the block on a write
          * Has an exclusive state
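A toy sketch of a directory entry (an assumption for illustration, not the lecture's exact design): a bit vector tracks which caches hold a copy; a read adds the reader as a sharer and supplies data, while a write invalidates every other copy and records one exclusive owner:

```python
# Toy directory entry for one cache block: a sharer bit vector plus an
# exclusive-owner field. Illustrative sketch, not a full protocol.

class DirectoryEntry:
    def __init__(self, n_nodes):
        self.sharers = [False] * n_nodes   # bit vector: which caches hold a copy
        self.exclusive = None              # node id with the sole dirty copy, if any

    def read(self, node):
        # If some other node holds it exclusively, it must supply the
        # data and downgrade to a shared copy first.
        if self.exclusive is not None and self.exclusive != node:
            self.sharers[self.exclusive] = True
            self.exclusive = None
        self.sharers[node] = True          # supply data; record the new sharer

    def write(self, node):
        # Invalidate every other copy, then grant exclusive ownership.
        invalidated = [i for i, s in enumerate(self.sharers) if s and i != node]
        self.sharers = [False] * len(self.sharers)
        self.sharers[node] = True
        self.exclusive = node
        return invalidated                 # nodes that receive invalidations

entry = DirectoryEntry(4)
entry.read(0)
entry.read(2)
print(entry.write(1))  # [0, 2] -- both shared copies are invalidated
```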
+ | |||

===== Lecture 29 (4/13 Mon.) =====

  * MSI coherence protocol
    * The problem: unnecessary broadcasts of invalidations
  * MESI coherence protocol
    * Adds an exclusive state to MSI: the only cached copy, and it is clean
    * Multiple invalidation tradeoffs
      * Problem: memory can be unnecessarily updated
    * A possible owner state (MOESI)
  * Tradeoffs between snooping and directory based coherence protocols
    * Slide 31 has a good summary
  * Directory: data structures
    * Bit vectors vs. linked lists
  * Scalability of directories
    * Size? Latency? Thousands of nodes? Best of both snooping and directory?
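The MSI transitions for one cache's copy of a block can be sketched as a small state machine (a sketch only; real protocols also generate the bus transactions and data transfers these transitions imply). Events are local reads/writes (PrRd/PrWr) and snooped bus traffic from other caches (BusRd/BusRdX):

```python
# Per-cache MSI state machine for one block.
MSI = {
    # (state, event): next_state
    ("I", "PrRd"):   "S",  # read miss: fetch the block, enter Shared
    ("I", "PrWr"):   "M",  # write miss: fetch and invalidate others, Modified
    ("I", "BusRd"):  "I",  # no copy here: nothing to do
    ("I", "BusRdX"): "I",
    ("S", "PrRd"):   "S",  # read hit
    ("S", "PrWr"):   "M",  # upgrade: broadcast an invalidation first
    ("S", "BusRd"):  "S",  # another reader: stay Shared
    ("S", "BusRdX"): "I",  # another writer: invalidate our copy
    ("M", "PrRd"):   "M",
    ("M", "PrWr"):   "M",
    ("M", "BusRd"):  "S",  # supply the dirty data, downgrade to Shared
    ("M", "BusRdX"): "I",  # supply the data, then invalidate
}

def run(events, state="I"):
    for e in events:
        state = MSI[(state, e)]
    return state

# Read, then write, then watch another cache read and write the block:
print(run(["PrRd", "PrWr", "BusRd", "BusRdX"]))  # I
```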