buzzword [2014/04/14 18:16] → [2014/05/02 18:14] rachata
  * Synchronization
  * Consistency

===== Lecture 29 (4/16 Wed.) =====

  * Ordering of instructions
    * Maintaining memory consistency when there are multiple threads and shared memory
    * Need to ensure the semantics are not changed
  * Making sure the shared data is properly locked when used
    * Supports mutual exclusion
  * Ordering depends on when each processor executes
    * Debugging is also difficult (non-deterministic behavior)
  * Weak consistency: global ordering only at synchronization points
    * The programmer hints where the synchronizations are
  * Total store order model: global ordering only for stores
  * Cache coherence
    * Can be done at the software level or the hardware level
    * Coherence protocol
      * Need to ensure that all the processors see and update the correct state of a cache block
      * Need to make sure that writes get propagated and serialized
      * Simple protocols are not scalable (single point of synchronization)
    * Update vs. invalidate
      * With invalidation, only the core performing the write retains a valid copy
        * Can lead to ping-ponging (lots of reads/writes from several processors)
      * With updates, the bus becomes the bottleneck
    * Snoopy bus
      * Bus based, single point of serialization
      * More efficient with a small number of processors
      * All caches snoop other caches' read/write requests to keep cache blocks coherent
    * Directory based
      * Single point of serialization per block
      * The directory coordinates coherence
      * More scalable
      * The directory keeps track of where the copies of each block reside
        * Supplies data on a read
        * Invalidates the block on a write
        * Has an exclusive state
    * MSI coherence protocol
      * Slides 56-57
      * Consumes bus bandwidth (needs an "exclusive" state)
    * MESI coherence protocol
      * Adds an exclusive state to MSI: this is the only cached copy and it is clean
    * Tradeoffs between snooping and directory based
      * Slide 71 has a good summary of this
    * MOESI
      * An improvement over the MESI protocol
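The invalidation-based MESI transitions above can be sketched as a toy state machine for a single cache block. This is an illustration, not the exact protocol from the slides; the core names and the dict representation of per-core states are assumptions.

```python
# Minimal sketch of MESI transitions for one cache block across cores.
# States: M (Modified), E (Exclusive), S (Shared), I (Invalid).

def mesi_access(states, core, op):
    """Apply one read/write by `core`; other cores snoop and react."""
    others = [c for c in states if c != core]
    if op == "read":
        if states[core] == "I":
            if any(states[c] != "I" for c in others):
                # A Modified/Exclusive owner writes back and drops to Shared
                for c in others:
                    if states[c] in ("M", "E"):
                        states[c] = "S"
                states[core] = "S"
            else:
                states[core] = "E"   # only cached copy, clean
        # Read hits in M/E/S need no bus transaction
    elif op == "write":
        for c in others:
            states[c] = "I"          # invalidate all other copies
        states[core] = "M"
    return states

states = {"core0": "I", "core1": "I"}
mesi_access(states, "core0", "read")    # core0 -> E (only clean copy)
mesi_access(states, "core1", "read")    # both -> S
mesi_access(states, "core0", "write")   # core0 -> M, core1 -> I
print(states)                           # {'core0': 'M', 'core1': 'I'}
```

Note how the final write invalidates core1's copy: repeated writes from alternating cores would produce exactly the ping-ponging mentioned above.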
+ | |||
+ | |||
+ | ===== Lecture 29 (4/18 Wed.) ===== | ||
+ | |||
+ | |||
+ | |||
  * Interference
  * Complexity of the memory scheduler
    * Ranking/prioritization has a cost
    * A complex scheduler has higher latency
  * Performance metrics for multicore/multithreaded applications
    * Speedup
    * Slowdown
      * Harmonic vs. weighted
    * Fairness metric
      * Maximum slowdown
        * Why does it make sense?
        * Any scenario where it does not make sense?
  * Predictable performance
    * Why is it important?
      * In a server environment, different jobs run on the same server
      * In a mobile environment, there are multiple sources that can slow down other sources
    * How to relate slowdown to request service rate
    * MISE: soft slowdown guarantee
  * BDI (Base-Delta-Immediate compression)
  * Memory wall
    * What is the concern regarding the memory wall?
    * Size of the cache on the die (CPU die)
    * One possible solution: cache compression
      * What are the problems of existing cache compression mechanisms?
        * Some are too complex
        * Decompression is in the critical path
          * Need to decompress when reading the data -> decompression should not be in the critical path
          * An important factor in performance
        * Software compression is not good enough to compress everything
      * Zero value compression
        * Simple
        * Good compression ratio
        * What if the data does not have many zeroes?
      * Frequent value compression
        * Some data values appear frequently
        * Simple and good compression ratio
        * Has to profile
        * Decompression is complex
      * Frequent pattern compression
        * Still too complex in terms of decompression
      * Base-delta compression
        * Easy to decompress but retains the benefit of compression
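The appeal of base+delta compression (the idea behind BDI) is that decompression is a single addition per word, which keeps it off the critical path. A minimal sketch, with illustrative field widths (the real design uses multiple bases and several delta sizes):

```python
# Sketch of base+delta compression: store one base value plus small
# per-word deltas instead of full 4-byte words.

def bd_compress(words, delta_bytes=1):
    """Return (base, deltas) if every delta fits in `delta_bytes`, else None."""
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)       # signed delta range
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas                  # compressible line
    return None                              # store uncompressed

def bd_decompress(base, deltas):
    # Decompression is just one add per word -> short critical path
    return [base + d for d in deltas]

cache_line = [0x1000, 0x1008, 0x1010, 0x1004]   # e.g., nearby pointers
packed = bd_compress(cache_line)
assert packed is not None
assert bd_decompress(*packed) == cache_line
# Compressed: 4-byte base + 4 one-byte deltas = 8 bytes vs. 16 bytes raw
```

Values that cluster around a common base (pointers, counters) compress well; a line of unrelated values returns None and is stored uncompressed.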
+ | |||
+ | |||
+ | ===== Lecture 31 (4/28 Mon.) ===== | ||
+ | |||
  * Directory-based cache coherence
    * Each directory has to handle validation/invalidation
    * Extra cost of synchronization
    * Need to ensure race conditions are resolved
  * Interconnection
    * Topology
      * Bus
      * Mesh
      * Torus
      * Tree
      * Butterfly
      * Ring
        * Bi-directional ring
          * More scalable
        * Hierarchical ring
          * Even more scalable
          * More complex
      * Crossbar
      * etc.
    * Circuit switching
    * Multistage networks
      * Butterfly
      * Delta network
    * Handling contention
      * Buffering vs. dropping/deflection (no buffering)
    * Routing algorithm
      * Handling deadlock
        * X-Y routing
        * Turn model (to avoid deadlocks)
        * Add more buffering for an escape path
      * Oblivious routing
        * Can take different paths
        * DOR (dimension-order routing) between each intermediate location
        * Balances network load
      * Adaptive routing
        * Uses the state of the network to determine the route
        * Aware of local and/or global congestion
        * Non-minimal adaptive routing can have livelocks
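X-Y routing can be sketched in a few lines: route fully along the X dimension first, then along Y. Because every packet orders the dimensions the same way, the disallowed turns rule out cyclic channel dependences, which is what avoids deadlock. The coordinate/mesh representation here is an illustrative assumption.

```python
# Deterministic X-Y (dimension-order) routing on a 2D mesh.

def xy_route(src, dst):
    """Return the list of hops from src to dst as (x, y) coordinates."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
# [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The path is always minimal but fixed, so X-Y routing cannot steer around congestion; that is the gap oblivious and adaptive routing try to fill.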
+ | |||
+ | ===== Lecture 32 (4/30 Wed.) ===== | ||
+ | |||
+ | |||
  * Serialized code sections
    * Degrade performance
    * Waste energy
  * Heterogeneous cores
    * Can execute the serialized portion on a powerful large core
    * Tradeoff between multiple small cores, multiple large cores, or heterogeneous cores
  * Critical sections
    * A bottleneck in several multithreaded workloads
    * Asymmetry can help
    * Accelerated critical sections (ACS)
      * Use a large core to run the serialized portion of the code
      * How to correctly support ACS
        * False serialization
        * Handling private/shared data
  * BIS
    * Identify the bottleneck
      * Serial bottlenecks
        * Barriers
        * Critical sections
        * Pipeline stages
      * An application might wait on different types of bottlenecks
    * Allows BottleneckCall and BottleneckReturn
    * Acceleration can be done in multiple ways
      * Ship to a big core
      * Increase the frequency
      * Prioritize the thread in shared resources (e.g., the memory scheduler always schedules requests from that thread first)
    * The bottleneck table keeps track of different threads' bottlenecks and determines criticality
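A toy sketch of the bottleneck-table idea: accumulate the cycles threads spend waiting on each bottleneck (lock, barrier, pipeline stage) and treat the largest accumulator as the most critical, i.e., the one worth accelerating. The function names and the software dict are illustrative assumptions; BIS implements this as a hardware table.

```python
# Toy bottleneck table: bottleneck id -> total cycles threads waited on it.
bottleneck_table = {}

def bottleneck_wait(bid, cycles):
    """A thread reports `cycles` spent waiting on bottleneck `bid`."""
    bottleneck_table[bid] = bottleneck_table.get(bid, 0) + cycles

def most_critical():
    """The bottleneck with the most accumulated waiting is most critical."""
    return max(bottleneck_table, key=bottleneck_table.get)

bottleneck_wait("lock_A", 120)
bottleneck_wait("barrier_1", 45)
bottleneck_wait("lock_A", 200)        # many threads contend on lock_A
print(most_critical())                 # lock_A -> accelerate its owner
```

Accumulating waiting cycles rather than counting acquisitions is what lets the scheme compare different bottleneck types (barriers vs. locks) with one metric.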
+ | |||
+ | |||
+ | ===== Lecture 33 (5/2 Fri.) ===== | ||
+ | |||
+ | |||
  * DRAM scaling problem
  * Possible solutions to the scaling problem
    * Lower-leakage DRAM
    * Heterogeneous DRAM (TL-DRAM, etc.)
    * Adding more functionality to DRAM
    * Denser designs (3D stacking)
    * Different technology
      * NVM
  * Non-volatile memory
    * Resistive memory
      * PCM
        * Inject current to change the phase
        * Scales better than DRAM
        * Multiple bits per cell
          * Wider resistance range
        * No refresh is needed
        * Downsides: latency and write endurance
      * STT-MRAM
        * Inject current to change the polarity
      * Memristor
        * Inject current to change the structure
    * Persistency - data stays there even without power
      * Unified memory and storage management (persistent data structures) - single-level store
        * Improves energy and performance
        * Simplifies the programming model
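The "multiple bits per cell" point relies on PCM's wide resistance range: a multi-level cell divides that range into bands and stores one 2-bit value per band. A sketch of the read side, with made-up threshold values (real sensing circuits and resistance bands differ):

```python
# Illustrative multi-level PCM read: map a sensed resistance to 2 bits
# by comparing it against three band thresholds (values are assumptions).
THRESHOLDS = [10_000, 100_000, 1_000_000]   # ohms, band boundaries

def read_cell(resistance_ohms):
    """Map a sensed resistance to a 2-bit value (0-3)."""
    return sum(resistance_ohms >= t for t in THRESHOLDS)

print(read_cell(5_000))      # 0 (low resistance, fully crystalline)
print(read_cell(500_000))    # 2 (intermediate band)
```

Packing more levels into the same range shrinks the margin between bands, which is part of why multi-level cells trade capacity for read latency and reliability.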