#### 18-447

## Computer Architecture Lecture 31: Predictable Performance

Lavanya Subramanian
Carnegie Mellon University
Spring 2015, 4/15/2015

#### Shared Resource Interference

## High and Unpredictable Application Slowdowns

1. High application slowdowns due to shared resource interference
2. An application's performance depends on which application it is interfering with

### Need for Predictable Performance

- There is a need for predictable performance
  - When multiple applications share resources
  - Especially if some applications require performance guarantees
- Example: in mobile systems
  - Interactive applications run with non-interactive applications
  - Need to guarantee performance for interactive applications

## Our Goal: Predictable performance in the presence of shared resources

# Tackling Different Parts of the Shared Memory Hierarchy

# Predictability in the Presence of Memory Bandwidth Interference

## Predictability in the Presence of Memory Bandwidth Interference (HPCA 2013)

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees

### Slowdown: Definition

$$\text{Slowdown} = \frac{\text{Performance}_{\text{Alone}}}{\text{Performance}_{\text{Shared}}}$$

For a memory-bound application, Performance $\propto$ Memory request service rate.

[Figure: plot with axis "Normalized Request Service Rate"]

Request Service Rate Alone ($RSR_{Alone}$) of an application can be estimated by giving the application highest priority at the memory controller.

Highest priority → Little interference (almost as if the application were run alone)

Memory Interference-induced Slowdown Estimation (MISE) model for memory-bound applications:

$$\text{Slowdown} = \frac{RSR_{Alone}}{RSR_{Shared}}$$

where $RSR_{Alone}$ is the Request Service Rate Alone and $RSR_{Shared}$ is the Request Service Rate Shared. For these applications, memory phase slowdown dominates overall slowdown.

Memory Interference-induced Slowdown Estimation (MISE) model for non-memory-bound applications:

$$\text{Slowdown} = (1-\alpha) + \alpha \frac{RSR_{Alone}}{RSR_{Shared}}$$

where $\alpha$ is the memory phase fraction.

### Interval Based Operation

## Measuring RSR<sub>Shared</sub> and α

- Request Service Rate Shared ($RSR_{Shared}$)
  - Per-core counter to track number of requests serviced
  - At the end of each interval, measure

$$RSR_{Shared} = \frac{\text{Number of Requests Served}}{\text{Interval Length}}$$

- Memory Phase Fraction ($\alpha$)
  - Count number of stall cycles at the core
  - Compute fraction of cycles stalled for memory

#### Estimating Request Service Rate Alone (RSR Alone)

- Goal: Estimate $RSR_{Alone}$
- How: Periodically give each application the highest priority in accessing memory
  - Divide each interval into shorter epochs
  - At the beginning of each epoch, one application is selected as the highest priority application
  - At the end of an interval, for each application, estimate

$$RSR_{Alone} = \frac{\text{Number of Requests During High Priority Epochs}}{\text{Number of Cycles Application Given High Priority}}$$

## Inaccuracy in Estimating RSR<sub>Alone</sub>

Even when an application has highest priority, it can still experience some interference.

# Accounting for Interference in RSR<sub>Alone</sub> Estimation

Solution: Determine and remove interference cycles from the $RSR_{Alone}$ calculation

- A cycle is an interference cycle if
  - a request from the highest priority application is waiting in the request buffer and
  - another application's request was issued previously

### MISE Operation: Putting it All Together

## Previous Work on Slowdown Estimation

- Previous work on slowdown estimation
  - STFM (Stall Time Fair Memory) Scheduling [Mutlu et al., MICRO '07]
  - FST (Fairness via Source Throttling) [Ebrahimi et al., ASPLOS '10]
  - Per-thread Cycle Accounting [Du Bois et al., HiPEAC '13]
- These schemes count the number of cycles in which an application receives interference

### Two Major Advantages of MISE Over STFM

#### Advantage 1:

- STFM estimates alone performance while an application is receiving interference → Difficult
- MISE estimates alone performance while giving an application the highest priority → Easier

#### Advantage 2:

- STFM does not take into account compute phase for non-memory-bound applications
- MISE accounts for compute phase → Better accuracy

### Methodology

- Configuration of our simulated system
  - 4 cores
  - 1 channel, 8 banks/channel
  - DDR3 1066 DRAM
  - 512 KB private cache/core
- Workloads
  - SPEC CPU2006
  - 300 multiprogrammed workloads

### Quantitative Comparison

## Comparison to STFM

#### Possible Use Cases

- Bounding application slowdowns [HPCA '14]
- VM migration and admission control schemes [VEE '15]
- Fair billing schemes in a commodity cloud

### MISE-QoS: Providing "Soft" Slowdown Guarantees

- Goal
  1. Ensure QoS-critical applications meet a prescribed slowdown bound
  2. Maximize system performance for other applications
- Basic Idea
  - Allocate just enough bandwidth to the QoS-critical application
  - Assign remaining bandwidth to other applications

### Methodology

- Each application (25 applications in total) considered as the QoS-critical application
- Run with 12 sets of co-runners of different memory intensities
- Total of 300 multiprogrammed workloads
- Each workload run with 10 slowdown bound values
- Baseline memory scheduling mechanism
  - Always prioritize the QoS-critical application [Iyer et al., SIGMETRICS 2007]
  - Other applications' requests scheduled in FR-FCFS order [Zuravleff and Robinson, US Patent 1997; Rixner+, ISCA 2000]

#### A Look at One Workload

MISE is effective in

- meeting the slowdown bound for the QoS-critical application
- improving performance of non-QoS-critical applications

### Effectiveness of MISE in Enforcing QoS

#### Across 3000 data points

| | Predicted Met | Predicted Not Met |
|----------------------|------------------|----------------------|
| QoS Bound Met | 78.8% | 2.1% |
| QoS Bound Not Met | 2.2% | 16.9% |

MISE-QoS correctly predicts whether or not the bound is met for 95.7% of workloads (78.8% + 16.9%).

# Performance of Non-QoS-Critical Applications

When the slowdown bound is 10/3, MISE-QoS improves system performance by 10%.

## Summary: Predictability in the Presence of Memory Bandwidth Interference

- Uncontrolled memory interference slows down applications unpredictably
- Goal: Estimate and control slowdowns
- Key contribution
  - MISE: An accurate slowdown estimation model
  - Average error of MISE: 8.2%
- Key Idea
  - Request Service Rate is a proxy for performance
- Leverage slowdown estimates to control slowdowns; many more applications exist

## Taking Into Account Shared Cache Interference

#### Revisiting Request Service Rates

Request service rates and access rates are tightly coupled.

# Estimating Cache and Memory Slowdowns Through Cache Access Rates

### The Application Slowdown Model

$$\text{Slowdown} = \frac{\text{Cache Access Rate}_{\text{Alone}}}{\text{Cache Access Rate}_{\text{Shared}}}$$

### Real System Studies: Cache Access Rate vs. Slowdown

### Challenge

How to estimate alone cache access rate?

#### **Auxiliary Tag Store**

#### Revisiting Request Service Rate Alone

Alone memory request service rate of an application:

$$\text{Alone Request Service Rate} = \frac{\text{\# Requests During High Priority Epochs}}{\text{\# High Priority Cycles} - \text{\# Interference Cycles}}$$

Cycles spent serving contention misses should not be counted as high priority cycles.

#### Cache Access Rate Alone

Alone cache access rate of an application:

$$\text{Alone Cache Access Rate} = \frac{\text{\# Requests During High Priority Epochs}}{\text{\# High Priority Cycles} - \text{\# Interference Cycles} - \text{\# Cache Contention Cycles}}$$

Cache Contention Cycles: Cycles spent serving contention misses

$$\text{Cache Contention Cycles} = \text{\# Contention Misses} \times \text{Average Memory Service Time}$$

- \# Contention Misses: from the auxiliary tag store when given high priority
- Average Memory Service Time: measured when given high priority

### Application Slowdown Model (ASM)

$$\text{Slowdown} = \frac{\text{Cache Access Rate}_{\text{Alone}}}{\text{Cache Access Rate}_{\text{Shared}}}$$

## Previous Work on Slowdown Estimation

- Previous work on slowdown estimation
  - STFM (Stall Time Fair Memory) Scheduling [Mutlu et al., MICRO '07]
  - FST (Fairness via Source Throttling) [Ebrahimi et al., ASPLOS '10]
  - Per-thread Cycle Accounting [Du Bois et al., HiPEAC '13]
- Basic Idea:

$$\text{Slowdown} = \frac{\text{Execution Time}_{\text{Shared}}}{\text{Execution Time}_{\text{Alone}}}$$

- Shared execution time is easy to measure
- Alone execution time is estimated by counting the number of cycles in which the application receives interference

### Model Accuracy Results

Average error of ASM's slowdown estimates: 10%

## Leveraging Slowdown Estimates for Performance Optimization

How do we leverage slowdown estimates from our model?

- To achieve high performance
  - Slowdown-aware cache allocation
  - Slowdown-aware bandwidth allocation
- To achieve performance predictability?
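Both uses listed above start from the estimation formulas on the preceding slides. The following is a minimal Python sketch of those formulas, assuming the per-interval hardware counters are already available as plain numbers. All function and parameter names are hypothetical, chosen for illustration, and are not taken from the lecture or from any real tool.

```python
def rate_shared(requests_served, interval_cycles):
    """Requests (MISE) or cache accesses (ASM) serviced per cycle while sharing."""
    return requests_served / interval_cycles


def rate_alone(hp_requests, hp_cycles, interference_cycles=0.0,
               contention_cycles=0.0):
    """Alone rate estimated from high-priority epochs.

    Interference cycles (MISE and ASM) and cache contention cycles (ASM only)
    are removed from the high-priority cycle count before dividing.
    """
    effective_cycles = hp_cycles - interference_cycles - contention_cycles
    if effective_cycles <= 0:
        raise ValueError("no usable high-priority cycles in this interval")
    return hp_requests / effective_cycles


def cache_contention_cycles(contention_misses, avg_memory_service_time):
    """Contention misses (from the auxiliary tag store) times average service time."""
    return contention_misses * avg_memory_service_time


def mise_slowdown(alpha, rsr_alone, rsr_shared):
    """MISE: (1 - alpha) + alpha * RSR_alone / RSR_shared.

    alpha is the memory phase fraction; alpha = 1 gives the memory-bound case.
    """
    return (1.0 - alpha) + alpha * rsr_alone / rsr_shared


def asm_slowdown(car_alone, car_shared):
    """ASM: alone cache access rate over shared cache access rate."""
    return car_alone / car_shared


if __name__ == "__main__":
    # Made-up counter values for one 1,000,000-cycle interval.
    rsr_s = rate_shared(20_000, 1_000_000)
    rsr_a = rate_alone(8_800, 250_000, interference_cycles=30_000)
    alpha = 500_000 / 1_000_000  # stall cycles / interval cycles
    print("MISE slowdown estimate:", mise_slowdown(alpha, rsr_a, rsr_s))

    car_s = rate_shared(25_000, 1_000_000)
    car_a = rate_alone(10_000, 250_000, interference_cycles=30_000,
                       contention_cycles=cache_contention_cycles(200, 100.0))
    print("ASM slowdown estimate:", asm_slowdown(car_a, car_s))
```

The sketch keeps MISE and ASM as separate functions because they use different counters: MISE counts memory requests, whereas ASM counts shared cache accesses and additionally subtracts cache contention cycles.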
### Cache Capacity Partitioning

Goal: Partition the shared cache among applications to mitigate contention

- Previous way-partitioning schemes optimize for miss count
- Problem: Not aware of performance and slowdowns

# ASM-Cache: Slowdown-aware Cache Way Partitioning

- Key Requirement: Slowdown estimates for all possible way partitions
  - Extend ASM to estimate slowdown for all possible cache way allocations
- Key Idea: Allocate each way to the application whose slowdown reduces the most

#### Performance and Fairness Results

Significant fairness benefits across different systems

#### Memory Bandwidth Partitioning

Goal: Partition the main memory bandwidth among applications to mitigate contention

# ASM-Mem: Slowdown-aware Memory Bandwidth Partitioning

Key Idea: Allocate high priority proportional to an application's slowdown

$$\text{High Priority Fraction}_{i} = \frac{\text{Slowdown}_{i}}{\sum_{j} \text{Slowdown}_{j}}$$

Application $i$'s requests are given highest priority at the memory controller for its fraction of the time.

### ASM-Mem: Fairness and Performance Results

Significant fairness benefits across different systems

## Coordinated Resource Allocation Schemes

1. Employ ASM-Cache to partition cache capacity
2. Drive ASM-Mem with slowdowns from ASM-Cache

#### Fairness and Performance Results

Significant fairness benefits across different channel counts

### Other Possible Applications

- VM migration and admission control schemes [VEE '15]
- Fair billing schemes in a commodity cloud
- Bounding application slowdowns

### Summary: Predictability in the Presence of Shared Cache Interference

- Key Ideas:
  - Cache access rate is a proxy for performance
  - Auxiliary tag stores and high priority can be combined to estimate slowdowns
- Key Result: Slowdown estimation error ~10%
- Some Applications:
  - Slowdown-aware cache partitioning
  - Slowdown-aware memory bandwidth partitioning
  - Many more possible

# Future Work: Coordinated Resource Management for Predictable Performance

Goal: Cache capacity and memory bandwidth allocation for an application to meet a bound

#### **Challenges:**

- Large search space of potential cache capacity and memory bandwidth allocations
- Multiple possible combinations of cache/memory allocations for each application
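The ASM-Cache and ASM-Mem key ideas described in this lecture can be sketched in a few lines of Python. This is an illustrative reading of the slide text only, not the published mechanism: the greedy one-way-at-a-time loop, the tie-breaking by dictionary order, and every name below (`allocate_ways`, `high_priority_fractions`, `slowdown_by_ways`) are assumptions made for the example. The per-allocation slowdown curves are assumed to come from an extended ASM estimator and are passed in as plain lists.

```python
def high_priority_fractions(slowdowns):
    """ASM-Mem idea: each application's share of high-priority time is
    its slowdown divided by the sum of all slowdowns."""
    total = sum(slowdowns.values())
    return {app: s / total for app, s in slowdowns.items()}


def allocate_ways(slowdown_by_ways, total_ways):
    """ASM-Cache idea, read as a greedy loop (an assumption of this sketch).

    slowdown_by_ways[app][w] is the estimated slowdown of `app` when it owns
    w cache ways, for w = 0 .. total_ways. Each way goes to the application
    whose estimated slowdown drops the most by receiving it.
    """
    alloc = {app: 0 for app in slowdown_by_ways}

    def reduction(app):
        curve = slowdown_by_ways[app]
        ways = alloc[app]
        return curve[ways] - curve[ways + 1]

    for _ in range(total_ways):
        best = max(alloc, key=reduction)  # ties go to the first application
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    # Made-up slowdown estimates for a 4-way cache shared by two applications.
    curves = {
        "A": [4.0, 2.5, 2.0, 1.8, 1.7],
        "B": [3.0, 2.6, 2.3, 2.1, 2.0],
    }
    ways = allocate_ways(curves, total_ways=4)
    print("way allocation:", ways)

    # Coordinated scheme: drive ASM-Mem with the slowdowns at the chosen partition.
    chosen = {app: curves[app][ways[app]] for app in curves}
    print("high-priority fractions:", high_priority_fractions(chosen))
```

Feeding the slowdowns at the chosen partition into `high_priority_fractions` mirrors the coordinated scheme's second step, in which ASM-Mem is driven by the slowdowns from ASM-Cache.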