This week I worked on a new algorithm class and analyzed our two existing algorithm classes for components that can be optimized. The new class is a UCB1-based load balancer; the key difference is that exploration is enforced directly on the allowed server choices at decision time rather than being a component of the score. This is done by maintaining two data structures: one contains the set of the k server indices least recently chosen, and the other is an ordered list of the n-k servers most recently chosen. After a choice is made, both structures are updated accordingly. The choice among the k least-recent servers is made via a configurable scoring system; we will test both a direct best-response-time choice and a random choice with a response-time-based distribution. The number k is also an adjustable variable between 1 and the server count n.
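As a concrete illustration, here is a minimal Python sketch of the two-structure bookkeeping described above. All names are hypothetical, and the inverse-response-time weighting is just one possible choice for the random scoring variant, not our final implementation:

```python
import random
from collections import deque


class ForcedExplorationBalancer:
    """Sketch of the UCB1-variant decider: each choice is restricted to
    the k least recently chosen servers (names are hypothetical)."""

    def __init__(self, n_servers, k):
        assert 1 <= k <= n_servers
        self.k = k
        # least_recent holds the k currently eligible indices;
        # most_recent holds the other n - k, oldest first.
        self.least_recent = list(range(k))
        self.most_recent = deque(range(k, n_servers))
        self.avg_response = {i: 0.0 for i in range(n_servers)}

    def choose(self, greedy=True):
        if greedy:
            # direct best-response-time choice among the eligible set
            pick = min(self.least_recent, key=lambda i: self.avg_response[i])
        else:
            # random choice with a response-time-based distribution
            # (here: weight inversely proportional to average response time)
            weights = [1.0 / (1e-9 + self.avg_response[i])
                       for i in self.least_recent]
            pick = random.choices(self.least_recent, weights=weights)[0]
        # update both structures: pick becomes most recent, and the
        # oldest entry of most_recent becomes eligible again
        self.least_recent.remove(pick)
        self.most_recent.append(pick)
        self.least_recent.append(self.most_recent.popleft())
        return pick
```

After each call, the sizes of the two structures are invariant (k and n-k), which keeps the forced-exploration window fixed as k is tuned.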
I also identified three adjustable variables of interest in last week's e-greedy algorithm: epsilon itself, which can range from 0 to 1 exclusive; the number of round-robin iterations used for initial exploration; and the behavior taken in the non-epsilon case. That last behavior can easily be tested with both random and round-robin selection in the 1-epsilon case. Next week I hope to create two additional load balancers that use these algorithms with our new metric of network I/O (for which I will also assist Nakul with the data retrieval portion).
This week, I primarily finished outlining our testing suite so that Nakul can take over making JMeter testing scripts to pass into BlazeMeter. After taking care of the user side of testing, I also started introducing variation into the VMs we use for our video servers in order to create different use cases on which to compare our algorithms. The VMs can be varied in two ways: the region they are located in and their specifications, such as storage, number of CPUs, and networking bandwidth. I therefore created 4 classes of video servers on AWS and deployed our video server code to them: same region + same specs, same region + different specs, different region + same specs, and different region + different specs. Each algorithm will be tested on all 4 server classes in order to draw meaningful conclusions about how the algorithms perform in different use cases. For example, we predict that our algorithms will perform similarly to RoundRobin when the region and specs of the VMs are similar, but will pull ahead when the servers are more varied. In addition, I also deployed our custom load balancer code to AWS.
Next week, I plan on continuing to deploy code on AWS instances as our algorithms evolve. I will also work with Nakul to conduct tests on our algorithms and analyze the data that we gather.
This week, we worked as a group to consolidate and finalize our testing implementation, which currently consists of 3 user classes, each running 20 parallel users for a total of 60, on a 1-hour testing plan. We also expanded our server architecture implementation plan to include 4 different virtual machine setups, each with either uniform or varied VM hardware specifications and either uniform or varied geographic locations. This creates additional contexts for testing the relative performance of our load balancers in different environments and use cases.
We also have finalized our load balancer algorithm specifications, including what variables in our custom algorithms can be optimized with recurrent testing. Over this next week, we hope to apply our testing suite to the algorithms for both tuning our custom decision-makers as well as doing final comparisons and data presentation. We will compile this final information into our last project deliverables (poster, video, and final report).
This week I was in charge of creating the user test scripts, which simulate the behavior of 3 different user types: the first rapidly switches between all 3 videos on our streaming server; the second watches through the entire length of each video and then switches to the next one; the third watches 20 to 30 seconds of a video and then switches to the next one.
The test scripts were created using Apache JMeter, a software package for recording HTTP requests to simulate load. Using its HTTP(S) Test Script Recorder, I was able to record my interactions with the browser to create the test script. The recorder creates a proxy server that intercepts requests sent to and from the web browser running our video application. This allows me to simulate the 3 different user classes being tested.
I was able to create duplicate threads within the same .jmx file, since JMeter allows multiple thread groups to run simultaneously. I created a new file for randomLB that contains 3 thread groups, one per user class, so we can now run the tests against the same server at the same time. I then duplicated the randomLB file for the round-robin LB and changed the target domain of all samplers to the round-robin proxy.
This JMeter file was then run on BlazeMeter, a service that lets us run the script with multiple concurrent users.
This week I intend to complete collection of network I/O metrics retrieved from CloudWatch. I have set up the CloudWatch dashboard to collect NetworkPacketsOut data from video servers 1 through 5. I will need to figure out a way to collect this data asynchronously in our LB proxy using the CloudWatch GetDashboard or GetMetricData APIs.
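As a sketch of what that retrieval might look like (our proxy is Node, but Python is used here just to illustrate the query shape), the helper below assembles a GetMetricData request for NetworkPacketsOut; the instance IDs are placeholders, and the commented-out portion shows how a boto3 CloudWatch client could poll it in the background:

```python
from datetime import datetime, timedelta, timezone


def network_packets_query(instance_ids, period_seconds=60):
    """Build one GetMetricData query per video server for the
    NetworkPacketsOut metric (instance IDs are hypothetical)."""
    return [
        {
            "Id": f"pkts_out_{n}",  # must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "NetworkPacketsOut",
                    "Dimensions": [{"Name": "InstanceId", "Value": iid}],
                },
                "Period": period_seconds,  # CloudWatch basic monitoring
                "Stat": "Sum",
            },
        }
        for n, iid in enumerate(instance_ids, start=1)
    ]

# The proxy could then poll asynchronously, e.g. (requires boto3 and
# AWS credentials; not run here):
# import boto3
# cw = boto3.client("cloudwatch")
# end = datetime.now(timezone.utc)
# resp = cw.get_metric_data(
#     MetricDataQueries=network_packets_query(["i-0abc...", "i-0def..."]),
#     StartTime=end - timedelta(minutes=5),
#     EndTime=end,
# )
```

Since CloudWatch only refreshes this metric once per minute, the poll result would be cached in the proxy and reused across decisions within each interval.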
This week, I primarily worked on creating a user testing suite suitable for sending realistic requests to our load balancer. Our previous approach of writing Python scripts that send specific requests to our servers does not quite match the use case of our video application. We rely on the HTML5 video player element to load the video and request subsequent chunks accordingly, and simply making video chunk requests at set intervals does not simulate this behavior, especially since different videos request different numbers of chunks in a given interval. Therefore, another approach is required to generate load on our servers.
After asking Professor Eppinger for some tips on load testing tools, I was directed to Selenium as a way to script and simulate user actions in a browser. On top of this, I also found third-party tools such as dotcom-monitor, flood.io, and BlazeMeter that run said scripts on multiple machines and compile user data. However, after learning more and experimenting with the technology, I’ve found that while Selenium is excellent at locating and testing for HTML elements, it does not actually keep track of outgoing requests, incoming responses, and the corresponding data. I would need to pivot to another framework to properly simulate those requests.
Thankfully, while working with Selenium, I also noticed another tool called JMeter that deals more directly with HTTP request/response scripts. Initially, I attempted to convert a Selenium script to a JMeter one using a Taurus proxy server tool. However, the resulting JMeter script did not run properly on BlazeMeter. Therefore, I am currently working on creating a JMeter script using the JMeter proxy.
This week I leveraged the response time header implementation I completed last week as a metric for load balancing decisions. I completed our first custom load balancing algorithm, which is based on a simple solution to the multi-armed bandit problem. Multi-armed bandit is a classic reinforcement learning scenario in which a gambler (decision maker) must repeatedly play one of many slot machines with initially unknown reward probabilities, trying to maximize reward. I find multi-armed bandit highly analogous to the load balancing decision, where the performance of each load balancing choice can be thought of as a slot machine's reward. The key dilemma is between exploitation (choosing the currently best-estimated server) and exploration (improving estimates for all servers).
The algorithm is based on a pattern called epsilon-greedy. It starts with an initialization period in which response-time info is collected for each server in round-robin fashion. Then, for each decision, a random value between 0 and 1 is drawn. If it is less than epsilon, the server with the lowest average response time is chosen (exploitation). Otherwise, a server is chosen at random or in round-robin fashion (exploration). A key difference between load balancing and multi-armed bandit is that server performance is not fixed; thus, only the most recent 10 response times are retained per server, to ensure data relevance and limit memory needs.
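The decision procedure above can be sketched as follows. Class and method names are hypothetical, and the convention follows the description above (a draw below epsilon means exploit, otherwise explore):

```python
import random
from collections import deque, defaultdict

WINDOW = 10  # only the most recent 10 response times are kept per server


class EpsilonGreedyBalancer:
    """Sketch of the e-greedy decider described above (names hypothetical)."""

    def __init__(self, n_servers, epsilon=0.9, explore="random"):
        self.n = n_servers
        self.epsilon = epsilon
        self.explore = explore  # "random" or "round_robin" exploration
        self.rr = 0             # round-robin cursor
        # sliding window of recent response times per server
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, server, response_time):
        self.history[server].append(response_time)

    def choose(self):
        # initialization: round-robin until every server has data
        if any(len(self.history[i]) == 0 for i in range(self.n)):
            self.rr = (self.rr + 1) % self.n
            return self.rr
        if random.random() < self.epsilon:
            # exploitation: lowest average response time over the window
            return min(range(self.n),
                       key=lambda i: sum(self.history[i]) / len(self.history[i]))
        # exploration
        if self.explore == "round_robin":
            self.rr = (self.rr + 1) % self.n
            return self.rr
        return random.randrange(self.n)
```

The `deque(maxlen=WINDOW)` gives the 10-value sliding window for free: appending an eleventh value silently drops the oldest.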
I also wrote an outline for our final presentation slides and am almost finished with another load balancing algorithm based on UCB1 reinforcement learning. Over the next week, I plan to complete this new algorithm and then create two variants of the e-greedy and UCB1-based algorithms that leverage a new metric our team is implementing for the video servers: network input/output volume.
This week I worked on collecting OS metrics from our Node video server. I had to test different APIs and modules to collect these metrics, since some modules were not compatible with our particular server deployment. I used node-OS-module to retrieve metrics on CPU usage and network I/O. I later found that the node-OS-module netstat was not supported on my OS, so I tested it on different platforms, including a virtual Andrew Linux machine as well as our deployed EC2 instances. Since this approach was not working for our video server, I started looking into Linux netstat, which logs metrics on network I/O. I was not able to find an API that allowed easy collection of network I/O metrics, so I finally turned to AWS CloudWatch, Amazon's metric logging tool for servers deployed on EC2. CloudWatch has a package for retrieving metrics from Node servers, which I will use to fetch network I/O metrics. The tradeoff of using CloudWatch is that it only lets us collect these metrics once per minute, as opposed to every time a response is sent from the video server. This week I will finish this metric collection and develop a system to visualize our user simulation data.
This week we met several times as a team to specify some important implementation details of our project. On testing, we moved from our previous baseline of a custom-built server that sent HTTP requests to a plan to use a third-party load-testing platform that can run user simulation scripts. This change was made to accommodate the need for realistic user request patterns (requesting video chunks according to the different buffer rates of videos with different characteristics such as dynamism and resolution), which would be difficult without direct HTML interaction. Furthermore, a third party allows us to consolidate user-collected metrics into one platform that can compile them into graphs and reports directly. We attempted to debug several different platform implementations (Flood.io, dotcom-monitor, BlazeMeter) and script types (Taurus, Selenium, JMeter) but do not yet have a fully satisfactory load testing setup. We hope to have one early next week.
We also adjusted our plan for video server metric collection. Analysis of the impact of video server responses showed us that network I/O is a much more likely bottleneck for our video servers than processor utilization, so we now intend to retrieve that metric instead for a custom load balancing decider. The metric would be obtained either via asynchronous background monitoring in our own program or via API calls to an Amazon CloudWatch monitor. Over the next week, we hope to fully integrate our testing environment and finish three custom load balancing deciders for comparison.