This week I worked on the traditional load balancer based on processor utilization percentage, as well as the response time calculation for our custom load balancer. We originally planned to get processor utilization metrics from AWS CloudWatch, a monitoring tool for AWS EC2 instances. Unfortunately, I had trouble collecting data at intervals shorter than 5 minutes, and such a delay would be excessive for load balancing decisions that happen within seconds. My current approach is to calculate processor utilization directly in our video server code and send it as an additional response header for the LB to parse. I may have to adjust how often the parsing happens based on how much load this creates.
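As a rough illustration of calculating utilization inside the server process, the sketch below derives a percentage from two snapshots of the aggregate "cpu" line in /proc/stat. It assumes the video server runs on Linux; the header name X-CPU-Utilization and all function names are hypothetical placeholders, not what our code necessarily uses.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    # Fields after the 'cpu' label: user nice system idle iowait irq softirq steal ...
    fields = [int(x) for x in line.split()[1:]]
    idle = fields[3] + fields[4]   # idle + iowait
    total = sum(fields[:8])        # skip guest fields, already counted in user/nice
    return idle, total


def read_cpu_snapshot(path="/proc/stat"):
    """Read the current (idle, total) tick counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def utilization_percent(prev, curr):
    """Percent of non-idle time between two (idle, total) snapshots."""
    idle_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    if total_delta <= 0:
        return 0.0  # no ticks elapsed; report 0 rather than divide by zero
    return 100.0 * (1.0 - idle_delta / total_delta)


def cpu_header(percent):
    """Format the value as a header; the header name is a hypothetical choice."""
    return {"X-CPU-Utilization": f"{percent:.1f}"}
```

In this sketch the server would keep the previous snapshot, call read_cpu_snapshot() at whatever interval turns out to be cheap enough, and attach cpu_header(...) to its outgoing headers.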
On the response time side, we had to significantly change our plan for collecting response times, because we now understand that expecting this information from users, simulated or otherwise, is infeasible. Instead, we now plan to measure response time from the moment the load balancer forwards a request to its chosen server until that server sends the corresponding response. Specifically, the load balancer would send the start time as a request header, and the video server would include the response time as a response header.
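The header exchange described above could look roughly like the following sketch. The header names X-LB-Start-Time and X-Response-Time-Ms and both function names are hypothetical, and the sketch assumes the load balancer and video server clocks are close enough to compare directly.

```python
import time
from typing import Optional

START_HEADER = "X-LB-Start-Time"        # hypothetical header name
ELAPSED_HEADER = "X-Response-Time-Ms"   # hypothetical header name


def stamp_request(headers: dict, now: Optional[float] = None) -> dict:
    """Load balancer side: copy the headers and add the start time (epoch seconds)."""
    stamped = dict(headers)
    stamped[START_HEADER] = repr(time.time() if now is None else now)
    return stamped


def build_response_headers(request_headers: dict, now: Optional[float] = None) -> dict:
    """Video server side: compute elapsed milliseconds since the stamped start time."""
    start = float(request_headers[START_HEADER])
    end = time.time() if now is None else now
    # Clamp at zero in case of small clock differences between machines.
    elapsed_ms = max(0.0, (end - start) * 1000.0)
    return {ELAPSED_HEADER: f"{elapsed_ms:.3f}"}
```

The optional now argument exists only so the functions can be exercised with fixed timestamps; in normal use both sides would rely on the current time.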
I also documented a more refined initial approach for our custom load balancer. Because the volume of datapoints is not a good metric for choosing a server (old data is far less useful in load balancing than in the traditional multi-armed bandit setting), I find that epsilon-greedy is a better starting algorithm than UCB1. Furthermore, to ensure recent data is valued most and to reduce memory usage, the algorithm will discard datapoints older than a certain cutoff (e.g., only the 20 most recent response times are kept for decisions). I also plan to prevent the algorithm from choosing any of the last k chosen servers again. This accounts for delays before the metrics adjust while also promoting dynamism in server exploration and monitoring.
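A minimal sketch of this approach follows: epsilon-greedy selection over a sliding window of response times, skipping the last k chosen servers. The class and parameter names are hypothetical, and two details are my assumptions rather than settled design: the window is kept per server, and a server with no samples is treated as having a mean of 0 so that it gets tried early.

```python
import random
from collections import deque


class EpsilonGreedyBalancer:
    def __init__(self, servers, epsilon=0.1, window=20, k=1, rng=None):
        self.servers = list(servers)
        if k >= len(self.servers):
            raise ValueError("k must be smaller than the number of servers")
        self.epsilon = epsilon
        # Keep only the most recent `window` response times per server.
        self.times = {s: deque(maxlen=window) for s in self.servers}
        # The last k chosen servers are excluded from the next choice.
        self.recent = deque(maxlen=k)
        self.rng = rng or random.Random()

    def _mean(self, server):
        samples = self.times[server]
        # Assumption: unseen servers count as 0.0 so they are explored first.
        return sum(samples) / len(samples) if samples else 0.0

    def choose(self):
        candidates = [s for s in self.servers if s not in self.recent]
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(candidates)       # explore
        else:
            choice = min(candidates, key=self._mean)   # exploit lowest mean time
        self.recent.append(choice)
        return choice

    def record(self, server, response_time):
        # The deque drops the oldest sample automatically once the window is full.
        self.times[server].append(response_time)
```

For example, with three servers and k=1, the balancer never picks the same server twice in a row, so the second-best server keeps receiving fresh measurements even when epsilon is small.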