This week we worked on identifying & fine tuning various metrics to define video resolution quality. We have identified VMAF as being too slow for the latency constrained environment that our solution will be operating in. As a loss function for the convolutional neural network we are building, we found that the latency hit is too large for a real time application like this one. With that in mind, we have identified SSIM as a viable heuristic for quantifying image quality. SSIM is largely based off mean-squared error & uses the algorithm to model & quantify discrepancies between video frames.
We benchmarked the VMAF algorithm on two identical videos at 1080p and this took roughly 1 minute and this is significantly too long to use for a loss function in a real-time ML pipeline for video upscaling. Hence, we are going with SSIM & a mean squared error approach for the loss function in this system. We have benchmarked SSIM and it fits the threshold defined for a max latency for a loss function. We are going to go with this heuristic as a measure of upscaling quality for our deep learning based approach to doing video upscaling in real time.