This week, we looked at the compute infrastructure we would use to train our CNN-based video upscaling algorithm. We have narrowed the options down to either an AWS SageMaker instance or a private GPU cloud that AWS also offers. This will make model training much more efficient, and we can then take the trained model and implement it directly on an FPGA for cycle-level optimization. Without a hardware-based FPGA implementation, the convolution and gradient descent operations would take a significant number of cycles on a Jetson or other embedded platform. We believe that implementing the model directly in hardware will significantly improve inference latency for this task in particular. In that sense, the project is as much an exercise in ASIC engineering and hardware design as it is in machine learning.
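As a rough sketch of what the SageMaker side could look like, training might be launched through the SageMaker Python SDK as below. The entry-point script, S3 paths, IAM role, instance type, and hyperparameters are placeholders for illustration, not decisions we have finalized.

```python
# Minimal sketch of launching CNN training on SageMaker via the Python SDK.
# Script name, S3 URIs, IAM role, and hyperparameters are hypothetical placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()

estimator = PyTorch(
    entry_point="train_upscaler.py",          # hypothetical training script for the CNN upscaler
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=1,
    instance_type="ml.p3.2xlarge",            # single-GPU training instance
    framework_version="1.13",
    py_version="py39",
    hyperparameters={"epochs": 50, "batch-size": 16, "scale-factor": 2},
    sagemaker_session=session,
)

# Kick off training against a (placeholder) S3 bucket of low/high-resolution frame pairs.
estimator.fit({"training": "s3://example-bucket/upscaling-frames/train"})
```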
Kunal’s Status Update
This week our team worked on pinpointing an algorithm to use for the real-time video upscaling problem. We found that classical DSP algorithms approach the problem rather naively, since they do not scale well across the different resolutions and form factors of the video data. The inputs to the image upscaling problem follow a broadly uniform distribution but vary slightly on each iteration, and hence a deep learning based approach is favored.
The deep learning algorithm I looked into was image super-resolution from sparsity. This method takes patches of pixels from a low-resolution image and builds two matrices representing a downsampling and a blurring filter. The learning component would be based on a classical layered neural network that takes pixel intensities and locations as inputs. The algorithm then trains two dictionaries, each representing a sparse coding for the image upscaling problem. The two dictionaries, one for the low-resolution images and one for the super-resolution images, would then be correlated, and through the iterative process of gradient descent we can learn the appropriate parameters for the trained model.
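To make the coupled-dictionary idea concrete, here is a minimal NumPy sketch of the inference step under my own simplifying assumptions: random toy dictionaries stand in for trained ones, and the `sparse_code` / `upscale_patch` helpers, patch sizes, and the ISTA-style sparse solver are illustrative choices rather than the exact procedure from the paper. The idea is that a low-resolution patch is sparsely coded against the low-resolution dictionary, and the same coefficients are then applied to the high-resolution dictionary to reconstruct the upscaled patch.

```python
import numpy as np

def sparse_code(y, D, n_iters=100, lam=0.1, step=0.01):
    """Find a sparse coefficient vector a minimizing ||y - D a||^2 + lam*||a||_1
    via a simple proximal gradient descent (ISTA-style) loop."""
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ a - y)                                   # gradient of the squared-error term
        a = a - step * grad                                        # gradient descent step
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)   # soft threshold (L1 prox) keeps a sparse
    return a

def upscale_patch(y_low, D_low, D_high):
    """Code the low-res patch against the low-res dictionary, then reconstruct the
    high-res patch from the coupled high-res dictionary using the same coefficients."""
    alpha = sparse_code(y_low, D_low)
    return D_high @ alpha

# Toy usage: random coupled dictionaries standing in for trained ones.
rng = np.random.default_rng(0)
D_low = rng.standard_normal((25, 256))    # 5x5 low-res patches, 256 atoms
D_high = rng.standard_normal((100, 256))  # 10x10 high-res patches, same atoms
y = rng.standard_normal(25)               # one flattened low-res patch
x_high = upscale_patch(y, D_low, D_high)  # reconstructed high-res patch (flattened 10x10)
```

In the actual algorithm the two dictionaries would be learned jointly from low/high-resolution training pairs, which is what ties the sparse codes of corresponding patches together.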