So this week, we looked at the compute infrastructure options we would use to train our CNN-based video upscaling algorithm. We have narrowed the choice down to either an AWS SageMaker instance or a private GPU cloud that AWS also offers. This will make model training far more efficient, and we can then take the trained model and implement it directly on an FPGA for cycle-level optimization. Without a hardware-based FPGA implementation, the convolution operations (and, during training, gradient descent) would take a large number of cycles on a Jetson or other embedded platform. We believe that implementing the network directly in hardware will significantly improve inference latency for this task. It's more of an exercise in ASIC engineering and hardware design coupled with machine learning.
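
To make the training/inference split concrete, here is a minimal sketch of the kind of CNN upscaler we have in mind, written in PyTorch. The `UpscaleCNN` name, layer widths, and kernel sizes are placeholders for illustration, not our final architecture: the point is that the per-frame work at inference time is just a handful of convolutions plus a channel rearrangement, which is the portion we would map onto FPGA fabric, while gradient descent over the weights happens only on the SageMaker / GPU-cloud side during training.

```python
import torch
import torch.nn as nn


class UpscaleCNN(nn.Module):
    """Illustrative ESPCN-style upscaler: a few conv layers followed by a
    sub-pixel (PixelShuffle) layer that rearranges channels into a
    higher-resolution frame. Layer widths here are placeholders."""

    def __init__(self, scale: int = 2, channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Produce scale^2 * channels feature maps for sub-pixel upscaling.
            nn.Conv2d(32, channels * scale ** 2, kernel_size=3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) low-resolution frame
        # returns: (batch, channels, H*scale, W*scale) upscaled frame
        return self.shuffle(self.features(x))


if __name__ == "__main__":
    model = UpscaleCNN(scale=2)
    frame = torch.randn(1, 3, 270, 480)  # one low-resolution video frame
    out = model(frame)
    print(out.shape)  # torch.Size([1, 3, 540, 960])
```

Once trained in the cloud, only the forward pass above needs to run per frame, so it is those fixed-weight convolutions that we would pipeline in hardware rather than anything involving backpropagation.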