Kunal's Status Update for 9/25/21
This week, we looked at the various compute infrastructures we could use to train our CNN-based video upscaling algorithm. We have narrowed the options down to either an AWS SageMaker instance or a private GPU cloud that AWS also offers. This will make model training considerably more efficient, and we can then take the trained model and implement it directly on an FPGA for cycle-level optimization. Without a hardware-based FPGA implementation, the convolution and gradient descent operations would take a significant number of cycles on a Jetson or other embedded platform. We believe that writing the model directly in hardware will significantly improve inference latency for this task. It's an exercise in ASIC engineering and hardware design coupled with machine learning.
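To make the cycle-count concern concrete, here is a rough back-of-the-envelope estimate of the arithmetic in just one convolution layer. The frame and layer sizes below are hypothetical placeholders, not our final design:

```python
# Back-of-the-envelope MAC count for a single convolution layer.
# All sizes are hypothetical placeholders, not our final architecture.
height, width = 720, 1280   # frame resolution
in_ch, out_ch = 3, 32       # input / output channels
k = 3                       # 3x3 kernel

macs = height * width * in_ch * out_ch * k * k
print(f"{macs / 1e9:.2f} GMACs per frame")         # ~0.80
print(f"{macs * 24 / 1e9:.1f} GMACs/s at 24 fps")  # ~19.1
```

Even one modest layer implies tens of billions of multiply-accumulates per second at real-time frame rates, which is the workload we want to pipeline in hardware rather than run on an embedded CPU/GPU.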
Joshua’s Status Update for 9/25/21
This week, we went to class and peer-reviewed our classmates' presentations, and received feedback and Q+A from our peers, TAs, and professors. I also followed our schedule and completed my main tasks: familiarizing myself with the VMAF documentation and sorting out the details of our model selection and training with my team during meetings. We discussed which dataset was best for our training, as well as our backup plan of using an existing algorithm if our CNN doesn't end up working within a reasonable amount of time, since the main portion of the project is the hardware – specifically, writing the algorithm onto an FPGA to hardware-accelerate the upscaling process.
Looking at the VMAF documentation, I confirmed that the current version ships a well-documented Python wrapper library. I had initially decided to use Python since it is the most intuitive option for ML, and it also lets Kunal help with the beginning of the algorithm development, since he has some ML experience and is eager to be involved. The VMAF GitHub repository also has some very useful scripts that we can use during our training process. I briefly considered the sample datasets it provides, but after discussing with my teammates we decided that the dataset from the CDVL (Consumer Digital Video Library) fit better, since it has a greater variety of videos, such as animation.
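As a first sanity check of the tooling, here is a minimal sketch of scoring an upscaled clip against its reference. Note this goes through ffmpeg's libvmaf filter rather than the repository's own scripts, assumes an ffmpeg build with libvmaf enabled, and the file names are placeholders:

```python
import subprocess

# Compute a VMAF score for an upscaled clip against its reference.
# Assumes ffmpeg was built with --enable-libvmaf; file names are placeholders.
subprocess.run([
    "ffmpeg",
    "-i", "upscaled.mp4",     # distorted clip (first input)
    "-i", "reference.mp4",    # reference clip (second input)
    "-lavfi", "libvmaf=log_path=vmaf.json:log_fmt=json",
    "-f", "null", "-",        # discard decoded frames; we only want the log
], check=True)
# Per-frame and pooled VMAF scores end up in vmaf.json.
```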
For next week, I will begin working on the model development in Python with Kunal, and will work on the design presentation with the team. As detailed on the schedule, I will also need to consult the VMAF documentation further, since this is my first time using it.
Team Status Report for 9/25/21
For this week, we followed our schedule/Gantt chart and attempted everything on our list. Almost every task was accomplished successfully, with details listed in our personal status reports. The only hiccup was our AWS setup – the credits will be acquired by next week, and we have already looked into which AWS instance type would be most suitable for our group.
During class time, our entire group also peer-reviewed our classmates' presentations. We decided that Kunal will give the second presentation, and we aim to address all of the concerns raised during the Q/A session of our first presentation, as well as the feedback we received from TAs and professors through Slack.
As a team, we further discussed the model for our project, as it is a core part of the upscaling process. Reflecting on the feedback from Slack and following our schedule, we also settled on a specific dataset and downloaded the videos from the online database (CDVL) that provides them.
Looking towards next week, we are on track according to our schedule and optimistic about continuing our positive trajectory. Next week, we begin implementing I/O with our acquired hardware, as well as preparing for our design presentation. We will also begin writing code for the training part of our project in Python.
James’s Status for 9/25
At the end of last week, I practiced for and then gave our presentation. Afterwards, I acquired the Ultra96 and its peripherals, and I have been getting it up and running in the time since.
As per our Gantt chart schedule, I feel that I am personally on track so far.
Joshua’s Status Update for 9/18/21
This week, I worked with my team on refining the details of our implementation. I looked into possible algorithms for our project and furthered my knowledge of upscaling algorithms used in the past, specifically in browsers, since our initial use case was video/movie upscaling. After we changed our use case, I looked at groups from previous semesters with projects similar to ours and compared their numbers to our estimates for throughput and image/video processing on FPGAs. I also looked into the process of down-scaling videos from higher-quality sources, as this will be an important part of our testing (see the sketch below).
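One straightforward option for that down-scaling step is to let ffmpeg generate the low-resolution inputs from the high-quality sources. A minimal sketch, where the directory names and the 2x factor are placeholders rather than final choices:

```python
import subprocess
from pathlib import Path

# Produce low-resolution copies of each source video for training/testing.
# Directory names and the 2x downscale factor are placeholders.
src_dir, out_dir = Path("source_videos"), Path("lowres_videos")
out_dir.mkdir(exist_ok=True)

for src in src_dir.glob("*.mp4"):
    subprocess.run([
        "ffmpeg", "-i", str(src),
        "-vf", "scale=iw/2:ih/2",  # halve width and height (bicubic by default)
        str(out_dir / src.name),
    ], check=True)
```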
Since I fell ill on Wednesday and did not recover for three days, I missed our meeting that day and did not spend as much time on research as I had initially planned. However, we still managed to achieve most of our initial goals for the week, and my teammates have brought me up to speed on our research progress.
I also helped write the introduction of the project, set up the tabs needed for our blog/website, and compiled my team's progress into the team status report.
Team Status Report for 9/18/21
This week, my group met with our TA, Joel, and Byron to refine our abstract and pinpoint the finer details of our implementation. Taking on their recommendations, we decided to change our use case to security/video streaming. Being able to upscale video on demand in real time allows for greater security and better decision-making when the user is presented with potential threats; client-side upscaling can also make up for a poor internet connection. After considering the throughput and use case, we decided to target 24 fps instead of 60 fps, as this is more realistic while still being perceived as smooth video by the user.
As Byron suggested, we examined existing upscaling methods used in browsers in more depth, and read up on some DSP literature that he sent us. We decided that neural-net methods were indeed more suitable for our implementation, and we are in the process of figuring out how to fit this architecture onto an FPGA.
For the coming week, we will further develop our schedule and confirm how we will procure the key components of our project. We will also set up team infrastructure such as GitHub to coordinate our progress better. Overall, we are on schedule and ready to progress to the next week.
Kunal's Status Update for 9/18/21
This week our team worked on pinpointing an algorithm for the real-time video upscaling problem. We found that classical DSP algorithms approach the problem rather naively: they apply fixed filters that do not adapt to the varying content and formats of the video data. Because the input patches vary in slight ways on each iteration, a deep-learning-based approach that can learn these variations from data is favored.
The deep learning algorithm I looked into was image super-resolution from sparsity. In this formulation, we take patches of pixels from a low-resolution image, where each patch is modeled as the result of applying two matrices – a blurring filter and a downsampling operator – to the corresponding high-resolution patch. Our implementation would be based on a classical layered neural network taking pixel intensities and locations as inputs. The algorithm trains two dictionaries, one providing a sparse coding for the low-resolution patches and one for the high-resolution patches; the two dictionaries are coupled so that the sparse code recovered for a low-resolution patch reconstructs its high-resolution counterpart, and through the iterative process of gradient descent we can learn the appropriate parameters for the trained model.
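In symbols (our notation, following the sparse-representation super-resolution literature; the paper's exact formulation may differ in detail): a low-resolution patch y is modeled as y = SHx, where H is the blurring filter and S the downsampling operator applied to the high-resolution patch x. Recovery then sparse-codes y against the low-resolution dictionary and reconstructs with the high-resolution one:

```latex
\alpha^{*} = \arg\min_{\alpha}\; \lVert D_{l}\,\alpha - y \rVert_{2}^{2}
             + \lambda \lVert \alpha \rVert_{1},
\qquad
\hat{x} = D_{h}\,\alpha^{*}
```

Here D_l and D_h are the coupled low- and high-resolution dictionaries, and λ trades reconstruction error against sparsity of the code α.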
James’s Status for 9/18
This week I researched other image upscaling methods in recent literature as well as classical DSP approaches, and homed in on an algorithm to pursue and the reasoning behind it. The DSP approaches seem either to be lossy with respect to high-frequency information, or to involve mathematical processing comparable to the computation that goes into a convolutional net. Most modern upscaling research uses neural-net methods, and we are going to go down this route as well (a baseline sketch follows below).
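For reference, the kind of classical interpolation we would measure against is essentially a one-liner. A minimal sketch using Pillow's bicubic filter, where the file names and the 2x factor are placeholders:

```python
from PIL import Image

# Classical DSP-style baseline: bicubic interpolation. This is the kind of
# upscaling a learned model should beat on high-frequency detail.
# File names and the 2x factor are placeholders.
frame = Image.open("frame_lowres.png")
upscaled = frame.resize((frame.width * 2, frame.height * 2),
                        resample=Image.BICUBIC)
upscaled.save("frame_bicubic_2x.png")
```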
I also worked on the outline of the presentation; a few parts still need to be finished up. Tomorrow's plan is to get the slides done and ready for the presentations this week.
Lastly, I also got this blog up with our project name and member names.