This status report is split into two parts: the first covers the week of 10/31-11/6, since my status report was missing for that week, and the second covers the week of 11/7-11/13.
For the week beginning on 10/31, I was extremely sick from 10/31-11/3 and could not get much work done or communicate with my teammates. When I got back on Thursday, my group and I got together to reevaluate our situation and to prepare our interim demo presentation. I was responsible for the software demonstration, which involved showing our working software model that could upscale videos from our dataset. I had several demonstrations in mind. The first would output a real-time side-by-side comparison of bicubic upscaling, CNN upscaling, and the original native-resolution video. Another involved a live webcam feed, showing an upscaled version of the webcam stream in real time using a CNN with fewer filters. The point was to demonstrate that although real-time upscaling is possible in software on a GPU, it still falls short of the CNN model with more filters, which cannot run in real time in software but can on our hardware/FPGA implementation. The webcam demonstration did not end up working, so I resorted to upscaling still images and using the difference in SSIM to show the effect of the number of filters our CNN implementation was trained with.
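For reference, here is a minimal sketch of how that side-by-side comparison could be wired up with OpenCV. The clip paths and the cnn_upscale stand-in are hypothetical; the stand-in falls back to bicubic so the script runs without our trained model in place.

```python
import cv2
import numpy as np

SCALE = 2  # upscaling factor (assumed)

def cnn_upscale(frame, size):
    """Stand-in for our trained CNN's inference call (hypothetical).
    Falls back to bicubic so the script runs; replace with the real model."""
    return cv2.resize(frame, size, interpolation=cv2.INTER_CUBIC)

cap_low = cv2.VideoCapture("low_res_clip.mp4")    # downscaled input (hypothetical path)
cap_native = cv2.VideoCapture("native_clip.mp4")  # native-resolution clip (hypothetical path)

while True:
    ok_low, low = cap_low.read()
    ok_native, native = cap_native.read()
    if not (ok_low and ok_native):
        break
    h, w = native.shape[:2]
    bicubic = cv2.resize(low, (w, h), interpolation=cv2.INTER_CUBIC)
    cnn = cnn_upscale(low, (w, h))
    # Stack the three frames horizontally: bicubic | CNN | native
    panel = np.hstack([bicubic, cnn, native])
    cv2.imshow("bicubic | CNN | native", panel)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap_low.release()
cap_native.release()
cv2.destroyAllWindows()
```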
For the week beginning on 11/7, we started the week off with our interim demo. I demonstrated a clear difference in SSIM between a standard bicubic interpolation upscaling method and our software-based CNN upscaling method on a video from our dataset. Due to throttling problems, I had to show videos I had previously upscaled instead of running my model live.
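The SSIM comparison itself is simple to reproduce. Below is a minimal sketch with scikit-image, assuming the upscaled and native still images are already the same resolution; the file names are hypothetical.

```python
import cv2
from skimage.metrics import structural_similarity

# Load the native image and the two upscaled candidates (hypothetical file names).
native = cv2.imread("native.png", cv2.IMREAD_GRAYSCALE)
bicubic = cv2.imread("bicubic_upscaled.png", cv2.IMREAD_GRAYSCALE)
cnn = cv2.imread("cnn_upscaled.png", cv2.IMREAD_GRAYSCALE)

# SSIM is computed against the native image; higher means closer to ground truth.
ssim_bicubic = structural_similarity(native, bicubic)
ssim_cnn = structural_similarity(native, cnn)
print(f"bicubic SSIM: {ssim_bicubic:.4f}")
print(f"CNN SSIM:     {ssim_cnn:.4f}")
```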
I did some research to see if a dataset of still images would work better than our initial dataset of frames extracted from videos. The reason is that our models seem to output blurrier frames and videos; this is subjectively better than the artifact-heavy output of bicubic upscaling, and objectively better in terms of SSIM, but it is still unusual. I hypothesized that because many of the frames in our dataset are blurry, the CNN may have learned to output blurry regions even where the native frames are sharp. After discussing with the team and examining our timeline, I decided to explore other datasets, ones consisting of still images that were sharp natively. We were able to fit this in because our training infrastructure was already set up on Google Colab and my local machine. I decided on a dataset used by another open-source upscaling implementation, although that one was aimed at single-image super-resolution rather than video super-resolution. Using the same hyperparameters and after 1-2 days of training, the initial results yielded higher SSIMs on still images, but when tested on our video dataset, artifacting was much more apparent and the SSIM fell short of my initial model. So I ended up dropping that idea and continuing to train our previous implementation.
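Evaluating on the video dataset amounted to averaging per-frame SSIM between an upscaled clip and the native clip. A minimal sketch of that loop follows; the clip paths are hypothetical.

```python
import cv2
from skimage.metrics import structural_similarity

def mean_video_ssim(native_path, upscaled_path):
    """Average per-frame SSIM between a native clip and an upscaled clip."""
    cap_a = cv2.VideoCapture(native_path)
    cap_b = cv2.VideoCapture(upscaled_path)
    scores = []
    while True:
        ok_a, frame_a = cap_a.read()
        ok_b, frame_b = cap_b.read()
        if not (ok_a and ok_b):
            break
        # Compare grayscale frames to keep the metric simple.
        gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
        gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
        scores.append(structural_similarity(gray_a, gray_b))
    cap_a.release()
    cap_b.release()
    return sum(scores) / len(scores) if scores else float("nan")

print(mean_video_ssim("native_clip.mp4", "cnn_upscaled_clip.mp4"))
```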
I also discussed with James the current limitations and bottlenecks of our hardware approach, in particular whether anything could be tested on the software side that would help with the runtime of the hardware model. He suggested minor tweaks to the hyperparameters, specifically the number of filters in our second CNN layer, which could massively decrease the implementation's runtime with only a very minor loss in upscaled video quality. I also discussed potential issues with the processing of each pixel, specifically the encoding involved: the paper we followed processes only one channel of data, the luminance (Y) portion of the YCbCr encoding. Other encoding choices could potentially reduce the amount of data being processed on our FPGA, which was significant because James identified it as a potential bottleneck for our system.
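To make both ideas concrete, here is a rough sketch assuming an SRCNN-style three-layer network (our actual architecture may differ). n2 is the second-layer filter count James suggested shrinking, and only the Y channel of the YCbCr-converted frame goes through the network, with Cb and Cr upscaled bicubically.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN-style model; n2 is the second-layer filter count."""
    def __init__(self, n1=64, n2=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, n1, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(n1, n2, kernel_size=5, padding=2), nn.ReLU(),  # shrinking n2 cuts most of the compute
            nn.Conv2d(n2, 1, kernel_size=5, padding=2),
        )

    def forward(self, y):
        return self.net(y)

def upscale_frame(frame_bgr, model, scale=2):
    """Run the CNN on the Y channel only; Cb/Cr are bicubic-upscaled."""
    h, w = frame_bgr.shape[:2]
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Bicubic-upscale everything first; the network refines the bicubic baseline.
    up = cv2.resize(ycrcb, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)
    y = up[:, :, 0].astype(np.float32) / 255.0
    with torch.no_grad():
        y_in = torch.from_numpy(y)[None, None]           # shape (1, 1, H, W)
        y_out = model(y_in).clamp(0, 1)[0, 0].numpy()
    up[:, :, 0] = (y_out * 255.0).astype(np.uint8)
    return cv2.cvtColor(up, cv2.COLOR_YCrCb2BGR)

model = SRCNN(n2=16)  # e.g., try a smaller second layer for the FPGA budget
```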
Minor additions: I added content to our website and got a start on our final report, so as to avoid repeating our situation with the previous design report. It was also helpful to write down exactly what changes we had put into place since the design report, to double-check that our project has stayed on track and has not deviated from our initial requirements.