James’s Status for 11/13 – Team B0: Real Time Video Upscaling

First off, I’d like to apologise for missing last week’s status update (nothing was posted on 11/6). I was extremely busy getting our hardware working for the interim demo. For the sake of coverage and good documentation, I will include what I would have had in that update here, clearly backdating and marking entries where applicable. This post therefore contains an end-of-week overview of last week, daily reports for this week, and an end-of-week overview of this week. I’ve decided to start writing daily reports for two reasons: 1) to keep myself accountable for making regular progress on the project, and 2) because at this stage of the project I have a lot more to do than before and want a better way to organise it.

———-

End-of-Week Update: (10/31-11/6)

This week I got hyperparameters back from Josh, so I was able to get the CNN built on the Ultra96. Unfortunately, because he was still recovering from illness, I didn’t get the hyperparameters back as early as I’d have liked, and so was not able to run all the experiments I wanted to this week.

The big takeaway from building the full-size model is that I hadn’t appreciated how large it is before this point, and so didn’t realise that each hidden layer (at least as implemented now) has to go back to DRAM. This may cause slowdowns, but I haven’t had the chance to benchmark it yet. This is a big TODO for the upcoming week. The size of the model also makes builds take a very long time to synthesize and route: around 30 minutes for an incremental build, and far longer for a clean build. Development at this scale will be far slower than I anticipated purely because of this turnaround.

As of now I have only the model hyperparameters and no weights, but the model I have implemented on the FPGA is agnostic to the weights; they will simply be loaded from a file by the host. There could be improvements from precomputation if the weights were fixed, but I’m not sure this is actually the case. I would have to do a cost-benefit profile of how much computation and memory access it would actually save. At the same time, keeping the model agnostic to the weights makes our system more modular, which is very good for a short-turnaround testing environment like ours. In the coming days we will need to get the system partially integrated for the demo, and then keep making progress through the rest of the coming week.
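To illustrate what "agnostic to the weights" means on the host side, here is a minimal Python sketch (the real host code is not shown in this post). It assumes a hypothetical file format of flat little-endian float32 values, and the layer shapes in `LAYER_SHAPES` are made up for illustration, not the actual hyperparameters.

```python
import struct
from math import prod

# Hypothetical layer shapes (out_channels, in_channels, kernel_h, kernel_w).
# The real values would come from the model hyperparameters.
LAYER_SHAPES = [(8, 1, 3, 3), (8, 8, 3, 3), (1, 8, 3, 3)]


def split_weights(blob, shapes):
    """Split a flat little-endian float32 blob into one flat list per layer."""
    counts = [prod(shape) for shape in shapes]
    expected_bytes = 4 * sum(counts)
    if len(blob) != expected_bytes:
        raise ValueError(f"expected {expected_bytes} bytes, got {len(blob)}")
    values = struct.unpack(f"<{sum(counts)}f", blob)
    layers, offset = [], 0
    for count in counts:
        layers.append(list(values[offset:offset + count]))
        offset += count
    return layers


def load_weights(path, shapes):
    """Read a weight file on the host; only the file changes between models."""
    with open(path, "rb") as f:
        return split_weights(f.read(), shapes)
```

The point of the design is that swapping in newly trained weights only means pointing `load_weights` at a different file, with no 30-minute rebuild, as long as the layer shapes stay the same.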

Daily Update 11/8: (Interim Demo)

I did integration this weekend and immediately ran into a great many issues, made worse by how soon the interim demo was.

The first issue was finding decent data sources. For expediency, and as a proof of concept of getting video from the host to the fabric, I wanted to store a video in the home directory of the board’s file system, but I couldn’t get the files to play nicely (issues with file formats, dimensions, file size, and so on). In the interest of time, I reverted to using an mp4. After our first demo I will ask Josh to share the data set so we have better and more applicable files to use. File size will also be less of a problem, since the data will live on a USB drive rather than on the same microSD card that holds the PetaLinux image.

The second main issue was that the code that Kunal gave me was riddled with bugs and errors. The clearest and most effective path forward was to rewrite the host code in its entirety. Linking the correct OpenCV libraries with Vitis was a bit painful, as the project file does not store the build config in an obvious way, but overall it was not as painful as it could have been. The host code (for the demo) took a few hours to write. Debugging was minimal, because builds and compilations are quite expensive and so I made sure to code carefully.

Another thing to note is that, for the demo and only for the demo, I reduced the sizes of the filter maps to get shorter builds and hence a faster iteration cycle, to make sure a live demo was available as a deliverable. I achieved this with a much reduced spec (as expected for an interim demo): the host reads a video file with a known file path and name, launches the kernel on the fabric, reads back the data, and serialises it to a file.

Moving forward, we will want to send data to video output on the miniDisplayPort instead of serialising. We will also still need to add benchmarking, both for accuracy and for time. Lastly, going by wall clock time alone, serialisation seems to take an untenable amount of time (a few seconds). We will need to investigate whether this is also the case for streaming video and make sure it does not become a bottleneck for us.
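One way to replace the rough wall-clock impression with per-stage numbers is to wrap each step of the host pipeline in a timer. The following is a minimal Python sketch of the idea, not the actual host code; the `StageTimer` class and the stage names mentioned afterwards are hypothetical.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to any earlier time recorded under the same name,
            # so a stage run once per frame sums across frames.
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the stage with the largest total time."""
        return max(self.totals, key=self.totals.get)
```

In use, each step (for example "read", "kernel", "readback", "serialise") would sit inside its own `with timer.stage("..."):` block, and `timer.totals` would then show whether serialisation really dominates.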

Daily Update 11/10:

I re-integrated the correct input/output map sizes on the FPGA. The builds still take ~30 minutes. I want to find a way to iterate on the full design that doesn’t need such long builds, but at the same time I don’t want to devote too much time to something that might not amortise out. If I’m being honest, with the runway we have left, I don’t think it will be worth it, so I will not devote much time to optimising builds. I plan to block out three hours tomorrow to try to improve the iteration cycle. If nothing comes of it, so be it; I’ll just need to be careful with every build I do.
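The amortisation question can be put as a quick break-even calculation. In the sketch below, the three hours invested and the ~30 minute build come from this update; the improved build times passed in the usage note are purely hypothetical.

```python
import math


def builds_to_break_even(invested_min, build_min_before, build_min_after):
    """Number of builds needed before the time invested is paid back.

    Returns None if the new build is no faster, since it never pays off.
    """
    saved_per_build = build_min_before - build_min_after
    if saved_per_build <= 0:
        return None
    return math.ceil(invested_min / saved_per_build)
```

For example, `builds_to_break_even(180, 30, 15)` gives 12: if three hours of work halved the build time, it would take twelve builds to come out ahead. A smaller gain, say down to 20 minutes, would push that to 18 builds.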

Daily Update 11/11:

Because of what Tamal told us yesterday in the interim demo regarding static discharge on the U96, I began looking into cases for the U96 that would mitigate the risk of discharge from touching the components of the device. I didn’t find many existing options, just one 3D-printable model on Thingiverse, linked here. The main drawback of this model is that it includes space for the JTAG/UART extension, which we aren’t using, so it would be bulkier than we want or need. I might look into modifying the model so that we can have a case with a better form factor. At the same time, however, I’m not sure I have the bandwidth to add this to all the other tasks I need to complete as per our schedule. I plan to leave this as lower priority (it wouldn’t be the worst thing in the world if we had the extra space for the pod), but I’m also planning to ask my group whether any of them have more bandwidth or more experience with CAD and 3D printing.

Daily Update 11/12:

I didn’t get much work done for this class on Friday; I mostly focused on deadlines I had in other courses.

Daily Update 11/13:

I again had other coursework to attend to during the day. This evening I’m running some benchmarks on the CNN kernel so I can get a sense of how much further I need to push it. I won’t have numbers in time for this update’s due date, but will have them later tonight, past midnight.
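Once the benchmark numbers are in, "how much further I need to push it" can be expressed as a required speedup against a frame-time budget. This is a minimal sketch under assumptions: the function names are hypothetical, and any latency or target frame rate passed in is a placeholder, not a measurement or an agreed spec.

```python
import statistics


def median_ms(samples_ms):
    """Median per-frame latency; less sensitive to outliers than the mean."""
    return statistics.median(samples_ms)


def required_speedup(measured_ms_per_frame, target_fps):
    """Factor by which the kernel must speed up to hit the target frame rate.

    A result of 1.0 or less means the target is already met.
    """
    if measured_ms_per_frame <= 0 or target_fps <= 0:
        raise ValueError("latency and target frame rate must be positive")
    budget_ms = 1000.0 / target_fps  # time available per frame
    return measured_ms_per_frame / budget_ms
```

As a made-up illustration, a kernel measured at 100 ms per frame against a 25 fps target has a 40 ms budget, so it would need to run 2.5 times faster.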

End-of-Week Update/Overview: (11/7-11/13)

This week was fairly productive: we have a full(-ish) system, and we just need to flesh it out and iron out some kinks. The build times, in retrospect, should not be a huge issue. I’ll just need to be smart about what I run, plus it’s good practice for industry codebases and a good lesson that compiles are not always free. The case has taken a place on the back burner for now; it would be a nice convenience, but it is not something we need for MVP. With tonight’s profiling and some readings done, I should be ready to start iterating in earnest and with a more solid goal to reach. At this point, I am fairly confident that I can get my part done on time or ahead of schedule. I may attach an addendum to this post after the due date with the results of the benchmarking that finishes late tonight, so that course staff can review it before Monday.
