This week we were able to get more insight into what our CNN design should look like, based on feedback from our professors and TA. With this new direction we looked further into the encoder/decoder style and got a better understanding of what we should do. We were also able to look more into a deblurring metric to determine whether we need to deblur a video before we start post-processing it. After looking at more papers on this topic, we decided that we are going to prioritize speed, but not at the expense of terrible video quality, so we are going to look at the gradient of the image to make an initial determination of whether the image is blurred or not. If the gradient suggests there is no blur, then we will pass that frame along untouched; if there is some blur, we will have to classify it, and this is where things got more complicated. There are plenty of great papers that approach this problem, but they do so in a computationally expensive way, and so far the best approach is to create another CNN for classification. The only problem is that this would mean more training data and possibly more processing time, which could bottleneck the entire system. If our deblurring CNN is faster than we expect, we are considering just running the model on every frame whether it is blurred or not, but this is something we cannot measure yet because our deblurring CNN model is still running.
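To make the gradient check a bit more concrete, here is a rough sketch (not our final implementation) of what it could look like in Python with OpenCV; the Sobel-based measure and the threshold value are placeholders we would still need to tune against real footage.

```python
import cv2
import numpy as np

# Placeholder threshold; the real cutoff would be tuned on sample videos.
BLUR_THRESHOLD = 100.0

def frame_is_blurry(frame, threshold=BLUR_THRESHOLD):
    """Return True if the frame's gradient energy suggests blur."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Sobel gradients in x and y; blurred frames have weaker edges,
    # so the mean gradient magnitude drops.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return magnitude.mean() < threshold
```

Frames that fail this check would be passed straight through, and only the rest would go on to classification and the deblurring CNN.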
Currently our biggest risk is the camera we preselected. We decided at first to go with the e-CAM30_CUNANO, which can film at 1280x720 @ 60 fps (which we planned to downsample to 30 fps); however, we ran into a problem ordering it on Amazon, so we are currently searching for an alternative. As a contingency plan, we have looked into both webcams and other cameras that are compatible with the Jetson Nano. This is a crucial part of our project since the camera is the main source of input and provides the data we will run our post-processing algorithm on. Additionally, we were going to create our own testing sample to verify that our deblurring metric was accurately identifying videos that indeed had spatially invariant blur.
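As a rough idea of how that testing sample could be generated, the sketch below applies a single motion-blur kernel uniformly across a sharp frame, which by construction produces spatially invariant blur; the kernel length and file names here are just illustrative.

```python
import cv2
import numpy as np

def add_uniform_motion_blur(frame, length=15):
    """Apply the same horizontal motion-blur kernel to the whole frame."""
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0 / length  # one row of equal weights
    return cv2.filter2D(frame, -1, kernel)

sharp = cv2.imread("sample_frame.png")            # hypothetical test image
blurred = add_uniform_motion_blur(sharp)
cv2.imwrite("sample_frame_blurred.png", blurred)  # known-blur ground truth
```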
We have made progress on our software backend. We began writing initial/foundational code to take in video and organize the input of image frames from a camera (done using the built-in camera of our personal machines). Right now this works by using a list local to one thread's main method to store incomplete frame packets; once a frame packet is complete (currently containing 3 frames), it is moved to a global list accessible from another thread that is responsible for dispatching frame packets for processing with our CNN. We then dispatch these complete packets in batches of size K-2, where K is the maximum number of concurrent hardware threads the machine we are on can handle (found with a call to the multiprocessing module in Python). This is likely to change, as we suspect CPU/GPU usage will hit its maximum before all K-2 threads are utilized, but we will experiment further and see what the most optimal setup for this part of the system is. Beyond that, we still have to figure out the best way to put the post-processed images back together and store/save them to our SD card; the SD card step, though, will probably wait until we actually have one, since our research into doing it so far has turned up very little and our time is likely better spent on tasks we currently have a better grasp on.
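For anyone curious what that capture/dispatch split looks like, here is a simplified sketch of the two threads; the read_frame and process_batch callables are hypothetical placeholders for our camera-read and CNN-dispatch code, and the exact batching policy is still subject to the experimentation mentioned above.

```python
import multiprocessing
import threading
import time

FRAMES_PER_PACKET = 3
K = multiprocessing.cpu_count()      # maximum hardware threads on this machine
BATCH_SIZE = max(1, K - 2)           # dispatch in batches of K-2 packets

completed_packets = []               # global list shared between the two threads
packets_lock = threading.Lock()

def capture_loop(read_frame):
    """Capture thread: build packets of 3 frames in a local list, then publish."""
    packet = []                      # local to this thread's main method
    while True:
        frame = read_frame()         # placeholder for the camera read
        if frame is None:
            break
        packet.append(frame)
        if len(packet) == FRAMES_PER_PACKET:
            with packets_lock:
                completed_packets.append(packet)
            packet = []

def dispatch_loop(process_batch):
    """Dispatcher thread: hand complete packets to the CNN in batches of K-2."""
    while True:
        with packets_lock:
            if len(completed_packets) >= BATCH_SIZE:
                batch = completed_packets[:BATCH_SIZE]
                del completed_packets[:BATCH_SIZE]
            else:
                batch = None
        if batch:
            process_batch(batch)     # placeholder for CNN processing
        else:
            time.sleep(0.01)         # wait for more packets to accumulate
```

The lock is only there to guard the shared list; if the dispatcher turns out to be the bottleneck, we may swap the list for a queue or rethink the batch size.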