The highlight of this week for me was delivering the Design Review presentation. I think it went quite well: my teammates and our professor commented that the presentation was polished, with good graphics and content. We are currently working on the design report and will draw on the content of the presentation to write it.
On the technical side, I made significant progress on porting the CV application to C++, so that the PyTorch model can run as optimized code on the Jetson for maximum speed. The C++ application can now run the trained PyTorch model and output predictions for the five highest-probability classes.
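As a rough illustration of the top-5 step, here is a minimal Python sketch that turns raw model outputs into the five most probable classes. It assumes the model emits one unnormalized score (logit) per class; the function name, the example logits, and the class labels are all hypothetical and are not taken from our application.

```python
import math

def top_k_predictions(logits, class_names, k=5):
    """Return the k most probable (class_name, probability) pairs."""
    # Softmax, subtracting the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and labels, for illustration only.
example_logits = [0.1, 2.3, -1.0, 0.7, 1.5, 0.0, 3.1]
example_names = ["class_%d" % i for i in range(len(example_logits))]

for name, prob in top_k_predictions(example_logits, example_names):
    print("%s: %.3f" % (name, prob))
```

The same logic applies whichever language runs the model: normalize the scores, sort, and keep the first five.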
The porting process had several challenges. Good PyTorch documentation was lacking, which made it very difficult to figure out how to convert between data formats in C++ (an OpenCV (cv2) image to a torch Tensor, for example) and what has to be considered so that the model can be serialized properly for running in C++ (in particular, no nested functions). This was a lesson in the importance of good documentation, and in the pain of having to pore through various forums and articles when it is missing.
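To make the format conversion concrete, here is a small Python sketch of the layout change involved. OpenCV stores an image as height x width x channel in BGR order with 8-bit values, while PyTorch image models usually take channel x height x width floats. The sketch uses plain nested lists so that it runs without any libraries; the function name is hypothetical, and the RGB ordering and the scaling to [0, 1] are assumptions about what the model expects, not something confirmed for ours.

```python
def hwc_bgr_to_chw_rgb(image):
    """Convert an H x W x 3 BGR image (ints 0-255) to 3 x H x W RGB floats in [0, 1]."""
    height = len(image)
    width = len(image[0])
    # Source channel for each output plane: R sits at index 2 in BGR, G at 1, B at 0.
    bgr_index_for_rgb = (2, 1, 0)
    return [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in bgr_index_for_rgb
    ]

# A hypothetical 1 x 2 image: one pure-blue pixel, then one pure-red pixel (BGR order).
tiny = [[[255, 0, 0], [0, 0, 255]]]
planes = hwc_bgr_to_chw_rgb(tiny)
print(planes)
```

In the C++ application the same steps would normally be done on tensors rather than lists, for instance by wrapping the image buffer with `torch::from_blob` and reordering the axes with `permute`, though the exact calls depend on how the image is loaded.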
However, after training and testing the network, I began to notice serious problems with the trained model. Most notably, it failed to produce correct predictions. After consulting Prof Mario’s PhD student, we realized that we were using a highly customized model that was not designed properly and was not even a proper “ResNet” (it lacked fully residual layers). He advised us to use an established model such as ResNet18 or AlexNet instead. This is a lesson learnt: do not blindly copy code from the internet.
Next week, I will focus on training either ResNet18 or AlexNet on our data and testing it in the new C++ classifier. (There is also a Python one for quick testing, in case the C++ one still has bugs.) Hopefully I will be able to train a network that achieves our desired accuracy of 85% (the network itself should reach about 95% validation accuracy).
Fortunately, despite this setback, we are still on schedule, because the successful port of the C++ application had previously put us ahead of schedule.