This week, I worked on writing and integrating the last few parts of the deep learning, non-real-time approach to our project. Instead of my previous approach of combining OpenCV and PyAudio in one Python script, I used GStreamer to record audio and video simultaneously. I also extracted the timestamped captions and overlaid them onto the video without much trouble. An example is below:
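The report does not say how the captions were overlaid. One common route is to write the timestamped captions out as a SubRip (`.srt`) subtitle file, which standard tools can then render onto the video. The sketch below assumes that route; the `Caption` record and its (start, end, text) fields in seconds are hypothetical and would need to match whatever the captioning step actually emits. It avoids `list[...]` annotations so that it also runs on Python 3.8.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    # Hypothetical caption record: times are seconds from the start of the video.
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: List[Caption]) -> str:
    """Serialize captions as SRT: index, time range, text, blank line between cues."""
    blocks = []
    for index, cap in enumerate(captions, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(cap.start)} --> {format_timestamp(cap.end)}\n"
            f"{cap.text}\n"
        )
    return "\n".join(blocks)
```

As one option (not necessarily the tool used here), an ffmpeg build with libass can burn the resulting file into the frames with `ffmpeg -i input.mp4 -vf subtitles=captions.srt output.mp4`.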
I am currently installing all of the relevant libraries into a Python 3.8 virtual environment. Running the deep learning solution requires Python 3.8, and I figured that moving everything into a single Python 3.8 virtual environment would make later testing and integration much easier. As I mentioned in my last report, some components required Python 3.7 while others were installed under the default Python 3.6.
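With components previously split across Python 3.6 and 3.7, a script is easy to launch from the wrong interpreter by accident. A small guard like the one below can fail fast in that case. It is a hypothetical helper, not part of the project; the required version `(3, 8)` comes from the text above, and the environment itself would typically be created with `python3.8 -m venv <dir>`. The `sys.prefix != sys.base_prefix` check is the documented way to detect a `venv`-created environment.

```python
import sys
from typing import Optional, Sequence, Tuple

REQUIRED: Tuple[int, int] = (3, 8)  # version the deep learning stack needs


def check_python(required: Tuple[int, int] = REQUIRED,
                 current: Optional[Sequence[int]] = None) -> bool:
    """Return True if major.minor of the interpreter matches `required`."""
    version = tuple(sys.version_info[:2]) if current is None else tuple(current[:2])
    return version == required


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not check_python():
        sys.exit(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, "
                 f"got {sys.version.split()[0]}")
    if not in_virtualenv():
        print("Warning: not running inside a virtual environment")
```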
Using GStreamer directly on the command line instead of through OpenCV means that we do not have to compile OpenCV with GStreamer support, which is convenient for our switch to Python 3.8. I have not finished building PyTorch and Detectron2 for 3.8 yet, but so far I do not anticipate any major issues with this change.
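For reference, a single GStreamer pipeline that records video and audio into one file can be launched with `gst-launch-1.0`. The sketch below builds that command from Python. The camera device (`/dev/video0`), the `autoaudiosrc` source, the x264 and Vorbis encoders, and the Matroska container are all assumptions, not the pipeline used in the project; `x264enc` in particular comes from the gst-plugins-ugly package and may not be installed everywhere.

```python
import shlex
import subprocess
from typing import List


def build_record_command(output_path: str, video_device: str = "/dev/video0") -> List[str]:
    """Return a gst-launch-1.0 argv that records camera video and microphone
    audio into one Matroska file. Device and encoder choices are assumptions."""
    description = " ".join([
        # Muxer: interleaves both streams and writes them to a single file.
        f"matroskamux name=mux ! filesink location={shlex.quote(output_path)}",
        # Video branch: camera -> raw format conversion -> H.264 -> muxer.
        f"v4l2src device={video_device} ! videoconvert ! "
        "x264enc tune=zerolatency ! queue ! mux.",
        # Audio branch: default audio source -> Vorbis -> same muxer.
        "autoaudiosrc ! audioconvert ! audioresample ! vorbisenc ! queue ! mux.",
    ])
    # -e: on shutdown (e.g. Ctrl-C), send end-of-stream so the muxer can
    # finalize the file instead of leaving it truncated.
    return ["gst-launch-1.0", "-e", *shlex.split(description)]


def record(output_path: str) -> None:
    """Run the pipeline until it ends or the user presses Ctrl-C."""
    proc = subprocess.Popen(build_record_command(output_path))
    try:
        proc.wait()
    except KeyboardInterrupt:
        # From a terminal, gst-launch-1.0 receives the same SIGINT; with -e it
        # drains the pipeline and exits on its own, so wait for it here.
        proc.wait()
```

Splitting the description into separate tokens with `shlex.split` mirrors what a shell does when the same pipeline is typed on the command line.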
I am currently slightly behind schedule, since I wanted to have some ideas by now on how we would build a real-time implementation. Given the amount of time left, a real-time implementation may not be feasible. We anticipated this possibility from the beginning of the project, so it does not significantly change our plans. Measured against a non-real-time implementation alone, I am on schedule.
By next week, I hope to have every component working together in a streamlined fashion, so that we can record a video and produce a captioned version of it with a single command. Instead of focusing on real-time processing, we may pivot toward building a polished GUI for our project. I hope to have made progress on one or the other by the end of this week.