Larry’s Status Report for 23 April 2022

This week, I worked on finishing the integration of the various subsystems that we developed. One challenge I encountered was that when I tried to combine everything into one Python script, the system ran out of memory. For now, I think we will split the system into several Python scripts that are called from one shell script, and avoid recording for more than ~20 seconds. So far, this strategy seems to be working, and I am almost surprised at how well everything functions. The caption accuracy leaves something to be desired when two people speak over each other, but otherwise the whole system is usable.
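To illustrate why splitting the pipeline helps with memory, here is a minimal sketch of the same idea written as a Python driver rather than the shell script we actually use. Each stage runs in its own process, so whatever memory it allocated is returned to the OS when it exits, before the next stage starts. The function name `run_stages` and the script names in the comment are hypothetical, not our real file names.

```python
import subprocess
import sys


def run_stages(stages):
    """Run each stage command in its own process, one after another.

    A stage's process exits before the next one starts, so its memory
    (for example a loaded model) is released between stages.
    Raises subprocess.CalledProcessError if any stage exits non-zero,
    which stops the pipeline at the failing stage.
    """
    return_codes = []
    for cmd in stages:
        result = subprocess.run(cmd, check=True)
        return_codes.append(result.returncode)
    return return_codes


# Hypothetical usage (script names are assumptions, not our actual files):
# run_stages([
#     [sys.executable, "record.py"],
#     [sys.executable, "separate_speech.py"],
#     [sys.executable, "overlay_captions.py"],
# ])
```

The trade-off is that stages have to pass data through files on disk instead of in-memory objects, which is acceptable for short recordings.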

Here is a video sample with fully overlapping speech:
https://drive.google.com/file/d/1MmEE7Yh0Kxe5wChuq5n5rHsnGKMOyZYr/view?usp=sharing

One thing that still needs work is keeping the captions within the video frame boundaries and away from each other. I will clean up this issue next week, and I do not anticipate it taking much work. The other main deliverable yet to be finished is the integration of Charlie’s website onto the Jetson TX2. Additionally, we ordered extra microphones that we still need to test with our setup. Finally, the system currently only uses the deep learning approach to separate speech. It would be interesting to try overlaying captions that are generated using the signal processing approach. Once all that is done, we will have a finished product.
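For the caption placement issue, one possible approach is sketched below. It is not what the system currently does. It assumes each caption is an axis-aligned box `(x, y, w, h)` in pixel coordinates with the origin at the top-left of the frame; all function names are hypothetical. Each box is first clamped into the frame, then shifted down below any caption it collides with, when there is room.

```python
def clamp_box(x, y, w, h, frame_w, frame_h):
    """Move a box the minimum distance needed to sit inside the frame."""
    x = max(0, min(x, frame_w - w))
    y = max(0, min(y, frame_h - h))
    return x, y


def boxes_overlap(a, b):
    """Return True if two (x, y, w, h) boxes intersect with non-zero area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_captions(boxes, frame_w, frame_h):
    """Greedy, best-effort placement: clamp each box, then push it below
    any already-placed caption it overlaps, as long as it still fits."""
    placed = []
    for x, y, w, h in boxes:
        x, y = clamp_box(x, y, w, h, frame_w, frame_h)
        for _ in range(len(placed)):
            hit = next((p for p in placed if boxes_overlap((x, y, w, h), p)), None)
            if hit is None:
                break
            new_y = hit[1] + hit[3]  # just below the caption we collided with
            if new_y + h > frame_h:
                break  # no room below; keep the clamped position
            y = new_y
        placed.append((x, y, w, h))
    return placed
```

This only ever moves captions downward, so with many speakers near the bottom of the frame some overlap can remain; a fuller version would also try moving captions upward or sideways.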

So far, the project is on schedule. I believe we left enough slack time and planned enough contingencies to produce something usable for the final demo. It is possible that we will struggle a lot with the website, in which case we could rapidly develop something that works locally. Charlie seems confident in what he has developed, however, so we probably won’t need to change our plans.

By next week, I hope to have the deliverables I mentioned above done. I will also be helping Stella with the final presentation.

 
