Currently, the most significant risk to our project is that we may not be able to separate the speakers well enough for the speech-to-text model to produce usable captions. We spoke with Professor Sullivan about our circular microphone array, and he strongly recommended a linear array for our application. There don't seem to be any good prebuilt linear arrays available online; the only one we could find is built specifically for the Raspberry Pi, and its estimated shipping time is a month, so for now we plan to continue working with the UMA-8. If the UMA-8's aperture turns out to be too small for effective beamforming and STFT-based processing, we will have to build our own array out of discrete microphones, which would add cost and potentially take much more time. None of us is familiar with the steps involved in recording from multiple microphones simultaneously, so we hope to avoid that complication.
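To get ahead of that risk, here is a minimal sketch of what multi-channel capture might look like in Python with the sounddevice library, assuming the array enumerates as a standard multi-channel USB audio device; the device name, channel count, and sample rate below are placeholder assumptions to verify against the actual device listing, not confirmed values.

import sounddevice as sd
import soundfile as sf

DEVICE = "UMA8"       # substring of the name shown by `python -m sounddevice`
CHANNELS = 7          # one channel per capsule; confirm against the device listing
SAMPLE_RATE = 48000   # Hz
DURATION = 5          # seconds

# Record every channel at once into a (samples, channels) float32 array.
audio = sd.rec(int(DURATION * SAMPLE_RATE),
               samplerate=SAMPLE_RATE,
               channels=CHANNELS,
               device=DEVICE,
               dtype="float32")
sd.wait()  # block until the recording finishes

# Write the multi-channel capture to disk so each mic can be inspected.
sf.write("uma8_capture.wav", audio, SAMPLE_RATE)

If recording from discrete microphones turns out to be necessary, the same interface should apply as long as all of them are exposed through a single multi-channel audio interface.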
One of the main changes we made since the proposal presentation is the use of a Jetson TX2 for all of the processing. We wanted to limit the amount of data movement we would have to deal with, and the TX2 also provides consistent processing and I/O capability, in contrast to the variability of a user's laptop. Another key design choice was to use an HDMI-to-USB video capture card to transfer our final output to the user's laptop, a decision we based on the iContact project from Fall 2020. Both of these changes should greatly simplify our design and let us focus on the sound processing.
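On the laptop side, capture cards like this typically enumerate as a standard UVC webcam, so the TX2's output should be readable with ordinary webcam tooling. Below is a minimal sketch of that read loop using OpenCV, under the assumption of UVC enumeration; the device index is a placeholder that depends on what else is plugged in.

import cv2

cap = cv2.VideoCapture(0)  # capture card's device index; adjust as needed
if not cap.isOpened():
    raise RuntimeError("could not open the capture card")

while True:
    ok, frame = cap.read()   # each frame arrives as a BGR numpy array
    if not ok:
        break
    cv2.imshow("TX2 output", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()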
Our schedule remains largely the same as the one presented in the proposal presentation. The main difference is that instead of circuit wiring, we now only have to deal with the video capture card.
We were able to successfully use the TX2 to interface with the webcam and the UMA-8 through a USB hub, and we have now started working with video and audio data from what we hope will be our final components.
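As a first experiment with that audio data, a frequency-domain delay-and-sum beamformer is the textbook starting point, sketched below for a uniform linear array; the mic spacing and steering angle are illustrative placeholders rather than our final geometry, and we may end up with a more sophisticated beamformer, but this shows the mechanics of combining the STFT with per-channel phase alignment.

import numpy as np
from scipy.signal import stft, istft

C = 343.0           # speed of sound in air, m/s
FS = 48000          # sample rate, Hz
SPACING = 0.03      # mic spacing in meters (illustrative)
STEER_DEG = 30.0    # look direction relative to broadside (illustrative)

def delay_and_sum(audio):
    """audio: (samples, mics) array from a uniform linear array."""
    n_mics = audio.shape[1]
    # STFT of each channel; X has shape (mics, freqs, frames).
    f, _, X = stft(audio.T, fs=FS, nperseg=512)
    # Arrival delay at each mic, relative to mic 0, for a plane wave
    # from the look direction.
    delays = np.arange(n_mics) * SPACING * np.sin(np.deg2rad(STEER_DEG)) / C
    # Phase shifts that undo those delays, aligning the channels.
    align = np.exp(2j * np.pi * f[None, :] * delays[:, None])  # (mics, freqs)
    # Align each bin, then average across mics to reinforce the look
    # direction and attenuate sound from elsewhere.
    Y = np.mean(X * align[:, :, None], axis=0)
    _, y = istft(Y, fs=FS, nperseg=512)
    return y

Working in the STFT domain lets the alignment be a simple per-bin phase multiply instead of fractional-sample delays in the time domain, which is one reason array aperture and STFT processing are tied together in our risk assessment.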