Team Status Report for April 3

This week we began wrapping up our individual modular code and turned toward integration and demoing. As we tested and composed some of the most complex sections of the solution, we had to do some problem solving that led to design changes and improvements.

There are two notable design changes. The first concerns audio I/O on the Raspberry Pi. Our initial design passed audio through multiple queue data structures as processing was performed on it and it was prepared for networking. However, we found that this introduced latency that made the audio sporadic and choppy. Since the queueing was too slow, the audio stream now goes directly to the network without any intermediary data structures, and audio processing is isolated on the webserver as transcript preprocessing.
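The queue-free design can be sketched as follows. This is a minimal illustration, not our actual Pi code: the chunk size and frame source are hypothetical stand-ins, and a local socket pair simulates the Pi-to-webserver link.

```python
import socket

CHUNK_SIZE = 1024  # bytes per audio frame (assumed size, for illustration)

def stream_audio(frames, sock):
    """Forward each captured audio frame straight to the socket,
    with no intermediary queue in between."""
    for frame in frames:
        sock.sendall(frame)

# Simulate the Pi-to-webserver connection with a local socket pair.
sender, receiver = socket.socketpair()
fake_frames = [bytes([i]) * CHUNK_SIZE for i in range(4)]
stream_audio(fake_frames, sender)
sender.close()

# The receiving side sees the frames in order, unbuffered by any queue.
received = b"".join(iter(lambda: receiver.recv(4096), b""))
assert received == b"".join(fake_frames)
```

The trade-off is that backpressure is now handled by the socket itself rather than by our own buffering, which in practice turned out to be the lower-latency option.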

The other change pertains to the speech-to-text and speaker-identification ML modules inside the transcript generator. We decided to use Google Cloud's speaker diarization because it lets us perform all of the ML processing in a single integrated request, which in turn allowed us to merge the two modules into one. This also reduces the amount of multithreading.
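Since Google Cloud's diarization tags each recognized word with a speaker label, the merged module mostly reduces to post-processing. A hedged sketch of that step (the helper below is our own illustration, not part of the API) groups consecutive same-speaker words into transcript lines:

```python
def words_to_transcript(words):
    """Group consecutive words by speaker tag into transcript lines.

    words: list of (speaker_tag, word) pairs in time order, as
    diarization-style output would provide them.
    """
    lines = []
    for tag, word in words:
        if lines and lines[-1][0] == tag:
            lines[-1][1].append(word)  # same speaker: extend current line
        else:
            lines.append((tag, [word]))  # speaker changed: start a new line
    return [f"Speaker {tag}: {' '.join(ws)}" for tag, ws in lines]

sample = [(1, "hello"), (1, "there"), (2, "hi"), (2, "back"), (1, "ok")]
transcript = words_to_transcript(sample)
# transcript == ["Speaker 1: hello there", "Speaker 2: hi back", "Speaker 1: ok"]
```

Because the cloud call returns both the text and the speaker tags together, no separate identification thread is needed.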

Now we’re looking toward full integration of all components with each other. We’re testing locally before deploying to AWS, which carries the risk that what works on our own computers won’t work once we deploy properly.
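One small way to reduce local-vs-deployed drift is to keep environment-specific values out of the code. The sketch below assumes a hypothetical `WEBSERVER_URL` environment variable (the name is our own placeholder), defaulting to localhost for local testing:

```python
import os

def get_webserver_url():
    """Read the webserver endpoint from the environment, falling back
    to a localhost default for local testing."""
    return os.environ.get("WEBSERVER_URL", "http://localhost:8000")

# Locally nothing is set, so the default applies:
assert get_webserver_url() == "http://localhost:8000"

# In the AWS deployment, the environment overrides it (example value):
os.environ["WEBSERVER_URL"] = "http://example.aws.internal:8000"
assert get_webserver_url() == "http://example.aws.internal:8000"
```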
