Updated Gantt Chart

Updated Gantt Chart Link

On our updated Gantt chart, we have completed the task “Integration of RPi + Transcript + ML” half a week early, so we are currently working on the task “Complete System Debugging / Testing”. Our current “complete system” for the demo uses a computer as the intermediary server that handles the speaker ML and transcript streaming. Ellen is working on some ML improvements for integration and testing at this time. After the demo, Cambrea and Mitchell will start the “AWS Migration” task, migrating from our local computer to the AWS server. We will complete our latency testing and standard testing during weeks 12 and 13, after we have migrated to AWS. We will start revising our final report during week 11 and work on our final presentation and video during weeks 12-14.

Ellen’s Status Report for April 3

This week I worked intensively on the speaker identification subsystem. If you’ve already read the team status report, you’ll know that we decided to ditch the speaker diarization packages we had previously identified as possible solutions – crucially, none of them provided audio streaming abstractions – and proceed with Google Cloud speaker diarization, which can be added to our existing Google speech-to-text request by simply tweaking a configuration variable. This created the opportunity to integrate the speech-to-text and speaker ID modules, so I had a lot of code to write and rewrite, both to integrate the speaker ID module and to add the processing the way we wanted it (including the DOA augmentation). Initial tests (using random vocal samples I found online) suggest to me that the system is going to work well and that DOA augmentation is actually going to be quite valuable – but I conducted this testing just yesterday, so the jury is still out, I suppose.
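To give a sense of how small that tweak is, here’s a minimal sketch of the configuration involved, using the google-cloud-speech Python client (the parameter values are placeholders, not our actual settings):

from google.cloud import speech

# Diarization rides along with the existing speech-to-text request;
# enabling it is a config change, not a separate package.
diarization_config = speech.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,  # placeholder bounds; tuned to the meeting size
    max_speaker_count=6,
)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    diarization_config=diarization_config,
)

streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=True,
)

# Each word in the final results then carries a speaker_tag, which is the
# label we can cross-reference against DOA estimates from the mic array.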

At this point we’re reaching a stretch of schedule ambiguity. I would consider speaker identification to be finished as of yesterday, which is a good half a week early, and I don’t have any more individually-assigned coding tasks. Now I’ll be supporting integration through debugging and revision. As for what I know I want to accomplish this upcoming week: I want to rewrite the network-to-transcription queueing interface to accumulate multiple packets (if available) before sending the audio off to transcription, as sketched below. I also want to help Mitchell get transcript streaming fully working.
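Roughly, the rewrite I have in mind looks like this (the function and queue names are hypothetical, not our actual module):

import queue

def next_audio_chunk(packet_queue: queue.Queue) -> bytes:
    # Block until at least one packet arrives...
    chunks = [packet_queue.get()]
    # ...then drain whatever else is already waiting, so each
    # transcription request carries a larger contiguous audio chunk.
    while True:
        try:
            chunks.append(packet_queue.get_nowait())
        except queue.Empty:
            break
    return b"".join(chunks)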

Ellen’s Status Report for March 27

This week I continued moving forward on my tasks in the order I assigned them to myself. First, I finished the ethics assignment over the weekend. Also over the weekend, I finished adding functionality to the buttons and forms on the website.

Then, over the week, I worked on the mic setup process. When a mic first joins a meeting, it’s in setup mode – audio is used to assign names to speakers. In this process, the system compiles names, initial locations, and vocal samples for all speakers. Then, when the web user says the process is done, audio starts getting routed to the actual transcript file. I finished coding up this process in a way that meshes with the current (non-ML) speaker ID system but will be easy to use in the ML system as well.
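As a rough sketch of how that flow is structured (the class and field names here are illustrative, not our actual code):

from dataclasses import dataclass, field

@dataclass
class SpeakerProfile:
    name: str            # assigned by the web user during setup
    initial_doa: float   # initial location: direction of arrival, in degrees
    vocal_sample: bytes  # short enrollment clip for the speaker ID system

@dataclass
class MicSession:
    in_setup: bool = True
    speakers: list = field(default_factory=list)

    def enroll(self, name: str, doa: float, sample: bytes) -> None:
        # In setup mode, incoming audio only builds speaker profiles.
        self.speakers.append(SpeakerProfile(name, doa, sample))

    def finish_setup(self) -> None:
        # Called when the web user says setup is done; after this,
        # audio is routed to the actual transcript file instead.
        self.in_setup = False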

I’m a few days ahead of schedule. I was supposed to start speaker ID ML on Monday, but I’ll start it on Saturday. Speaker ID ML is my last solo task.

Speaker ID ML doesn’t have to be finished by the end of next week, but it should be most of the way there; I should have at least one option working. The other thing I’m going to do is rework my data storage so that everything lives inside the database. Hopefully that will be done later today.