Ellen’s Status Report for May 1

We did a lot of testing this week. I was responsible for the transcription-related tests, so I spent time calculating word error rates, speaker ID error rates, and the like. The transcript latency test was fairly time-consuming to evaluate since I couldn't come up with a way to automate it, but in the end we had a good number of samples and positive results.
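
For the word error rates, the core calculation is just word-level edit distance between the reference transcript and the hypothesis. Here's a minimal sketch of the idea (my own illustration, not our actual test script):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # substitutions + insertions + deletions, per reference word
    return d[len(ref)][len(hyp)] / len(ref)
```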

I helped to prepare the final presentation slides this week. I also whipped up a script for the final video, and we got a lot of filming done today.

There’s not much else to report! I’m glad we pushed very hard to do a nice integration before the interim demo, so now we aren’t rushing to finish our MVP. Next week we’ll keep preparing final materials.

Team Status Report for May 1

This week we finished development, and we both started and ended our testing phase. Testing was conducted on Friday and Monday: we developed testing strategies, ran the tests while gathered in the same place, and then evaluated the results. We were happy to find that all tests passed against our requirements!

We also finished preparing our final presentation, filling in the final test results and including links to sample test data. Mitchell practiced giving the presentation to the group.

We started planning our final video this week as well. We outlined what we wanted to include in the video and drafted a script for the voiceover and the footage we want to capture.

At this point, since we're focused on our final materials, our risks and scheduling constraints are passing into the rearview mirror. We're on schedule with our work. Our main remaining risk is that the second COVID vaccine dose incapacitates the two members who are getting it next week.

Ellen’s Status Report for April 24

We've made a lot of progress over the last two weeks. Our demo last Wednesday went well, and afterwards, based on the feedback we received, we decided to revamp the speaker ID setup process to make it easier for users to understand. I worked on that all weekend, and then we tested it a bit on Monday while testing the AWS deployment.

Then Tuesday and Thursday I worked on outlining and fleshing out our final presentation slides. I also came up with a potential method for transcript latency testing.

Since we've already moved on to validating requirements (with high confidence that they'll pass), I'd say we're on schedule according to our Gantt chart. This week we'll finish gathering and evaluating test data. I'll probably be calculating a lot of transcript-related error rates. We'll also prepare our final presentation and work on our video.

Ellen’s Status Report for April 10

This week has been a busy one for our team! Over the weekend I wrote audio accumulation for the UDP side of the webserver, so that we accumulate raw audio instead of files. Then from Monday onward our focus was on integration. After every meeting I had a batch of to-do items for debugging or otherwise improving the server processing and the transcript generator. We met Monday, Wednesday, Friday, and Saturday for multiple hours each time. My work has included small (but important) tweaks: an improved file-naming system, improved audio accumulation, improved error reporting, changing what does and does not count as a "speaker change" in the transcript and in the backend, and others I can't recall at this point. The larger changes were altering the way speech-to-text results are accepted from Google (only accepting "final" results, not interim ones) and adding more branches and states to the microphone-setup state machine.
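
For context, accepting only "final" results is a small change in how the streaming responses get consumed. A rough sketch with the google-cloud-speech client (simplified from what we actually do; the chunk source here is a stand-in):

```python
from google.cloud import speech

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(config=config, interim_results=True)

def final_transcripts(audio_chunks):
    """Yield only finalized transcript segments, skipping interim hypotheses."""
    requests = (speech.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in audio_chunks)  # audio_chunks is a stand-in source
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            if result.is_final:
                yield result.alternatives[0].transcript
```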

From now on my schedule is the same as the group's schedule as a whole. We think we have a demo prepared at this point, but when we meet tomorrow we'll test it further, along with testing how two mics in the same meeting work together. By the weekend we'll be done integrating and debugging our two-mic setup, and we'll have some idea of how well the transcript is matching our requirements/targets.

Ellen’s Status Report for April 3

This week I worked intensively on the speaker identification subsystem. If you've already read the team status report, you'll know that we decided to ditch the speaker diarization packages we had previously identified as possible solutions – crucially, none of them provided audio streaming abstractions – and proceed with Google Cloud speaker diarization, which could be added to our current Google speech-to-text request by simply tweaking a configuration variable. This created the opportunity to integrate the speech-to-text and speaker ID modules, so I had a lot of code to write and rewrite, both to get the speaker ID module integrated and to add in the processing the way we wanted it (including the DOA augmentation). Initial tests (using random vocal samples I found online) suggest that the system is going to work well and that DOA augmentation is actually going to be quite valuable – but I conducted this testing just yesterday, so the jury is still out, I suppose.
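
"Tweaking a configuration variable" really is about the size of it. A sketch of what enabling diarization looks like in the recognition config (the speaker-count bounds here are illustrative, not our tuned values):

```python
from google.cloud import speech

diarization_config = speech.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,  # illustrative bounds, not our tuned values
    max_speaker_count=6,
)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    diarization_config=diarization_config,
)
# Each recognized word in the results then carries a speaker_tag,
# which is what we combine with the DOA information.
```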

At this point we're reaching a stretch of schedule ambiguity. I would consider speaker identification to be finished as of yesterday, which is a good half-week early. I don't have any more individually-assigned coding tasks; now I'll be supporting integration through debugging and revision. As for what I know I want to accomplish this upcoming week: I want to rewrite the network-to-transcription queueing interface to accumulate multiple packets (if available) before sending the audio off to transcription. I also want to help Mitchell get transcript streaming fully working.
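
The queueing rework I have in mind would look roughly like this (a sketch with hypothetical names): block for one packet, then greedily drain whatever else is already waiting.

```python
import queue

def next_audio_batch(packet_queue: "queue.Queue[bytes]") -> bytes:
    """Wait for one packet, then drain any others already queued into one buffer."""
    chunks = [packet_queue.get()]  # block until at least one packet arrives
    while True:
        try:
            chunks.append(packet_queue.get_nowait())
        except queue.Empty:
            break
    return b"".join(chunks)  # one larger buffer per transcription request
```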

Ellen’s Status Report for March 27

This week I continued moving forward on my tasks in the order I assigned them to myself. First, I finished the ethics assignment over the weekend. Also over the weekend, I finished adding functionality to the buttons and forms on the website.

Then, over the week, I worked on the mic setup process. When a mic first joins a meeting, it's in setup mode – its audio is used to assign names to speakers. In this process the system compiles names, initial locations, and vocal samples for all speakers. Then, when the web user indicates the process is done, audio starts getting routed to the actual transcript file. I finished coding up this process in a way that meshes with the current speaker ID system (non-ML) but will be easy to use in the ML system as well.
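
The top-level routing decision boils down to something like this (a hedged sketch with hypothetical names, not our actual classes):

```python
def handle_audio(meeting, mic, packet):
    """Route an incoming audio packet based on the mic's setup state."""
    if mic.in_setup:
        # Setup mode: compile the speaker roster instead of transcribing.
        name, location, sample = enroll_speaker(packet)  # hypothetical helper
        meeting.speakers.append({"name": name, "location": location, "sample": sample})
    else:
        # Normal mode: hand the audio to the transcript pipeline.
        meeting.transcript_queue.put(packet)
```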

I’m a few days ahead of schedule. I was supposed to start speaker ID ML on Monday, but I’ll start it on Saturday. Speaker ID ML is my last solo task.

Speaker ID ML doesn't have to be finished by the end of next week, but it should be most of the way there; I should have at least one option working. The other thing I'm going to do is rework my data storage so that everything lives inside the database. Hopefully that'll be done later today.
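
Concretely, moving everything into the database probably means Django models along these lines (a sketch; the field names are my guesses at this point, not a final schema):

```python
from django.db import models

class Speaker(models.Model):
    """Sketch of a per-speaker record gathered during mic setup."""
    meeting = models.ForeignKey("Meeting", on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    doa_angle = models.FloatField()      # initial direction-of-arrival estimate
    voice_sample = models.BinaryField()  # short enrollment clip for speaker ID
```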

Ellen’s Status Report for March 20

I worked on a couple of different things this week. Each day up to Wednesday we made tweaks and additions to the design report, and we submitted it on Wednesday.

I finished the (initial) meeting management module. After a fistfight with Python import statements, I was able to interface it properly with the Django models that represent microphone objects.

I also wrote a script, intended to be run after the Django server starts, that starts threads for Cambrea's UDP server and my transcript generator. The two communicate through a queue object. New threads break off to "serve" every item the UDP server puts in the queue.
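
Structurally it's something like the following sketch (simplified; the receive loop and per-item handler here are stand-ins for the real ones):

```python
import threading
import queue

audio_queue = queue.Queue()

def udp_server(q):
    while True:
        item = receive_udp_packet()  # stand-in for the real UDP receive loop
        q.put(item)

def transcript_generator(q):
    while True:
        item = q.get()
        # break off a new thread to "serve" each queued item
        threading.Thread(target=serve_item, args=(item,), daemon=True).start()

def serve_item(item):
    ...  # stand-in for per-item transcript processing

threading.Thread(target=udp_server, args=(audio_queue,), daemon=True).start()
threading.Thread(target=transcript_generator, args=(audio_queue,), daemon=True).start()
```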

Then, for the rest of the week, I worked on setting up the web pages for getting around the website. I configured the URLs, wrote the HTML, and made the Django "views" (controlling what HTTP response gets sent, parsing POSTed forms, and that sort of thing). I did that for a bunch of pages, but there are still a couple of pages that need to be added.
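
Per page, that amounts to a URL pattern plus a view that handles both GET and POSTed forms. A sketch with hypothetical names:

```python
# urls.py (sketch)
from django.urls import path
from . import views

urlpatterns = [
    path("meetings/<int:meeting_id>/", views.meeting_page, name="meeting_page"),
]

# views.py (sketch)
from django.shortcuts import redirect, render

def meeting_page(request, meeting_id):
    if request.method == "POST":
        apply_meeting_form(request.POST, meeting_id)  # hypothetical form handler
        return redirect("meeting_page", meeting_id=meeting_id)
    return render(request, "meeting.html", {"meeting_id": meeting_id})
```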

I’m on-track in terms of scheduling. I was supposed to start the web page stuff today (Friday) but I started on Wednesday.

None of my current tasks have due dates before the next status report. Nevertheless, I hope to have all my web pages set up with buttons/forms that have the intended effects on the backend as well. I will also start the mic initialization backend work, a task I took over from Mitchell since it's so heavily connected to what I've already done. And I'll be doing the ethics assignment this upcoming week.

Ellen’s Status Report for March 13

This was a pretty productive week on my end. Over the weekend, I got Google speech-to-text working (made an account, got credentials, added in the code, etc.) to great success! It just seems far more accurate than the other two options I had implemented originally. (This is based on the same little paragraph snippet Cambrea recorded on the ReSpeaker for some initial testing.)

Also over the weekend (if I’m recalling correctly) I coded up our first version of speaker identification (the no-ML, no-moving version). At that point it was gratifying to see simulated transcript results with both speaker tags and voice-to-text!

And my final weekend task was preparing for the design presentation, which I delivered on Monday.

Speaking of design materials, I worked a lot on the design report document. Since I’m the group member who likes writing the most, I zipped through first drafts of a bunch of the sections which the others are going to proofread and modify for the final version. And in the trade-studies and system-description sections, I just wrote the technical bits that I was responsible for. It’s nice having this document pretty close to finished!

Finally, I started the meeting management module. This takes a new transcript update and actually updates the file corresponding to the correct meeting. I’ve finished most of it, except for the bits that interface with the database – I had to confer with the rest of the team about that.
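
At its core the module is simple (a hedged sketch; the path and names are placeholders, and the real version will interface with the database): find the meeting's transcript file and append the update.

```python
from pathlib import Path

TRANSCRIPT_DIR = Path("transcripts")  # placeholder location

def apply_update(meeting_id: int, update: str) -> None:
    """Append a new transcript update to the file for the given meeting."""
    path = TRANSCRIPT_DIR / f"meeting_{meeting_id}.txt"
    with path.open("a") as f:
        f.write(update + "\n")
```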

In terms of the schedule, I'm kind of on track, kind of ahead. I'm on track for writing the meeting manager (self-assigned due date of Monday); as for my schedule item after that, "transcript support for multi-mic meeting," I've actually been building that into the transcription the entire time, so it looks like I'll be able to start my actual next task earlier than scheduled.

Next week I’m scheduled to deliver the meeting management module. The on-website meeting setup flow, which is my next responsibility, will also be partially completed.

Ellen’s Status Report for March 6

This week I did a real mishmash of stuff for the project. I finished the design presentation slides and prepared my remarks for the design presentation, which I rehearsed for the team to get feedback.

I finished coding the transcript generator sans speaker ID — this included a multithreaded section (so I had to learn about Python threads), audio preprocessing (downsampling to the correct rate and lowpass filtering), and going back through my previous code to make the data structure accesses thread-safe.
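
The preprocessing boils down to resampling with an anti-aliasing lowpass filter. A minimal sketch (the 48 kHz to 16 kHz rates are illustrative defaults, not necessarily our exact pipeline):

```python
from math import gcd

import numpy as np
from scipy import signal

def preprocess(audio: np.ndarray, in_rate: int = 48000, out_rate: int = 16000) -> np.ndarray:
    """Downsample audio; resample_poly lowpass-filters internally to prevent aliasing."""
    g = gcd(out_rate, in_rate)
    return signal.resample_poly(audio, out_rate // g, in_rate // g)
```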

Since we received the ReSpeaker mic in the mail, Cambrea recorded some audio clips on it and sent them to me so I could test the two speech-to-text models I had implemented in the transcript generator. DeepSpeech's performance was okay – it made a lot of spelling errors; sometimes it was easy to tell what the word actually ought to have been, sometimes not so easy. (If we decide to go with DS S2T, maybe a spelling-correction postprocessing step could help us achieve better results!) CMU PocketSphinx's output was pretty much gibberish, unfortunately. While DS's approach was to emulate the sounds it heard, PS basically tried to map every syllable to an English word, which didn't work out in our favor. Since PS is essentially ruled out, I'm going to try to add Google Cloud speech-to-text to the transcript generator. The setup is going to be a bit tricky because it'll require setting up credentials.
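
For what it's worth, the credentials setup mostly comes down to pointing the client library at a service-account key; a sketch (the key path is a placeholder):

```python
import os

# Placeholder path to a Google Cloud service-account key (JSON).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

from google.cloud import speech

client = speech.SpeechClient()  # picks up credentials from the environment
```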

So far I haven't fallen behind where my progress is supposed to be, but interestingly, some past tasks (like integrating speech-to-text models) aren't actually in the rearview mirror; they require ongoing development as we run certain tests. I kind of anticipated this, though, and I think I have enough slack time built into my personal task schedule to handle this looking-backwards as well as working-forwards.

This week my new task is an initial version of speaker ID. This version does not use ML, does not know the number or identity of speakers, and assumes speakers do not move; later it'll become the basis of the direction-of-arrival augmentation of the speaker ID ML (see the sketch below). I'm also giving the design presentation this week and working more on the design report. And Google s2t integration doesn't have to be totally done by the end of next week, but I can't let the task sit still either; I'll have made some progress on it by the next status report.
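
To make the no-ML idea concrete, here's a hedged sketch: treat each sufficiently distinct direction-of-arrival angle as a new speaker, and tag later audio by the nearest known angle (the threshold and names are my assumptions for illustration):

```python
ANGLE_THRESHOLD = 20.0          # degrees; assumed tolerance for "same speaker"
known_angles: list[float] = []  # one entry per speaker discovered so far

def speaker_for(doa_angle: float) -> int:
    """Map a DOA angle to a speaker index, registering a new speaker if needed."""
    best_i, best_diff = None, None
    for i, angle in enumerate(known_angles):
        diff = abs((doa_angle - angle + 180.0) % 360.0 - 180.0)  # wrap-around distance
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    if best_diff is not None and best_diff <= ANGLE_THRESHOLD:
        return best_i
    known_angles.append(doa_angle)
    return len(known_angles) - 1
```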