Ellen’s Status Report for Feb. 27

This week I worked on speech-to-text ML and on design materials. I created a speech-to-text module that wraps two different s2t engines; we can choose which one to run, and once our mic arrives we can test both to find which works better. Unfortunately for me, there was a lot of installation work to be done for both engines; the code itself was less time-consuming to write than the installation and research required to enable it. The engines are Mozilla DeepSpeech and CMU PocketSphinx; both have "stream" constructs that let a prediction be formed from multiple pieces of audio fed in sequentially (a rough sketch of that interface is below). I paid a lot of attention to the interface I was creating with this code, since I was simultaneously working on the overall software design of the project.
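To give a sense of what those stream constructs look like, here is a minimal sketch of the kind of wrapper involved. It is illustrative rather than our actual module: the class names and model paths are placeholders, and it assumes the deepspeech 0.9 Python package and the pocketsphinx-python package with its bundled en-US model.

    # Sketch only: both engines consume sequential chunks of 16 kHz,
    # 16-bit mono PCM audio and produce a final transcript at the end.
    import os

    import numpy as np
    import deepspeech
    from pocketsphinx import Decoder, get_model_path


    class DeepSpeechStream:
        def __init__(self, model_path="deepspeech-0.9.3-models.pbmm"):
            # Path above is a placeholder for wherever the model lives.
            self._stream = deepspeech.Model(model_path).createStream()

        def feed(self, chunk: bytes):
            # DeepSpeech expects int16 samples as a numpy array.
            self._stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))

        def finish(self) -> str:
            return self._stream.finishStream()


    class PocketSphinxStream:
        def __init__(self):
            # Assumes the default en-US model shipped with pocketsphinx-python.
            config = Decoder.default_config()
            model = get_model_path()
            config.set_string("-hmm", os.path.join(model, "en-us"))
            config.set_string("-lm", os.path.join(model, "en-us.lm.bin"))
            config.set_string("-dict", os.path.join(model, "cmudict-en-us.dict"))
            self._decoder = Decoder(config)
            self._decoder.start_utt()

        def feed(self, chunk: bytes):
            # PocketSphinx takes raw PCM bytes directly.
            self._decoder.process_raw(chunk, False, False)

        def finish(self) -> str:
            self._decoder.end_utt()
            hyp = self._decoder.hyp()
            return hyp.hypstr if hyp is not None else ""

Having both classes expose the same feed/finish shape is what lets us swap engines behind one interface once we can benchmark them on real mic audio.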

In terms of design materials, I started enumerating the interfaces between the higher-level software modules. I also used the extra time I had after finishing the s2t code to draft the Introduction and Design Requirements sections of our design paper. I've volunteered to be the presenter for the design review, so I tried to identify the areas of the presentation we needed to flesh out, and I scripted what I want to say in the first half of the presentation.

I feel that I'm on schedule, or maybe slightly ahead; s2t didn't take the full week, so I got a head start on the design materials. By the end of next week, I plan to have finished the transcript-module code that wraps the s2t and speaker-identification subsections. Since our team will have finished our presentation outline and slides by then, I'll also have started preparing to deliver the presentation and will have scripted its second half.

Ellen’s Status Report for Feb. 20

This week my efforts were focused on research and on preparing slides for our project proposal. On the research side, I examined the requirements we included in our abstract and went digging around the internet for papers and standards documents that could shed light on specific, measurable characteristics of a good user experience. This was easier to do for some requirements than others: machine-learning papers usually focus more on what the technology can achieve than on what a user might want from it. But in the end, we solidified our list of requirements.

I went on a separate research quest to find viable ML speech-to-text and speaker-diarization solutions and the academic papers associated with them. Comparing solutions based on metrics reported in papers is an interesting problem: the datasets on which the performance measures are calculated are almost all different, and the performance measures themselves differ too (for example, "forgiving" word error rate vs. "full" word error rate on some datasets)! My task was basically to search for solutions that did "well"; I may need to evaluate them myself later, once we have our hardware (see the sketch below).
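For reference, word error rate (WER) is just word-level edit distance, the minimum number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the number of reference words; as I understand it, the "forgiving" variants normalize things like casing and punctuation before comparing. A minimal sketch of the plain ("full") computation:

    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # Word-level Levenshtein distance via dynamic programming.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # deleting every reference word
        for j in range(len(hyp) + 1):
            d[0][j] = j  # inserting every hypothesis word
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution
        return d[len(ref)][len(hyp)] / len(ref)


    print(wer("the quick brown fox", "the quick brown socks"))  # 0.25

Since each paper scores on its own dataset and its own WER flavor, running something like this on our own recordings is really the only apples-to-apples comparison we'll get.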

Currently, I'd say that I'm on schedule in terms of progress. This comes from the fact that we just came up with our schedule this week! In this next week I'm working on getting an initial version of our speech-to-text up and running. In the end, I want a module that takes in an audio file and outputs some text, running it through a different ML solution depending on a variable that's set (a rough sketch of that interface is below). Near the end of next week I will also start on the pre-processing that gets audio packets into the correct form to be passed into the speech-to-text module.
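Here is a rough sketch of the interface I have in mind; everything in it (the function names, the config variable, the WAV-only assumption) is a placeholder rather than final code:

    import wave

    ENGINE = "deepspeech"  # the variable that selects the s2t backend


    def _run_deepspeech(pcm: bytes) -> str:
        raise NotImplementedError  # would hand the audio to DeepSpeech here


    def _run_pocketsphinx(pcm: bytes) -> str:
        raise NotImplementedError  # would hand the audio to PocketSphinx here


    _ENGINES = {"deepspeech": _run_deepspeech, "pocketsphinx": _run_pocketsphinx}


    def transcribe(audio_path: str) -> str:
        # Read a 16 kHz mono WAV file and return its transcript.
        with wave.open(audio_path, "rb") as wav:
            pcm = wav.readframes(wav.getnframes())
        return _ENGINES[ENGINE](pcm)

The point of the dispatch-by-variable design is that the rest of the pipeline only ever calls transcribe(), so switching engines after our mic tests should be a one-line change.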