Team Status Report for March 13

This week the team members were busily taking care of our individual programming responsibilities. Cambrea was working on networking, Mitchell was working on audio processing and website stuff, and Ellen was working on transcription. We also worked towards completing the design report document that’s due on Wednesday.

The risk we’ve been talking about recently is mismatched interfaces. While we write our separate modules, we have to be aware of what the other members might require from them. We have to discuss the integration of the individual parts and, if we discover that something different is required, we have to be ready to jump in and change the implementation. For example, Ellen made the transcript output a single text file per meeting. However, when Cambrea starts writing the transcript streaming, she might discover that she wants it in a different format; so we just have to recognize that risk and be prepared to modify the code.

Our schedule hasn’t changed, aside from the progress we’ve made through its tasks.

Ellen’s Status Report for March 13

This was a pretty productive week on my end. Over the weekend, I got Google speech-to-text working (made an account, got credentials, added in the code, etc) to great success! It just seems way more accurate than the other two options I had implemented originally. (This is based on the same little paragraph snippet Cambrea recorded on the respeaker for some initial testing.)
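
For context, the Google call per audio chunk boils down to something like the sketch below, using the google-cloud-speech Python client (the function name and config values here are illustrative, and credentials come from the GOOGLE_APPLICATION_CREDENTIALS environment variable):

```python
from google.cloud import speech

def transcribe_chunk(audio_bytes, client=None):
    """Send one chunk of 16 kHz, 16-bit mono audio to Google speech-to-text."""
    client = client or speech.SpeechClient()  # reads GOOGLE_APPLICATION_CREDENTIALS
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```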

Also over the weekend (if I’m recalling correctly) I coded up our first version of speaker identification (the no-ML, no-moving version). At that point it was gratifying to see simulated transcript results with both speaker tags and voice-to-text!
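
The core idea is just direction-of-arrival clustering. A minimal sketch of the logic, assuming each audio chunk arrives tagged with a DOA angle from the respeaker (the class name and threshold value are illustrative):

```python
ANGLE_THRESHOLD = 20  # degrees; illustrative value, to be tuned

class SimpleSpeakerID:
    """Tag speakers by clustering direction-of-arrival (DOA) angles,
    assuming speakers do not move during the meeting."""

    def __init__(self):
        self.speaker_angles = []   # index = speaker tag, value = known DOA

    def identify(self, doa_degrees):
        for tag, known in enumerate(self.speaker_angles):
            # Compare on the circle so that 359 and 2 degrees count as close.
            diff = abs((doa_degrees - known + 180) % 360 - 180)
            if diff < ANGLE_THRESHOLD:
                return tag
        # No known speaker near this angle: register a new one.
        self.speaker_angles.append(doa_degrees)
        return len(self.speaker_angles) - 1
```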

And my final weekend task was preparing for the design presentation, which I delivered on Monday.

Speaking of design materials, I worked a lot on the design report document. Since I’m the group member who likes writing the most, I zipped through first drafts of a bunch of the sections which the others are going to proofread and modify for the final version. And in the trade-studies and system-description sections, I just wrote the technical bits that I was responsible for. It’s nice having this document pretty close to finished!

Finally, I started the meeting management module. This takes a new transcript update and actually updates the file corresponding to the correct meeting. I’ve finished most of it, except for the bits that interface with the database – I had to confer with the rest of the team about that.
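
In its current form the module is basically: given a meeting ID and a new piece of transcript, append it to that meeting’s file. A rough sketch of that part, with the database hooks left out since that interface is still being discussed (the names and paths here are illustrative):

```python
import os

TRANSCRIPT_DIR = "transcripts"   # illustrative location for per-meeting files

def apply_transcript_update(meeting_id, speaker_tag, text):
    """Append one transcript update to the file for the given meeting."""
    os.makedirs(TRANSCRIPT_DIR, exist_ok=True)
    path = os.path.join(TRANSCRIPT_DIR, f"{meeting_id}.txt")
    with open(path, "a") as transcript_file:
        transcript_file.write(f"{speaker_tag}: {text}\n")
```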

In terms of the schedule, I’m kind of on-track, kind of ahead of schedule. I’m on track for writing the meeting manager (self-assigned due date of Monday), but as for my schedule item after that, “transcript support for multi-mic meeting,” I’ve actually been building that into the transcription the entire time, so it looks like I’ll be able to start my actual next task earlier than scheduled.

Next week I’m scheduled to deliver the meeting management module. The on-website meeting setup flow, which is my next responsibility, will also be partially completed.

Mitchell’s Status Report for March 6

This week I worked on the website development as well as the design review. I finished setting up the website backend so that it can integrate with other components like the transcript. I also worked on the design paper, cutting down the mass of information that had been put there.

Progress-wise, I am on track for the web end, but I have not started the audio filter for the audio on the Raspberry Pi. I may have scheduled a bit too much simultaneous work for this week; the tasks still have slack time remaining, but they did not start when I intended. Next week, I will work on finishing the design paper, starting the filtering, and continuing website development.

Cambrea’s Status Report for March 6

This week I finished the design for the network connection between the Raspberry Pi and the AWS server. We are still planning on using a UDP connection here to transmit packets; each packet will be a struct containing the micID, the audio bytes, and the direction-of-arrival information for the audio. The micID is assigned to a single Raspberry Pi audio device by the server when the connection is first made: the AWS server receives a message that the audio device wants to connect and returns the micID, which is resent until the Raspberry Pi audio device receives it or the connection times out. On the server, listen and send will each need to run on separate threads to make sure that the server is always listening, and there will also be a separate thread for each audio device connection to the server.
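
As a concrete reference for the packet format, the struct packing could look roughly like the sketch below (the exact field sizes and byte layout are illustrative and still up for discussion):

```python
import struct

# Illustrative layout: 2-byte micID and 2-byte direction of arrival (in
# degrees) in network byte order, followed by the raw audio bytes.
HEADER_FMT = "!HH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def pack_packet(mic_id, doa_degrees, audio_bytes):
    """Build one UDP payload from the micID, DOA, and audio chunk."""
    return struct.pack(HEADER_FMT, mic_id, doa_degrees) + audio_bytes

def unpack_packet(payload):
    """Split a received UDP payload back into (micID, DOA, audio bytes)."""
    mic_id, doa_degrees = struct.unpack(HEADER_FMT, payload[:HEADER_SIZE])
    return mic_id, doa_degrees, payload[HEADER_SIZE:]
```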

I set up simple server code to test the connection between the AWS server and the Raspberry Pi. Since the AWS server is remote, it is on a different Wi-Fi network, so I had to set up port forwarding so that the Raspberry Pi can reach the server code on the AWS server. AWS has its own way for users to set up port forwarding using AWS Systems Manager; I have been following this tutorial: https://aws.amazon.com/blogs/mt/amazon-ec2-instance-port-forwarding-with-aws-systems-manager/
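
The connectivity test itself is just a tiny UDP echo, roughly like the sketch below (the hostname and port are placeholders):

```python
import socket

SERVER_PORT = 5005                                     # placeholder port
SERVER_ADDR = ("the-ec2-public-address", SERVER_PORT)  # placeholder host

def run_echo_server():
    """Runs on the EC2 instance: echo every datagram back to its sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", SERVER_PORT))
    while True:
        data, addr = sock.recvfrom(4096)
        sock.sendto(data, addr)

def probe_from_pi():
    """Runs on the Raspberry Pi: send a probe and wait briefly for the echo."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    sock.sendto(b"ping", SERVER_ADDR)
    data, _ = sock.recvfrom(4096)
    print("got reply:", data)
```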

We also received the AWS credits, so we upgraded our EC2 instance to m5a.large; we are tracking our usage of the server.

I am currently on schedule. In the next week I am going to complete the code that handles the networking from the Raspberry Pi to the AWS server; this code will include the initial Raspberry Pi audio device handshake with the AWS server and the transmission of our packets.

Team Status Report for March 6

This past week, our team finished up our design review presentation; we also went over the feedback from the project proposal presentation and incorporated it into our project. Cambrea set up the respeaker and created a set of test audio data. Ellen tested the ML software on the test data and eliminated one of the ML options that we considered, and investigated the other option that Abha, our TA, suggested. Mitchell continued working on the website.

Some parts of our schedule have been shuffled earlier because we got our hardware earlier than expected, but according to our modified schedule we are still on track. We did encounter a risk this week: spending time chasing models that turn out to be duds. We mitigated it by testing early and finding alternatives; one of the models had a 100% error rate. In terms of design changes, we are also adding an LED indicator for when audio is not being picked up, as well as audio preprocessing for the ML. Next week, we will be finishing up our design report and continuing our first phase.

Ellen’s Status Report for March 6

This week I did a real mishmash of stuff for the project. I finished the design presentation slides and prepared my remarks for the design presentation, which I demoed for the team to get feedback.

I finished coding the transcript generator sans speaker ID — this included a multithreaded section (so I had to learn about python threads), audio preprocessing (downsampling to the correct rate and lowpass filtering), as well as going back through my previous code and making the data structure accesses threadsafe.
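
For the curious, the preprocessing amounts to roughly the sketch below using scipy (the cutoff, filter order, and function name are illustrative choices, not necessarily what ends up in the final code):

```python
from math import gcd

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt

TARGET_RATE = 16000  # sample rate the speech-to-text engines expect

def preprocess(audio, in_rate):
    """Lowpass-filter a mono int16 signal, then downsample it to TARGET_RATE."""
    x = audio.astype(np.float64)
    # Lowpass below the new Nyquist frequency so downsampling doesn't alias.
    sos = butter(8, 0.45 * TARGET_RATE, btype="low", fs=in_rate, output="sos")
    x = sosfiltfilt(sos, x)
    # Resample by a rational factor, e.g. 48 kHz -> 16 kHz is up=1, down=3.
    g = gcd(TARGET_RATE, in_rate)
    x = resample_poly(x, TARGET_RATE // g, in_rate // g)
    return x.astype(np.int16)
```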

Since we received the respeaker mic in the mail, Cambrea recorded some audio clips on it and sent them to me so I could test the two speech to text models I had implemented in the transcript generator. The performance of DeepSpeech was okay – it made a lot of spelling errors, and sometimes it was easy to tell what the word actually ought to have been, sometimes not so easy. (If we decide to go with DS S2T, maybe a spelling-correction postprocessing system could help us achieve better results!) CMU PocketSphinx’s output was pretty much gibberish, unfortunately. While DS’s approach was to emulate the sounds it heard, PS tried to basically map every syllable to an English word, which didn’t work out in our favor. Since PS is basically ruled out, I’m going to try to add Google cloud speech to text to the transcript generator. The setup is going to be a bit tricky because it’ll require setting up credentials.
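
For reference, the DeepSpeech streaming path looks roughly like the sketch below (the model and scorer file names are placeholders for wherever the pretrained files live):

```python
import numpy as np
from deepspeech import Model

def transcribe_with_deepspeech(audio_chunks,
                               model_path="deepspeech-0.9.3-models.pbmm",
                               scorer_path="deepspeech-0.9.3-models.scorer"):
    """Feed sequential 16 kHz, 16-bit mono chunks into one DeepSpeech stream."""
    model = Model(model_path)                # paths above are placeholders
    model.enableExternalScorer(scorer_path)
    stream = model.createStream()
    for chunk in audio_chunks:               # each chunk is raw int16 bytes
        stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
    return stream.finishStream()             # final text for the whole stream
```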

So far I haven’t fallen behind where my progress is supposed to be, but what’s interesting is that some past tasks (like integrating speech to text models) aren’t actually in the rearview mirror but require ongoing development as we try certain tests. I kind of anticipated this, though, and I think I have enough slack time built into my personal task schedule to handle this looking-backwards as well as working-forwards.

This week my new task is an initial version of speaker ID. This one does not use ML, does not know the number or identity of speakers, and assumes speakers do not move. Later it’ll become the basis of the direction-of-arrival augmentation of the speaker ID ML. I’m also giving the design presentation this week and working more on the design report. And by the end of next week, Google s2t integration doesn’t have to be totally done but I can’t let the task sit still either; I’ll have made some progress on it by the next status report.

Mitchell’s Status Report for Feb. 27

This week I set up the server and worked on the design slides outline and design document. I first attempted to deploy the server on Amazon Linux 2, but encountered difficulties after recompiling Python for a newer SQLite, so I redeployed the server on Ubuntu 18.04. The server can currently be run, but it is empty. We also decided that we want to create a custom fitting for our device, so I looked into three options based on previous projects: woodworking, laser cutting, or 3D printing one. After looking into it, I eliminated woodworking right away, as it would be time-consuming and would require more manual dexterity. Laser cutting and 3D printing both require CAD work; laser cutting would allow reusing parts of old iterations, since we would not have to scrap the entire housing, while 3D printing would require reprinting the whole object for each iteration. I think that using a mix of the two should work for our project.

I filled out the design slides outline and the design paper in general, so that the information is there and can be refined next week.

In terms of progress, I believe that I’m on schedule. Next week I’ll start testing scipy and Audacity for the audio filtering, and continue working on the design document and presentation.

Ellen’s Status Report for Feb. 27

This week I worked on speech-to-text ML and on design materials. I created a speech-to-text module that implements two different s2t engines – we can choose which one to run, and once our mic arrives we can test both to find which works better. Unfortunately for me, there was a lot of installation work to be done for both engines. The code itself was less time-consuming to write than the installation and research required to enable it. The engines are Mozilla DeepSpeech and CMU PocketSphinx; both of them have “stream” constructs which allow predictions to be formed from multiple pieces of audio that are input sequentially. I paid a lot of attention to the interface I was creating with this code, since I was simultaneously working on the overall software design of the project.
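
To keep the engines swappable, the module hides each engine’s own stream object behind one common interface. Roughly, the shape is something like the sketch below (the class and method names here are illustrative, not the actual ones in the code); each engine then gets a small adapter class implementing these methods.

```python
from abc import ABC, abstractmethod

class SpeechToTextEngine(ABC):
    """Illustrative sketch of a common streaming interface.

    One adapter would wrap a DeepSpeech stream, the other a PocketSphinx
    decoder; callers never need to know which engine is selected.
    """

    @abstractmethod
    def start_stream(self) -> None:
        """Begin a new prediction stream / utterance."""

    @abstractmethod
    def feed_audio(self, chunk: bytes) -> None:
        """Add one sequential piece of 16 kHz, 16-bit mono audio."""

    @abstractmethod
    def finish_stream(self) -> str:
        """Close the stream and return the predicted text."""
```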

In terms of design materials, I started enumerating the interfaces between the more high-level software modules. I also used extra time that I had after finishing the s2t code to draft the Introduction and Design Requirement sections of our design paper. I’ve volunteered to be the presenter for the design review, so I tried to identify the areas of the presentation we needed to flesh out, and I scripted what I wanted to say in the first half of the presentation.

I feel that I’m on schedule, or maybe slightly ahead. S2t didn’t take the full week so I got ahead on the design materials. For next week, I’ll have finished the transcript-module code that envelops the s2t and speaker identification subsections. Since our team will have finished our presentation outline and slides, I also will have started preparing to deliver the presentation and will have planned the second half of the presentation script.

Team Status Report for Feb. 27

This past week, our team completed the project proposal presentation. Our initial proposal was received well, so we have started working more in depth on our design. Ellen has tested the possible ML software for speech to text and speaker ID and has narrowed our selection down to two software components that we could use. Mitchell has set up our AWS server; we are currently using the free-tier EC2 server to test our server code. Cambrea has started creating the design diagrams for the software that will go on the Raspberry Pi. We have also ordered a respeaker to start testing the hardware components together.

We are currently on schedule, and we don’t have any risks for our project at the moment other than falling behind schedule. We haven’t changed the design of our system; we are just working on going more in depth into our design for the review. By Monday we will finish a rough draft of our design review slides; we are gathering the necessary information for the presentation now in a separate document. Next week we will finish the design review presentation slides, keep running our initial tests of the software and hardware, and continue working on the design report.

Cambrea’s Status Report for Feb. 27

This week I gave the project proposal presentation on Monday, and I ordered a respeaker to start configuring the hardware. I already have a Raspberry Pi 3 B+, so I spent some time downloading the libraries we will need onto it and testing that I can download and run code on the Pi.

I also started working on the design for the code that will run on the Raspberry Pi. This code will need to listen for audio on the microphone, send that audio in real time to be processed, and then transmit it to the AWS server. To do this, we need to listen for audio on its own thread, separate from the audio processing, and we also need to run the server listen/accept loop on its own thread. I found two ways we can do this in Python: the threading library or the asyncio library. I am leaning toward the asyncio library, since it is more helpful for creating I/O routines and for networking. One main challenge is that we may need to tell the respeaker to output audio while it is listening to audio; I will test this functionality on the respeaker when it arrives.

I also researched how we can send the input audio to be processed: we can use a Python numpy array or a .wav file, so the format depends on how Mitchell will filter the audio and what format his script needs. For the server code, I am planning on using the Python socket library and a UDP connection. The packets will contain compressed audio and metadata with the direction of arrival (DOA) of the audio.
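
As a reference point for the threading option, the capture/send split could look roughly like the sketch below, with one thread reading from the microphone and another sending over UDP; it assumes pyaudio for capture, and the server address, chunk size, and sample rate are placeholder values (the asyncio version would structure the same pieces as coroutines):

```python
import queue
import socket
import threading

import pyaudio

SERVER_ADDR = ("the-aws-server-address", 5005)   # placeholder host and port
SAMPLE_RATE = 16000                              # placeholder capture rate
CHUNK_FRAMES = 1024                              # frames read per chunk

audio_queue = queue.Queue()

def capture_loop():
    """Thread 1: read raw audio from the microphone and queue each chunk."""
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                     input=True, frames_per_buffer=CHUNK_FRAMES)
    while True:
        audio_queue.put(stream.read(CHUNK_FRAMES))

def send_loop():
    """Thread 2: pull queued chunks and send each one to the server over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(audio_queue.get(), SERVER_ADDR)

if __name__ == "__main__":
    threading.Thread(target=capture_loop, daemon=True).start()
    send_loop()   # the main thread does the sending
```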

My progress is on schedule this week. For Monday I will create a system diagram of the software on the Raspberry Pi and continue working on the design presentation slides. Next week I will write the server code that goes on the AWS server and start testing the Raspberry Pi’s connection to the server.