Mitchell’s Status Report for Feb. 27

This week I set up the server and worked on the design slides outline and the design document. I first attempted to deploy the server on Amazon Linux 2, but ran into difficulties after recompiling Python against a newer SQLite, so I redeployed the server on Ubuntu 18.04. The server can currently be run, but it is empty. We also decided that we want to create a custom fitting for our device, so I looked into three options based on previous projects: woodworking, laser cutting, or 3D printing. I quickly eliminated woodworking, as it would be time-consuming and requires more manual dexterity. Laser cutting and 3D printing both require CAD work. Laser cutting would let us reuse pieces from old iterations, since we would not have to scrap the entire housing, while 3D printing would require reprinting the whole object each iteration. I think that using a mix of the two should work for our project.
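For future reference, the SQLite mismatch can be checked from inside Python before deploying; here is a minimal stdlib sketch (the 3.8.3 threshold is just an example for illustration, not our actual requirement):

```python
import sqlite3

# Python's sqlite3 module is compiled against a system SQLite library;
# sqlite_version reports that library's version string, e.g. "3.22.0".
print("SQLite library version:", sqlite3.sqlite_version)

# Parse into a comparable tuple: "3.22.0" -> (3, 22, 0)
lib_version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))

# Example threshold only: some packages require a minimum SQLite version,
# and if the bundled library is older, Python must be rebuilt against a
# newer SQLite (the problem hit on Amazon Linux 2).
if lib_version < (3, 8, 3):
    print("SQLite too old; rebuild Python against a newer library")
```

Running this on both machines makes it obvious whether a fresh OS image will avoid the rebuild.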

I filled out the design slides outline and the design paper in general terms, so that the information is there and can be refined next week.

In terms of progress, I believe that I'm on schedule. Next week I'll start testing SciPy and Audacity for the audio filtering, and continue working on the design document and presentation.

Ellen’s Status Report for Feb. 27

This week I worked on speech-to-text ML and on design materials. I created a speech-to-text module that implements two different s2t engines; we can choose which one to run, and after our mic arrives we can test both to find which works better. Unfortunately for me, there was a lot of installation work to be done for both engines; the code itself was less time-consuming to write than the installation and research required to enable it. The engines are Mozilla DeepSpeech and CMU PocketSphinx; both of them have “stream” constructs which allow predictions to be formed from multiple pieces of audio that are input sequentially. I paid a lot of attention to the interface I was creating with this code, since I was simultaneously working on the overall software design of the project.
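As a rough sketch of the shared streaming interface (the class and function names here are placeholders of my own, not the engines' actual APIs, and a trivial echo engine stands in for DeepSpeech/PocketSphinx):

```python
from abc import ABC, abstractmethod

class StreamingEngine(ABC):
    """Common interface both s2t engines are wrapped to match:
    audio is fed in sequential chunks; text is read out at the end."""

    @abstractmethod
    def feed(self, chunk: bytes) -> None: ...

    @abstractmethod
    def finish(self) -> str: ...

class EchoEngine(StreamingEngine):
    """Stand-in engine that 'transcribes' by decoding the bytes it was fed.
    A real subclass would call into DeepSpeech's or PocketSphinx's
    streaming API instead."""

    def __init__(self):
        self._chunks = []

    def feed(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def finish(self) -> str:
        return b"".join(self._chunks).decode("utf-8")

# Registry keyed by a config value, e.g. "deepspeech" or "pocketsphinx".
ENGINES = {"echo": EchoEngine}

def transcribe(engine_name: str, chunks) -> str:
    engine = ENGINES[engine_name]()  # pick the engine by name
    for chunk in chunks:             # audio arrives piece by piece
        engine.feed(chunk)
    return engine.finish()
```

Selecting the engine by a single name keeps the rest of the pipeline unchanged when we swap engines for comparison after the mic arrives.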

In terms of design materials, I started enumerating the interfaces between the more high-level software modules. I also used extra time that I had after finishing the s2t code to draft the Introduction and Design Requirement sections of our design paper. I’ve volunteered to be the presenter for the design review, so I tried to identify the areas of the presentation we needed to flesh out, and I scripted what I wanted to say in the first half of the presentation.

I feel that I’m on schedule, or maybe slightly ahead. S2t didn’t take the full week, so I got ahead on the design materials. By the end of next week, I’ll have finished the transcript-module code that envelops the s2t and speaker-identification subsections. Since our team will have finished our presentation outline and slides, I’ll also have started preparing to deliver the presentation and will have planned the second half of the presentation script.

Team Status Report for Feb. 27

This past week, our team completed the project proposal presentation. Our initial proposal was received well, so we have started working in more depth on our design. Ellen has tested possible ML software for speech-to-text and speaker ID and has narrowed our selection down to two software components we could use. Mitchell has set up our AWS server; we are currently using the free-tier EC2 server to test our server code. Cambrea has started creating the design diagrams for the software that will go on the Raspberry Pi. We have also ordered a ReSpeaker to start testing the hardware components together.

We are currently on schedule, and we don’t have any risks at the moment beyond the possibility of falling behind schedule. We haven’t changed the design of our system; we are just going into more depth on the design for the review. By Monday we will finish a rough draft of our design review slides; we are gathering the necessary information for this presentation now, in a separate document. Next week we will finish the design review presentation slides, keep running our initial tests of the software and hardware, and continue working on the design report.

Cambrea’s Status Report for Feb. 27

This week I gave the project proposal presentation on Monday, and I ordered a ReSpeaker so we can start configuring the hardware. I already have a Raspberry Pi 3 B+, so I spent some time downloading the libraries we will need onto it and testing that we can download and run code on the Pi. I also started working on the design for the code that will run on the Raspberry Pi. This code needs to listen for audio on the microphone, send that audio to be processed in real time, and then transmit it to the AWS server. To do this, we need to listen for microphone audio on its own thread, separate from the audio processing, and we also need to run the server listen/accept loop on its own thread. I found two ways we can do this in Python: the threading library or the asyncio library. I am leaning toward the asyncio library, since it is more helpful for creating I/O routines and for networking. One main challenge here is that we will potentially need to tell the ReSpeaker to output audio during the time that it is listening; I will test this functionality when the ReSpeaker arrives.

I also researched how we can send the input audio to be processed: we can use a Python NumPy array or a .wav file, so the format depends on how Mitchell will filter the audio and what format his script needs. For the server code, I am planning to use the Python socket library and a UDP connection. The packets will contain compressed audio along with metadata giving the direction of arrival (DOA) of the audio.
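The asyncio plan above can be sketched as two concurrent coroutines sharing a queue; this is a simplified stand-in where canned byte chunks play the role of mic audio and a list stands in for the network send:

```python
import asyncio

async def capture_audio(queue: asyncio.Queue, chunks):
    """Stands in for the mic-listening coroutine: pushes audio chunks
    onto a queue as they 'arrive'. Here the chunks are canned data."""
    for chunk in chunks:
        await queue.put(chunk)
        await asyncio.sleep(0)      # yield so the sender task can run
    await queue.put(None)           # sentinel: capture is finished

async def send_audio(queue: asyncio.Queue, sent: list):
    """Stands in for the network coroutine: drains the queue; real code
    would forward each chunk to the AWS server here."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        sent.append(chunk)          # real code: socket send instead

async def main(chunks):
    queue: asyncio.Queue = asyncio.Queue()
    sent: list = []
    # Both coroutines run concurrently on one event loop, which is the
    # appeal of asyncio over manually managed threads for this I/O flow.
    await asyncio.gather(capture_audio(queue, chunks),
                         send_audio(queue, sent))
    return sent

if __name__ == "__main__":
    asyncio.run(main([b"chunk1", b"chunk2"]))
```

The same shape should extend to a third coroutine for playback, which is where the listen-while-outputting question on the ReSpeaker comes in.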

My progress is on schedule this week. For Monday, I will create a system diagram of the software on the Raspberry Pi and continue working on the design presentation slides. Next week I will write the server code that goes on the AWS server and start testing the Raspberry Pi’s connection to the server.
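As a first sketch of the kind of UDP server code I have in mind (the packet layout here, a 4-byte float DOA header followed by the audio bytes, is just an assumption for illustration, not our final format):

```python
import socket
import struct

# Assumed packet layout: 4-byte big-endian float DOA in degrees,
# followed by the (compressed) audio payload.
HEADER = ">f"
HEADER_SIZE = struct.calcsize(HEADER)

def pack_packet(doa_degrees: float, audio: bytes) -> bytes:
    return struct.pack(HEADER, doa_degrees) + audio

def unpack_packet(packet: bytes):
    (doa,) = struct.unpack(HEADER, packet[:HEADER_SIZE])
    return doa, packet[HEADER_SIZE:]

def run_once(host="127.0.0.1", port=0):
    """Bind a UDP socket, receive one packet, and return its fields.
    A loopback client stands in for the Raspberry Pi."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, port))          # port 0 = pick a free port
    server.settimeout(5.0)
    try:
        bound_port = server.getsockname()[1]
        client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        client.sendto(pack_packet(90.0, b"audio-bytes"),
                      (host, bound_port))
        client.close()
        data, _addr = server.recvfrom(65535)
        return unpack_packet(data)
    finally:
        server.close()
```

Keeping the DOA in a fixed-size header means the server can split metadata from audio without parsing the payload itself.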

Mitchell’s Status Report for Feb. 20

This week my efforts were focused on research and slide preparation. On the research end, I looked into audio filtering methods, AWS setup, and methods of testing. I looked into ReSpeaker, PyAudio, Audacity, and SciPy for audio processing, to see what we could leverage. I also looked for research papers on audio processing, microphone feedback, and user design, and into testing methods such as stress testing, to make our system more robust.

I believe that I am currently on schedule. Our project proposal slides are complete, and our initial Gantt chart has been created.

Per our Gantt chart, next week I will start setting up the AWS server and website, as well as begin preparing the design presentation.

Cambrea’s Status Report for Feb. 20

This week we worked on creating an outline document with our research. I added my research about hardware: why we are using a Raspberry Pi, which ReSpeaker we are using, and which AWS server we should use. I also added research about how to transmit audio data at the application layer to the AWS server, and how to compress the audio into packets to send.

From this research document we created the slides, and since I am giving the proposal presentation, I have been reviewing what I will say. I also talked through each slide with my group.

The most significant risks we could have at this point are laying out our work incorrectly on the Gantt chart and not getting the timing right, since we are just estimating how long each task will take. To mitigate this, we will update the Gantt chart as we learn more about how long each piece of the project takes.

We spent most of this week designing the system, so we don’t have any changes to report. We also just created the schedule this week, which is linked in our team report.

Team Status Report for Feb. 20

This week our team worked on design and planning in order to prepare our project proposal. We researched our requirements and technology solutions, divided the work, made presentation slides, and drew up a schedule in the form of a Gantt chart. There are a couple of risks that arise from this. First, there’s the risk that, not understanding how much work some aspects of the project might entail, we divided the work in an unbalanced way. Here, we just have to be flexible and prepared to change the division of labor if such issues arise. Second, there’s the risk that our schedule is unrealistic and doesn’t match what will actually happen, but this is counteracted by the nature of the document as something that will constantly change over time.

Since we were creating our design this week, we can’t really say that it changed compared to previous weeks, but our ideas were solidified and backed up by the research we did. Some of the requirements we outline in our proposal differ from those in our abstract because of this research. For example, in our abstract we highlighted a mouth-to-ear latency of one second, but after researching voice-over-IP user-experience standards, we changed this value to 150 ms.

We’ve just finished drawing up our schedule. You can find it below. We’ll point out ways that it changes in subsequent weeks. 

Ellen’s Status Report for Feb. 20

This week my efforts were focused on research and on preparing slides for our project proposal. On the research side, I examined many of the requirements we included in our abstract and went digging around the internet for papers and standards documents that could shed light on the specific measurements of a good user experience. This was easier to do for some requirements than for others: machine-learning papers usually focus more on what is possible to achieve with the technology than on what a user might desire from it. But in the end our list of requirements was solidified.

I went on a separate research quest to find viable ML speech to text and speaker diarization solutions and the academic papers associated with the various solutions. Comparing solutions based on metrics reported in papers is an interesting problem; the datasets on which the performance measures are calculated are mostly all different, and there are different performance measures, too (for example, “forgiving” word error rate vs “full” word error rate on some datasets)! My task was basically to search for solutions that did “well” — I might need to evaluate them myself later when we have our hardware.
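For reference, the plain (“full”) word error rate is usually computed as the word-level edit distance between reference and hypothesis, divided by the number of reference words; a minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + insertions + deletions)
    divided by the number of reference words. A 'forgiving' variant would
    normalize case/punctuation before scoring."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance, over words not characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # delete all remaining ref words
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

Having one scoring function of our own should make the later head-to-head evaluation on our hardware comparable across engines, regardless of what each paper reported.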

Currently, I’d say that I’m on schedule in terms of progress. This comes from the fact that we just came up with our schedule this week! In this next week I’m working on getting an initial version of our speech-to-text up and running. In the end I want to have a module that takes in an audio file and outputs text, running it through a different ML solution depending on a variable that’s set. Near the end of next week I will also start on the pre-processing that gets audio packets into the correct form to be passed into the speech-to-text module.