Team Status Report for May 8

This week we finished the final project presentation. We have also been working on the final video, final report, and poster.

We have finished building the project and are now just working on the final documentation, so we don’t have any significant risks, changes to the design, or changes to the Gantt chart.

We are using iMovie to edit our video, shown here:

 

Team Status Report for April 24

We performed our demo last Wednesday and received generally positive feedback from Professor Sullivan and Abha. We had some technical difficulties at the beginning of the demo, so we have started adding robustness to the affected routines; the issue occurred when two microphones were connected to a meeting and one of them was disconnected. We also worked on testing the AWS deployment and formalizing the testing scripts, added the ability to download meeting transcripts as PDFs, and started working on our final presentation slides.
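As a rough illustration of the PDF download feature, below is a minimal sketch of how such an endpoint could look, assuming a Django view and the reportlab library; the model, field, and function names are placeholders rather than our actual code.

```python
# Hypothetical sketch of a "download transcript as PDF" endpoint.
# TranscriptLine, its fields, and the app path are assumed names.
from io import BytesIO

from django.http import HttpResponse
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

from transcripts.models import TranscriptLine  # assumed app and model


def download_transcript_pdf(request, meeting_id):
    # Pull the stored transcript lines for this meeting in order.
    lines = TranscriptLine.objects.filter(meeting_id=meeting_id).order_by("timestamp")

    buffer = BytesIO()
    pdf = canvas.Canvas(buffer, pagesize=letter)
    y = 750
    for line in lines:
        pdf.drawString(50, y, f"{line.speaker}: {line.text}")
        y -= 15
        if y < 50:          # start a new page once the current one fills up
            pdf.showPage()
            y = 750
    pdf.save()

    buffer.seek(0)
    response = HttpResponse(buffer.getvalue(), content_type="application/pdf")
    response["Content-Disposition"] = f'attachment; filename="meeting_{meeting_id}.pdf"'
    return response
```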

Since we are only now performing our tests, any major issue that requires an implementation overhaul would force us to scramble to complete it.

Below is an image of our contraption. It has a Raspberry Pi, microphone, ReSpeaker, and custom ReSpeaker case.

Below is an image of our meeting transcript while a meeting is in progress.

Team Status Report for April 10

Last week our team finished our individual components: Cambrea finished audio streaming between the Raspberry Pi and the server, Ellen finished the speech-to-text and speaker identification ML, and Mitchell finished the database and website setup and transcript streaming. This week our team has focused heavily on complete system integration and testing. We have completed the connections between all components and have real transcripts of the users’ input audio streaming to the website. We are currently working on improving our speaker identification during the setup phase of the meeting.

Our main risk right now is that the direction of arrival (DOA) data for a speaker can fluctuate slightly when speakers change or while a speaker is talking. This fluctuation mainly affects our setup phase, when we register speakers to be identified by the speaker identification ML; during setup we rely on the DOA to decide whether a new speaker should be registered with the system. We are currently fixing and testing this by only registering a new speaker if a new DOA is detected and a significant amount of audio is coming from that direction, which ignores small fluctuations in the DOA.
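Below is a minimal sketch of that registration rule; the thresholds, frame counts, and helper names are illustrative, not the actual values we use.

```python
# Sketch of the setup-phase rule: register a new speaker only when a genuinely
# new direction of arrival (DOA) is seen AND enough audio energy comes from it.
# All constants here are placeholders for illustration.
import numpy as np

DOA_TOLERANCE_DEG = 15     # DOA readings within this range count as the same speaker
ENERGY_THRESHOLD = 0.01    # minimum mean-square amplitude to treat the frame as speech
MIN_FRAMES = 10            # consecutive frames required before registering

registered_doas = []       # DOAs of speakers already registered
frames_at_new_doa = 0


def is_new_direction(doa):
    """True if this DOA is not close to any already-registered speaker."""
    return all(abs(doa - known) > DOA_TOLERANCE_DEG for known in registered_doas)


def process_setup_frame(doa, samples):
    """Called once per audio frame during the setup phase; returns True when a
    new speaker should be registered with the speaker identification ML."""
    global frames_at_new_doa
    energy = float(np.mean(np.square(samples)))

    if is_new_direction(doa) and energy > ENERGY_THRESHOLD:
        frames_at_new_doa += 1
        if frames_at_new_doa >= MIN_FRAMES:
            registered_doas.append(doa)
            frames_at_new_doa = 0
            return True
    else:
        frames_at_new_doa = 0   # brief DOA fluctuations are ignored
    return False
```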

 

Updated Gantt Chart Link

https://docs.google.com/spreadsheets/d/1eeHeut41JF_Ju9Ys14n_sOLiZGYdpl4HtEale2ySasY/edit?usp=sharing

For our updated Gantt chart, we completed the task “Integration of RPi + Transcript + ML” half a week early, so we are currently working on the task “Complete System Debugging / Testing”. Our current “complete system” for the demo uses a computer as the intermediary server that handles the speaker ML and transcript streaming. Ellen is working on some ML improvements for integration and testing at this time. After the demo, Cambrea and Mitchell will start the migration to the AWS server for the task “AWS Migration”. We will complete our latency testing and standard testing during weeks 12 and 13, after we have migrated to AWS. We will start revising our final report during week 11, and work on our final presentation and video during weeks 12-14.

 

 

Team Status Report for April 3

This week we started to wrap up our individual modular code-writing and looked towards integration and demoing. As we tested and composed some of the most complex sections of the solution, we had to do some problem solving that led to design changes and improvements.

There are two notable design changes. The first concerns audio I/O on the Raspberry Pi. Our initial design passed audio through multiple queue data structures while it was processed and prepared for networking. However, we found that this queueing was too slow and caused latency issues that resulted in sporadic, choppy audio. The audio stream now goes through no intermediary data structures before being networked, and audio processing is isolated on the web server as transcript preprocessing.
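For illustration, a rough sketch of the revised Pi-side audio path is below, assuming PyAudio for capture and a plain UDP socket for transport; the server address, port, and chunk size are placeholders.

```python
# Sketch of the new audio path: captured frames are sent straight to the
# network from the capture callback, with no intermediate queues on the Pi.
import socket

import pyaudio

SERVER_ADDR = ("example-server.local", 50007)   # placeholder server address/port
CHUNK = 1024                                    # frames per packet (illustrative)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def forward_audio(in_data, frame_count, time_info, status):
    # Forward the raw frames immediately; all audio processing now happens
    # on the web server as transcript preprocessing.
    sock.sendto(in_data, SERVER_ADDR)
    return (None, pyaudio.paContinue)


pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=CHUNK,
                 stream_callback=forward_audio)
stream.start_stream()
```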

The other change pertains to the speech-to-text and speaker identification ML modules inside the transcript generator. We decided to use Google Cloud speaker identification because it allows us to do all of the ML processing in a single integrated transaction. This let us merge the two modules into a single module and also decreases the amount of multithreading.
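A hedged sketch of what that single transaction can look like is below, using the speaker diarization option of Google Cloud Speech-to-Text; the parameter values are placeholders and the exact client API can vary between library versions.

```python
# Illustrative use of Google Cloud Speech-to-Text with speaker diarization,
# so transcript text and speaker labels come back from one request.
from google.cloud import speech


def transcribe_with_speakers(audio_bytes):
    client = speech.SpeechClient()

    diarization = speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,        # placeholder bounds on meeting size
        max_speaker_count=6,
    )
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        diarization_config=diarization,
    )
    audio = speech.RecognitionAudio(content=audio_bytes)

    response = client.recognize(config=config, audio=audio)

    # The words attached to the final result carry a speaker_tag, giving both
    # the transcript and per-word speaker labels in a single call.
    words = response.results[-1].alternatives[0].words
    return [(word.speaker_tag, word.word) for word in words]
```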

Now we’re looking towards full integration of all components with each other. We’re doing local testing before deploying to AWS, which creates the risk that what works on our own computers doesn’t work once we deploy properly.

Team Status Report for March 27

This past week, our team continued to work on phases 2 and 3. Some of the members worked on their ethics assignment. Ellen set up the microphone identification backend. Cambrea continued to work on Raspberry Pi code, making sure that it is robust. Mitchell worked on transcript streaming. We also assembled our second device.

In terms of upcoming risks, on Thursday Ellen and Mitchell will be assigned a large project in another class; it may be a large time commitment, but we do not think it will be a problem. We have not changed our design. In terms of schedule changes, Mitchell is working early on transcript streaming, which was originally assigned to Cambrea.

Team Status Report for March 20

This week the team finished the design report. Ellen and Mitchell worked on setting up the website using the Django framework and on connecting to the AWS server to access the database. Cambrea worked on the Raspberry Pi system, setting up the threads for audio I/O, audio processing, and audio streaming to the server.
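As a simplified picture of that threading layout, the sketch below connects capture, processing, and streaming threads with queues; the three helper functions are stand-ins for the real microphone read, preprocessing, and network send.

```python
# Simplified three-thread layout on the Raspberry Pi: capture -> processing
# -> streaming, connected by queues. The helpers below are placeholders.
import queue
import threading
import time

raw_audio = queue.Queue(maxsize=64)        # capture thread -> processing thread
processed_audio = queue.Queue(maxsize=64)  # processing thread -> streaming thread


def read_from_microphone():
    """Placeholder for reading one chunk of samples from the ReSpeaker."""
    time.sleep(0.064)                       # pretend a ~64 ms chunk was captured
    return b"\x00" * 2048


def preprocess(chunk):
    """Placeholder for the audio processing step."""
    return chunk


def send_to_server(chunk):
    """Placeholder for streaming the chunk to the server."""
    pass


def capture_loop():
    while True:
        raw_audio.put(read_from_microphone())


def processing_loop():
    while True:
        processed_audio.put(preprocess(raw_audio.get()))


def streaming_loop():
    while True:
        send_to_server(processed_audio.get())


for loop in (capture_loop, processing_loop, streaming_loop):
    threading.Thread(target=loop, daemon=True).start()
```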

Our biggest risk now is that our denoising will degrade the audio stream, hurting the clarity of the audio and the quality of the transcript generation. We plan to use the background noise in the room (when there are no voices) to create a noise file that we will use to denoise the signal. Our current approach is to create this file during the meeting using the extraneous noise from the beginning of the meeting. There could be a problem with this approach, since the noise at the beginning won’t necessarily represent the noise throughout the meeting, but we are starting with it for now and will test the result. Alternatively, we will focus more on filtering the audio to remove noise.
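To make the plan concrete, here is a hedged sketch of the idea using simple spectral subtraction, where the noise profile comes from a clip recorded at the start of the meeting; the actual filtering method we end up with may differ.

```python
# Illustrative spectral-subtraction denoiser: estimate the noise spectrum from
# a "noise file" captured at the start of the meeting, then subtract it from
# the magnitude spectrum of later audio.
import numpy as np
from scipy.signal import istft, stft

RATE = 16000  # assumed sample rate


def denoise(audio, noise_clip, rate=RATE):
    """Subtract the average noise spectrum (from noise_clip) out of audio."""
    _, _, noise_spec = stft(noise_clip, fs=rate)
    noise_profile = np.mean(np.abs(noise_spec), axis=1, keepdims=True)

    _, _, spec = stft(audio, fs=rate)
    magnitude = np.abs(spec)
    phase = np.angle(spec)

    # Remove the estimated noise magnitude, clamping at zero so we never
    # invert the sign of a bin.
    cleaned = np.maximum(magnitude - noise_profile, 0.0)

    _, result = istft(cleaned * np.exp(1j * phase), fs=rate)
    return result
```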

We haven’t changed the design of our project.

A change in the schedule is that the mic initialization backend task was reassigned to Ellen.

Gantt Chart – combined

 

Team Status Report for March 13

This week the team members were busily taking care of our individual programming responsibilities. Cambrea was working on networking, Mitchell was working on audio processing and website stuff, and Ellen was working on transcription. We also worked towards completing the design report document that’s due on Wednesday.

The risk we’ve been talking about recently is mismatched interfaces. While we write our separate modules, we have to be aware of what the other members might require from them. We have to discuss the integration of the individual parts and, if we discover that something different is required, we have to be ready to jump in and change the implementation. For example, Ellen made the transcript output a single text file per meeting. However, when Cambrea starts writing the transcript streaming, she might discover that she wants it in a different format; so we just have to recognize that risk and be prepared to modify the code.

Our schedule hasn’t changed, aside from our making progress through its tasks.

Team Status Report for March 6

This past week, our team finished up our design review presentation; we also went over the feedback from the project proposal presentation and incorporated it into our project. Cambrea set up the ReSpeaker and created a set of test audio data. Ellen tested the ML software on the test data, eliminated one of the ML options that we had considered, and investigated the other option that Abha, our TA, suggested. Mitchell continued working on the website.

Some parts of our schedule have been shuffled earlier because we got our hardware earlier than expected, but according to our modified schedule we are still on track. We did encounter a risk this week: we spent some time chasing models that turned out to be duds (one of them had a 100% error rate), and we eliminated the risk by testing early and finding alternatives. Design-wise, we are also adding an LED indicator for when audio is not being picked up, as well as audio preprocessing for the ML. Next week we will finish up our design report and continue our first phase.
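For the LED indicator, a minimal sketch of the intended behavior is below, assuming the gpiozero library on the Pi; the GPIO pin and silence threshold are placeholders.

```python
# Sketch of the "audio not being picked up" indicator: light an LED whenever
# the captured audio level falls below a silence threshold.
import numpy as np
from gpiozero import LED

SILENCE_THRESHOLD = 0.005   # RMS level below which audio counts as "not picked up"
indicator = LED(17)         # assumed GPIO pin for the indicator LED


def update_indicator(samples):
    """Call once per captured chunk; turns the LED on when the chunk is silent."""
    rms = float(np.sqrt(np.mean(np.square(samples))))
    if rms < SILENCE_THRESHOLD:
        indicator.on()
    else:
        indicator.off()
```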

Team Status Report for Feb. 27

This past week, our team completed the project proposal presentation. Our initial proposal was received well, so we have started working more in depth on our design. Ellen has tested the possible ML software for speech-to-text and speaker ID and has narrowed our selection down to two software components that we could use. Mitchell has set up our AWS server; we are currently using the free-tier EC2 server to test our server code. Cambrea has started creating the design diagrams for the software that will go on the Raspberry Pi. We have also ordered a ReSpeaker to start testing the hardware components together.

We are currently on schedule, and we don’t have any risks at the moment for our project outside of falling behind schedule. We haven’t changed the design of our system; we are just going more in depth into our design for the review. By Monday we will finish a rough draft of our design review slides; we are gathering the necessary information for this presentation now in a separate document. Next week we will finish the design review presentation slides, keep running our initial tests of the software and hardware, and continue working on the design report.

Team Status Report for Feb. 20

This week our team worked on design and planning in order to prepare our project proposal. We researched our requirements and technology solutions, divided work, made presentation slides, and drew up a schedule in the form of a Gantt chart. There are a couple risks that arise from this. First, there’s the risk that, not understanding how much work some aspects of the project might entail, we divided work in an unbalanced way. Here, we just have to be flexible and prepared to change up the division of labor if such issues arise. Second, there’s the risk that our schedule is unrealistic and doesn’t match what will actually happen — but this is counteracted by the nature of the document as something that will be constantly changing over time.

Since we were creating our design this week, we can’t really say that it changed compared to previously; but our ideas were solidified and backed up by the research we did. Some of the requirements we outline in our proposal are different from those in our abstract because of this research. For example, in our abstract we highlighted a mouth-to-ear latency of one second, but after researching voice-over-IP user experience standards, we changed this value to 150 ms.

We’ve just finished drawing up our schedule. You can find it below. We’ll point out ways that it changes in subsequent weeks.