Stella’s Status Report 30 April 2022

This past week I spent some time finalizing my prep for the final presentation, then shifted my focus to our poster and to planning testing. I picked up our new mics, which we tested on Friday. The new mics take noticeably better recordings than our UMA-8 array.

Before the Friday testing, I wrote up the tests we should perform and how many times to perform each one, and I created a spreadsheet to track our tests and results so that we could get through data collection and analysis more efficiently. I decided on 3 trials per test, a standard minimum number of trials for an experiment.

In our last meeting, Dr. Sullivan gave us advice on how to test our system more effectively, and I incorporated it into the new testing plan. For one, Larry and I used the same script for this round of testing. In test 7, where we spoke at the same time, Larry started on sentence 1 while I started on sentence 2 and spoke sentence 1 at the end. Initially we tried simply speaking the same script from the top, but then Larry and I were saying each word simultaneously, so the resulting word error rates would not have been a good measure of our system's ability to separate different speakers.

Another testing suggestion I added for this round was recording a reversed version of each test, so that we could tell whether the word error rate differed between my voice (higher pitched) and Larry's voice (lower pitched).

One suggestion that we have not yet used is testing in a wider open space, such as an auditorium or concert hall. For the sake of time (setup and data collection took 3-4 hours), we decided to collect only conference room data for now.

We are currently on schedule.

On Sunday, I will complete the poster. Next week, I will generate the appropriate audio files and calculate WER for the following sets of processing steps:

0. Just deep learning (WERs already computed by Charlie)

1. SSF then deep learning, to see if SSF reverb reduction improves the performance of the deep learning algorithm

2. SSF then PDCW, to see if this signal processing approach works well enough for us now that we have better mics
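For anyone reproducing the analysis, the WER numbers above come down to a word-level edit distance. Libraries such as `jiwer` exist for this, but a minimal self-contained sketch of the standard dynamic-programming computation looks like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis that drops one word from a six-word reference scores a WER of 1/6.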

Larry’s Status Report for 30 April 2022

Last week, I spent some time adjusting caption generation. I wrote a very simple recursive algorithm that moves text to a new line instead of letting it run off-screen, though it can currently split words in half. I still have to adjust it so that words stay whole, though that is not a difficult problem.
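The word-preserving fix can be sketched with the same recursive shape: greedily fill each line with whole words, then recurse on the remainder. This is an illustrative sketch, not our actual caption code, and the character limit `max_chars` is a hypothetical parameter:

```python
def wrap_caption(text: str, max_chars: int) -> list[str]:
    """Recursively wrap caption text at word boundaries.

    Words are never split; a single word longer than max_chars
    simply gets its own (overlong) line."""
    words = text.split()
    if not words:
        return []
    line, rest = words[0], words[1:]
    # Greedily add whole words while they fit on the current line.
    while rest and len(line) + 1 + len(rest[0]) <= max_chars:
        line += " " + rest.pop(0)
    # Recurse on whatever did not fit.
    return [line] + wrap_caption(" ".join(rest), max_chars)
```

For example, `wrap_caption("the quick brown fox jumps", 10)` yields `["the quick", "brown fox", "jumps"]`.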

I also spent time looking over Stella’s final presentation and providing a small amount of feedback.

I also integrated the new microphones we purchased with the scripts running on the Jetson TX2. Surprisingly, we only had to change the sample rates in a few places; everything else worked pretty much out of the box. We recorded a lot more data with the new microphones. Here is a good example of a full-overlap recording, showcasing both the newer captions and the higher-quality microphones:

https://drive.google.com/file/d/1MFlt5AUgrVL5hiOT9XV_zveZVAu-saj3/view?usp=sharing

For comparison, this is a recording from our previous status reports that we made using the microphone array:

https://drive.google.com/file/d/1MmEE7Yh0Kxe5wChuq5n5rHsnGKMOyZYr/view?usp=sharing

The difference is pretty stark.

Currently, we are on schedule for producing an integrated final product. We definitely do not have time for a real-time implementation and we are currently discussing whether we have enough time to create some sort of enclosure for our product. Given how busy the next week will be, I doubt we will be able to do anything substantive.

By next week, we will have completed the final demo. The main items we have to finish are the integration with the website, the data analysis, the final poster, and the final video.