Stella’s Status Report for 19 February 2022

Now that we have our mic array and webcam, we can start collecting data. This week I worked on a testing plan that lays out what data we need to collect and which parts of the system need to be working before we can collect it. The testing document includes a list of repeatable steps for measuring the accuracy of our speech-to-text output, as well as a list of parameters we can vary between tests (e.g., script content, speakers speaking separately vs. simultaneously, speaker positioning relative to each other and to our device, and the frequency range of the speech). Many of these parameters came from my notes on the feedback Dr. Sullivan gave us on our system design when we met with him on Wednesday.
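
To make the accuracy measurement concrete, I’m planning to score the speech-to-text output with word error rate (WER) against the test script. Below is a minimal Python sketch of the calculation I have in mind; the function name and the example strings are placeholders, not part of our pipeline yet.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Word-level Levenshtein distance via dynamic programming.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost) # substitution

    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: a line from a test script vs. what the pipeline produced.
print(word_error_rate("please turn on the lights", "please turn the light"))  # 0.4
```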

To help with collecting test data, I also found some YouTube videos of people speaking basic English phrases (mostly videos made for people learning English). We’ll have to check whether our speech-to-text pipeline performs any differently on live speech vs. recorded speech, but if it doesn’t, these videos would let one person collect two-speaker test data on their own, which could be easier for us logistically.

I also helped edit Larry’s Design Review slides. Specifically, I reformatted parts of our block diagrams to make them easier to understand, and in our testing plan I combined the capture-to-display delay tests for video and for captions into a single test, since we decided this week to add captions to the video before showing it to the user.

I think we are mostly on schedule. Our main goal this week was to hammer out the details of our design for the design presentation. As a team, we went through all of the components of our system and decided how we want to transfer data between them, which was a big gap in our design before this week. We had originally planned to have a first beamforming algorithm completed by this week; however, in our meeting on Wednesday we decided, based on Dr. Sullivan’s feedback, to try a linear rather than circular mic array, which changes the beamforming algorithm. Charlie and I will work this week on finishing a beamforming algorithm that we can start running test data through and improving.
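
For the linear-array version, I’ve been thinking of the beamformer as a simple delay-and-sum: delay each mic’s signal so the wavefront from the steering direction lines up across channels, then average. Here is a rough numpy sketch of that idea, assuming a uniform linear array; the spacing, sample rate, and number of mics are placeholder values, not final design numbers.

```python
import numpy as np

def delay_and_sum(channels: np.ndarray, mic_spacing: float, steer_angle_deg: float,
                  fs: int = 16000, c: float = 343.0) -> np.ndarray:
    """Delay-and-sum beamformer for a uniform linear array.

    channels: (num_mics, num_samples) array, one row per mic.
    mic_spacing: distance between adjacent mics in meters.
    steer_angle_deg: look direction, 0 degrees = broadside to the array.
    """
    num_mics, num_samples = channels.shape
    angle = np.deg2rad(steer_angle_deg)

    # Per-mic arrival-time offsets for a far-field source in the look direction.
    mic_positions = np.arange(num_mics) * mic_spacing
    delays = mic_positions * np.sin(angle) / c  # seconds

    # Compensate the delays as linear phase shifts in the frequency domain,
    # which handles fractional-sample delays cleanly.
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    spectra = np.fft.rfft(channels, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=num_samples)

# Placeholder usage with random noise standing in for real mic data.
fake_mics = np.random.randn(4, 16000)
output = delay_and_sum(fake_mics, mic_spacing=0.04, steer_angle_deg=20.0)
```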

In the next week, I plan to finish the testing plan for our audio data collection, so that we can log what types of data we need and what we’ve already collected, and so we can collect data in a repeatable manner. Charlie and I will collect more audio data and work on the beamforming algorithm so that we can test an iteration of the speech-to-text pipeline this week or next. I also plan to work on our written design report and hopefully have a draft done by the end of the week.
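
For the data-collection log itself, I’m imagining something as simple as a CSV with one row per recording, with columns mirroring the parameters in the testing plan. A hypothetical example of the format (none of these column names are finalized):

```python
import csv
from pathlib import Path

# Hypothetical log format; columns mirror the parameters in the testing plan.
log_path = Path("audio_test_log.csv")
fieldnames = ["date", "script_id", "num_speakers", "simultaneous",
              "speaker_angles_deg", "distance_m", "live_or_recorded", "notes"]

write_header = not log_path.exists()
with log_path.open("a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    if write_header:
        writer.writeheader()
    writer.writerow({
        "date": "2022-02-21", "script_id": "phrases_01", "num_speakers": 2,
        "simultaneous": False, "speaker_angles_deg": "-30,30", "distance_m": 1.5,
        "live_or_recorded": "recorded", "notes": "YouTube clips, quiet room",
    })
```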

In the next week I’ll also try to find a linear mic array that can connect to our Jetson TX2, since we haven’t found one yet.

Stella’s Status Report for 12 February 2022

In the first half of this week, I helped Charlie and Larry put together our proposal presentation and helped Charlie edit his script for it. In particular, I worked on defining our stakeholder (use case) requirements, our solution approach diagram, and our testing plans. During Charlie’s presentation I took notes on questions that came up, both my own and other students’, and used them to figure out what to ask Dr. Sullivan afterward. After the presentation on Wednesday, our team met up to discuss our next steps. We decided that Charlie and I would choose which microphones to order and would start on our beamforming algorithm.

Using the advice we got from Dr. Sullivan, I searched for possible microphone array options, including pre-built arrays and parts we could use to build our own. The most important criteria I was looking for were: 1) that we can access the data from each audio channel (one channel per mic) to process it on a computer, and 2) for pre-built arrays, that the mics are spaced far enough apart to distinguish two speakers. Charlie and I met up to discuss mic options and decided on a pre-built circular mic array that can connect to the Jetson we’re using.
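
For criterion 1, the sanity check I have in mind once the array arrives is just to record a few seconds and confirm we get one distinct channel per mic. A rough sketch using the python-sounddevice library; the sample rate, channel count, and device index are placeholders until we know the actual hardware.

```python
import sounddevice as sd

# Placeholder values -- adjust once we know the actual array and its device index.
FS = 16000          # sample rate in Hz
NUM_CHANNELS = 4    # one channel per mic on the array
DURATION_S = 3      # seconds to record

print(sd.query_devices())  # find the array's device index in this list

# Record all channels at once; result has shape (samples, channels).
# Pass device=<index> to sd.rec if the array is not the default input device.
recording = sd.rec(int(DURATION_S * FS), samplerate=FS, channels=NUM_CHANNELS)
sd.wait()

# Confirm each mic produced its own, non-silent signal.
for ch in range(NUM_CHANNELS):
    print(f"channel {ch}: peak amplitude {abs(recording[:, ch]).max():.4f}")
```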

To start on the beamforming algorithm, I reviewed the beamforming material from the end of 18-792 and started looking into existing MATLAB code for beamforming. Charlie and I then met to discuss the algorithm itself and decided to go with a circular mic array rather than a linear one, so in the coming week we will be working out the math for circular-array beamforming (we’ve only covered linear arrays so far). By the end of the week I aim to have at least an outline of our entire algorithm so that we can start feeding in mic data as soon as the array arrives. I will also be looking for individual mics we could use to build our own array, as a backup in case we have issues with a pre-built one.
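
The piece of the circular-array math I’ve worked out so far is the per-mic steering delay: for a uniform circular array of radius r, a far-field source at azimuth θ reaches the mic sitting at ring angle φ_m offset by roughly (r/c)·cos(θ − φ_m) relative to the array center. A small numpy sketch of that geometry, with a made-up radius and mic count until we have the real array’s specs:

```python
import numpy as np

def circular_array_delays(num_mics: int, radius_m: float, source_azimuth_deg: float,
                          c: float = 343.0) -> np.ndarray:
    """Far-field steering delays (seconds) for a uniform circular array.

    Delays are relative to the array center; a positive value means the
    wavefront reaches that mic after it passes the center.
    """
    mic_angles = 2 * np.pi * np.arange(num_mics) / num_mics  # mic positions on the ring
    theta = np.deg2rad(source_azimuth_deg)
    # Projection of each mic position onto the source direction, converted to time.
    return -radius_m * np.cos(theta - mic_angles) / c

# Example: 6 mics on a 3.5 cm radius ring, source at 45 degrees azimuth.
delays = circular_array_delays(num_mics=6, radius_m=0.035, source_azimuth_deg=45.0)
print(delays * 1e6, "microseconds")
```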

We now have a list of all the parts we want to order, and we have clear next steps for developing our beamforming algorithm, so I think we’re on track according to our schedule.