Mohini’s Status Report for 10/16/2020 – iRecruit

This week, I primarily focused on the signal processing aspect of our project. Last week's work involved saving the audio file that the user records as an integer vector. It showed that the time-domain signal was not sufficient for categorizing recordings, because different recordings of the same letter produced signals with similar shapes but different amplitudes. That led us this week to analyze the signal in the frequency domain instead. After taking the Fourier Transform of the time-domain signal, we realized that this approach was also insufficient: the Fourier Transform of every letter had one peak at low frequencies and another at higher frequencies. After more research, we decided to compute the Short Time Fourier Transform (STFT) over 20 ms chunks of the audio clip. Plotting the result as a spectrogram made it easier to see similarities between recordings of the same letter and differences between recordings of different letters.
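To illustrate why the whole-clip Fourier Transform was not enough and why 20 ms chunks helped, here is a minimal NumPy sketch of an STFT with a rectangular window and no overlap. The two synthetic test signals, the 16 kHz sample rate, and the function name `rectangular_stft` are hypothetical and chosen only for illustration. The two signals contain the same tones in opposite order: their whole-clip magnitude spectra are identical, while the per-frame spectra differ.

```python
import numpy as np

def rectangular_stft(signal, sample_rate, frame_ms=20):
    """Magnitude STFT over non-overlapping, untapered (rectangular) frames.

    Trailing samples that do not fill a whole frame are dropped.
    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    x = np.asarray(signal, dtype=float)[: n_frames * frame_len]
    frames = x.reshape(n_frames, frame_len)
    # rfft along each row gives the non-negative frequency bins of each frame
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signals: 0.5 s of 400 Hz and 0.5 s of 1200 Hz at 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
low = np.sin(2 * np.pi * 400 * t)
high = np.sin(2 * np.pi * 1200 * t)
a = np.concatenate([low, high])
b = np.concatenate([high, low])  # same tones, opposite order

# Whole-clip Fourier Transform: magnitudes match, so the timing is lost.
same_fft = np.allclose(np.abs(np.fft.rfft(a)), np.abs(np.fft.rfft(b)))

# STFT over 20 ms frames: the dominant bin of the first frame differs.
spec_a = rectangular_stft(a, sr)
spec_b = rectangular_stft(b, sr)
print(same_fft, spec_a.shape, spec_a[0].argmax(), spec_b[0].argmax())
```

For a spectrogram plot, the magnitudes are usually converted to decibels (for example `20 * np.log10(spec + 1e-10)`) and displayed with time on one axis and frequency on the other.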

The team and I spent a good amount of time trying to understand these results and how to proceed. We met with a PhD student who specializes in speech processing to get some guidance. He advised us to compute the STFT with a Hamming window and 50% overlap instead of the rectangular window with no overlap that we had been using. He also suggested that we look into log mel filterbanks, which map frequency values onto the mel scale, a perceptual scale that approximates how human ears perceive pitch. We plan to implement both features in the upcoming week. I believe my work is roughly on schedule, since determining the signal processing output is a crucial part of our project and we allocated several weeks to implement it.
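The two suggestions can be sketched together in NumPy. This is a minimal illustration under stated assumptions, not our implementation: the 16 kHz sample rate, the 26 filters, the HTK-style mel formula, and all function names (`hamming_power_stft`, `mel_filterbank`, `log_mel_energies`) are hypothetical choices. The Hamming window tapers each 20 ms frame, the hop of half a frame gives 50% overlap, and triangular filters spaced evenly on the mel scale are summed and logged.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other conventions such as Slaney exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def hamming_power_stft(signal, sample_rate, frame_ms=20, overlap=0.5):
    """Power spectrum of Hamming-windowed frames; assumes len(signal) >= one frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))  # 50% overlap -> hop of half a frame
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)  # tapers frame edges to reduce spectral leakage
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def mel_filterbank(n_filters, frame_len, sample_rate):
    """Triangular filters evenly spaced in mel, shape (n_filters, frame_len // 2 + 1)."""
    n_bins = frame_len // 2 + 1
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel_energies(signal, sample_rate, frame_ms=20, n_filters=26):
    """Log energy in each mel filter for each frame: shape (n_frames, n_filters)."""
    power = hamming_power_stft(signal, sample_rate, frame_ms)
    frame_len = int(sample_rate * frame_ms / 1000)
    fbank = mel_filterbank(n_filters, frame_len, sample_rate)
    energies = power @ fbank.T
    return np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)

sr = 16000  # hypothetical sample rate
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 400 * t)  # hypothetical 1 s test tone
feats = log_mel_energies(tone, sr)
print(feats.shape)
```

Libraries such as librosa offer comparable routines (for example `librosa.feature.melspectrogram`), which may be preferable to hand-rolled filters once the parameters are settled.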
