This week, I worked with Mohini on the signal processing portion. We needed to research and experiment with different ways to trim our audio and scale our x-axis so that all of the final outputs were the same length. We ultimately took a different approach and computed the Short-Time Fourier Transform (STFT) over 20-millisecond chunks of the whole audio file. After splitting the audio file and applying the Fourier transform to each chunk, we plotted the results as a spectrogram. Unlike before, we were able to see slight similarities when we said the same letter multiple times and differences between the different letters. We also met with a PhD student who specializes in speech recognition, and he gave us tips on how to further refine our input. For example, he recommended we use a Hamming window with 50% overlap and scale the frequency values so the numbers aren’t too small.
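Below is a minimal sketch of this kind of STFT pipeline, assuming SciPy and Matplotlib; the filename "letter_a.wav" and the parameter choices are illustrative, not our finalized settings.

```python
# Sketch: 20 ms Hamming-windowed STFT with 50% overlap, plotted as a spectrogram.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft
import matplotlib.pyplot as plt

fs, audio = wavfile.read("letter_a.wav")   # sample rate (Hz) and raw samples (hypothetical file)
if audio.ndim > 1:
    audio = audio.mean(axis=1)             # collapse to mono if the recording is stereo

frame_len = int(0.020 * fs)                # 20 ms analysis window
f, t, Zxx = stft(
    audio,
    fs=fs,
    window="hamming",                      # Hamming window, as recommended
    nperseg=frame_len,
    noverlap=frame_len // 2,               # 50% overlap between chunks
)

# Convert magnitudes to decibels so the values aren't vanishingly small
magnitude_db = 20 * np.log10(np.abs(Zxx) + 1e-10)

plt.pcolormesh(t, f, magnitude_db, shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram (20 ms Hamming windows, 50% overlap)")
plt.colorbar(label="Magnitude (dB)")
plt.show()
```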
I believe I am still on schedule. Last week's goal was to have an output ready to use as the input for the neural network. Though the output still needs some modifications, we were able to come up with a working solution. This week, I hope to continue my work on the signal processing portion, apply the modifications the PhD student recommended, and solidify the output of the signal processing algorithm.