This week, I built on the signal processing work from the previous weeks to produce the output of our signal processing algorithm. After reading the original input file, the process is as follows:
- We first apply a pre-emphasis on the audio input:
- To do this, we use the equation y(t) = x(t) - alpha*x(t-1). The alpha value is a predetermined filter coefficient, usually 0.95 or 0.97.
- By doing so, we amplify the high frequencies, which balances the frequency spectrum and can improve the signal-to-noise ratio.
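The pre-emphasis step above is a one-liner in NumPy. A minimal sketch (the function name `pre_emphasis` is our own, not from any library):

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # y(t) = x(t) - alpha * x(t-1); the first sample has no predecessor,
    # so it is kept as-is
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```

The output has the same length as the input, so the rest of the pipeline is unaffected by this step.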
- We then frame the updated signal:
- Framing is useful because a signal is constantly changing over time. Taking a single Fourier transform over the whole signal would lose those variations through time.
- Thus, by taking the Fourier transform of adjacent frames with overlap, we will preserve as much of the original signal as possible.
- We are using 20 millisecond frames with a 10 millisecond stride, so adjacent frames overlap by half.
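The framing step can be sketched as follows, assuming the signal is a 1-D NumPy array (the function name `frame_signal` and the zero-padding of the last frame are our own choices):

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_len_ms=20, frame_step_ms=10):
    # convert frame length and stride from milliseconds to samples
    frame_len = int(round(sample_rate * frame_len_ms / 1000))
    frame_step = int(round(sample_rate * frame_step_ms / 1000))
    if len(signal) <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((len(signal) - frame_len) / frame_step))
    # zero-pad so the last frame is full
    pad_len = (num_frames - 1) * frame_step + frame_len - len(signal)
    padded = np.append(signal, np.zeros(max(pad_len, 0)))
    # index matrix: one row of sample indices per frame, shifted by the stride
    indices = (np.arange(frame_len)[None, :]
               + frame_step * np.arange(num_frames)[:, None])
    return padded[indices]
```

Each row of the result is one frame, which makes the windowing and FFT steps below simple row-wise operations.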
- With the framed signal, we apply a Hamming window:
- A Hamming window reduces the effects of leakage that occurs when performing a Fourier transform on the data.
- To apply it, we use a simple line of code in Python.
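Assuming the frames are stored as rows of a NumPy array, that single line is the multiplication inside this small helper (the name `apply_hamming` is our own):

```python
import numpy as np

def apply_hamming(frames):
    # multiply every frame (row) by a Hamming window of the frame length;
    # broadcasting applies the same window to each row
    return frames * np.hamming(frames.shape[1])
```

Tapering each frame toward zero at its edges is what reduces the spectral leakage mentioned above.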
- Fourier Transform and Power Spectrum:
- We can now take the Fourier transform of each frame and compute its power spectrum, which lets us distinguish different audio signals from one another.
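The transform and power spectrum steps can be sketched as below, assuming the windowed frames are rows of a NumPy array (the FFT size of 512 is an illustrative assumption, not a value fixed by our pipeline):

```python
import numpy as np

def power_spectrum(frames, nfft=512):
    # rfft keeps only the non-negative frequency bins (nfft // 2 + 1 of them);
    # frames shorter than nfft are zero-padded automatically
    mag = np.abs(np.fft.rfft(frames, nfft))
    # periodogram estimate of the power spectrum
    return (mag ** 2) / nfft
```

The resulting matrix (frames by frequency bins) is the output we can now feed into later stages.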
The output will continue to be modified and enhanced to make our algorithm better, but we now have something to feed into our neural network. I began looking into filter banks and MFCCs, two techniques that transform the data to better match how the human ear perceives sound. I will continue this next week and, if time allows, help the team with the neural network algorithm.
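As a head start on that filter bank work, here is a sketch of a mel-spaced triangular filter bank applied to the power spectrum bins. All parameter values here (26 filters, 512-point FFT, 16 kHz sample rate) are illustrative assumptions, not settings we have committed to:

```python
import numpy as np

def mel_filterbank(num_filters=26, nfft=512, sample_rate=16000):
    # mel scale spaces filters the way the human ear resolves pitch:
    # densely at low frequencies, sparsely at high frequencies
    high_mel = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    mel_points = np.linspace(0.0, high_mel, num_filters + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    # map each filter edge to the nearest FFT bin
    bins = np.floor((nfft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((num_filters, nfft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        # rising and falling edges of the m-th triangular filter
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank
```

Multiplying the power spectrum by this matrix (transposed) would give the filter bank energies; taking their log and a DCT is one common way to get MFCCs, which I still need to study in detail.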