This week I continued working on the signal processing algorithm that will generate the input to the neural network. As a team, we decided to make one significant change to this algorithm: instead of trying to recognize individual letters, we will recognize entire words. This reduces the scope of our project, because we will give the user a list of 10-15 categories from which to choose a technical question. As a result, our neural network will have 10-15 outputs instead of the original 26. Additionally, we will only need to run the neural network once per word, rather than once per letter, which will greatly reduce the time it takes to generate a technical question.
After we made this decision, I tested the rough signal processing algorithm I created last week on entire words (“array”, “linked list”, etc.). The outputs differed significantly between different words and were similar enough across repetitions of the same word. Afterwards, I improved the algorithm by using a Hamming window rather than a rectangular window. A Hamming window tapers each frame at its edges, which reduces the discontinuities introduced at the frame boundaries and therefore the spectral leakage. I also started researching the Mel scale and the Mel filterbank implementation. Applying the Mel filterbank will reduce the dimensionality of the signal processing output, so that it will be easier for the neural network to process without losing crucial information present in the original signal. Next week, I will focus on transforming the output using the Mel scale and on creating a first attempt at a training dataset for the neural network. This will most likely include 10-15 signals representing each word that our neural network will be categorizing. It is important that our training dataset contains a variety of signals for each word, to help prevent the model from overfitting.
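As a rough illustration of the Hamming window and Mel filterbank steps mentioned above, here is a minimal sketch in plain Python. It is not our actual implementation. The function names, the 16 kHz sample rate, the 512-point FFT size and the 26 filters are all assumptions chosen for the example, and the Hz-to-Mel conversion uses the common 2595 * log10(1 + f / 700) formula, which is only one of several variants in use.

```python
import math


def hamming_window(n):
    """Return n Hamming coefficients: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def apply_window(frame):
    """Multiply one frame of samples by a Hamming window of the same length."""
    window = hamming_window(len(frame))
    return [s * w for s, w in zip(frame, window)]


def hz_to_mel(f):
    """Convert frequency in Hz to Mel (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_size, sample_rate):
    """Build triangular filters whose centers are evenly spaced on the Mel scale.

    Returns num_filters rows, each with fft_size // 2 + 1 weights
    (one weight per bin of a one-sided spectrum).
    """
    num_bins = fft_size // 2 + 1
    low_mel = hz_to_mel(0.0)
    high_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 points: each filter uses three neighbors (left, center, right).
    step = (high_mel - low_mel) / (num_filters + 1)
    mel_points = [low_mel + i * step for i in range(num_filters + 2)]
    bin_points = [
        int(math.floor((fft_size + 1) * mel_to_hz(m) / sample_rate))
        for m in mel_points
    ]
    bank = [[0.0] * num_bins for _ in range(num_filters)]
    for i in range(1, num_filters + 1):
        left, center, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        for k in range(left, center):        # rising edge
            bank[i - 1][k] = (k - left) / (center - left)
        for k in range(center, right):       # peak and falling edge
            bank[i - 1][k] = (right - k) / (right - center)
    return bank


def filterbank_energies(power_spectrum, bank):
    """Collapse a one-sided power spectrum into one energy value per Mel filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


# Hypothetical parameters, for illustration only.
bank = mel_filterbank(num_filters=26, fft_size=512, sample_rate=16000)
flat_spectrum = [1.0] * (512 // 2 + 1)
energies = filterbank_energies(flat_spectrum, bank)
```

With these assumed parameters, each frame's 257 spectrum values are reduced to 26 filterbank energies, which is the kind of dimensionality reduction described above. The power spectrum itself would come from an FFT of the windowed frame, which is left out here to keep the sketch short.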