This week I spent time transitioning the audio component from running on the Google board to running on a Dell XPS 15 laptop. We made this change because we decided to move away from creating an ML model from scratch and instead use a heuristic for the eye-tracking data. The transition had some unforeseen difficulties: the microphone was not producing any audio, and code written on the Google board was crashing on the laptop. After debugging, the code now runs on the laptop identically to how it ran on the Google board. However, the time spent debugging hindered progress on improving the audio processing. Some progress has been made in segmenting the audio into chunks so that we can later take the Fourier transform of each chunk and compare the result to the MIDI. The robustness of this process is still unknown. We foresee that a few errors in note prediction can be handled, but a large number of errors will make the whole system unreliable. The risk arises because we detect sound not with a transducer placed directly on the instrument but with a microphone, which is susceptible to picking up noise, reflections, and harmonics.
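As a rough illustration of the chunking step, the sketch below splits a buffer of samples into fixed-length chunks that may overlap. The function name `split_into_chunks` and the parameters `chunk_size` and `hop_size` are hypothetical and chosen only for this example; the actual chunk length and overlap used in the project are not specified here.

```python
def split_into_chunks(samples, chunk_size, hop_size):
    """Split a sequence of samples into fixed-length chunks.

    Consecutive chunks start hop_size samples apart, so they overlap
    whenever hop_size < chunk_size. Trailing samples that do not fill
    a whole chunk are dropped.
    """
    if chunk_size <= 0 or hop_size <= 0:
        raise ValueError("chunk_size and hop_size must be positive")
    last_start = len(samples) - chunk_size
    return [
        list(samples[start:start + chunk_size])
        for start in range(0, last_start + 1, hop_size)
    ]


# Toy input: ten "samples", chunks of four, advancing two at a time.
chunks = split_into_chunks(list(range(10)), chunk_size=4, hop_size=2)
```

Overlapping chunks are one common way to avoid missing a note onset that falls on a chunk boundary, at the cost of processing more chunks per second.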
This upcoming week, I will start running the chunks from the microphone through a Fast Fourier Transform (FFT) and measuring how accurately the program predicts the notes being played. We have already run a C major scale through the FFT and found that only the highest C was predicted incorrectly. We believe this one error is manageable, but we have not yet tested under many other conditions.
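A minimal sketch of the FFT-to-note step is shown here, assuming each chunk has a power-of-two length and that the strongest spectral peak corresponds to the note being played (which, given the harmonics issue, will not always hold). The helper names `fft`, `dominant_frequency`, and `frequency_to_midi` are hypothetical, and the hand-written FFT only keeps the example dependency-free; a library routine such as `numpy.fft.rfft` would likely be used in practice.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def dominant_frequency(chunk, sample_rate):
    """Return the frequency in Hz of the strongest FFT bin, ignoring DC."""
    n = len(chunk)
    # A Hann window reduces spectral leakage from the chunk edges.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(chunk)
    ]
    spectrum = fft(windowed)
    # Only bins 1 .. n/2 - 1 are needed for a real-valued signal.
    magnitudes = [abs(c) for c in spectrum[1:n // 2]]
    peak_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / n


def frequency_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))
```

The frequency resolution is `sample_rate / n` Hz per bin, so longer chunks separate neighbouring low notes better but respond more slowly to note changes. The sample rate and chunk length are left as parameters because the values used in the project are not stated.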
In terms of testing, the C major scale will continue to serve as the baseline for the quality of notes extracted from a signal. Currently, the Fourier transform shows large-amplitude upper harmonics for certain notes, which causes some errors in matching to the MIDI. The MIDI acts as ground truth and will be used to evaluate how accurately notes are extracted. Note that this assumes the musician makes no mistakes, which is reasonable for the simple sequences we will be testing on. Another test for the audio portion is to verify that a signal is always sent to the front end when the music reaches the end of the page. This will be tested by having a musician play into the microphone and checking whether the front end turns the page based solely on the audio.
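One possible way to score extracted notes against the MIDI ground truth is sketched below. It assumes both sequences are already aligned note for note and expressed as MIDI note numbers. The function `score_against_midi` is hypothetical, and the predicted sequence is made-up input for illustration only, not a measured result. Mismatches that differ by a whole number of octaves are counted separately, since a dominant second harmonic would shift the detected note up by exactly one octave.

```python
def score_against_midi(predicted, expected):
    """Compare predicted MIDI note numbers to the ground-truth sequence.

    Returns (accuracy, octave_errors, other_errors), where the two error
    lists hold the indices of mismatched notes.
    """
    if len(predicted) != len(expected):
        raise ValueError("sequences must have the same length")
    correct = 0
    octave_errors = []  # right pitch class, wrong octave
    other_errors = []   # any other mismatch
    for i, (p, e) in enumerate(zip(predicted, expected)):
        if p == e:
            correct += 1
        elif (p - e) % 12 == 0:
            octave_errors.append(i)
        else:
            other_errors.append(i)
    accuracy = correct / len(expected) if expected else 0.0
    return accuracy, octave_errors, other_errors


# Assumed one-octave C major scale starting at middle C (MIDI 60).
expected = [60, 62, 64, 65, 67, 69, 71, 72]
# Hypothetical prediction in which the top C is detected an octave high.
predicted = [60, 62, 64, 65, 67, 69, 71, 84]
accuracy, octave_errors, other_errors = score_against_midi(predicted, expected)
```

Separating octave errors from other mismatches makes it easier to tell whether a failure comes from harmonics or from noise and reflections.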
We are currently on track and will continue working to stay that way.