This week, we made significant progress on the web app setup, audio segmentation, and pitch detection components of our project. We also received our microphone, and Professor Sullivan lent us an audio interface that we can use to record some audio.
Below is an image of what our web app currently looks like. Here, a user can upload flute audio and a recording of their background. They can also adjust the metronome tempo (for the MVP, we are not performing tempo detection, so the user sets the tempo/metronome themselves).
Additionally, we now have a basic implementation of audio segmentation (using RMS) working. Below is a graph showing a flute signal of Twinkle Twinkle Little Star, where the red lines mark the start of a new note as detected by our algorithm and the blue dotted lines mark the actual note onsets. Our algorithm's detected onsets were within 0.1 ms of the actual note onsets.
We achieved similar results with Ten Little Monkeys at regular and 2x speed, though we still need a way to adjust the RMS threshold dynamically based on the signal's maximum amplitude, rather than tuning it by trial and error; a sketch of what that could look like is below.
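To make the idea concrete, here is a minimal sketch of RMS-based onset detection where the threshold is taken as a fraction of the signal's peak amplitude instead of a hand-tuned constant. The function and parameter names (e.g., `rel_threshold`) are ours for illustration, not the exact code in our repo, and the frame/hop sizes are placeholder values.

```python
import numpy as np

def detect_note_onsets(signal, sr, frame_size=1024, hop_size=512, rel_threshold=0.2):
    """Detect note onsets from frame-wise RMS energy.

    rel_threshold is a hypothetical parameter: the RMS threshold is a
    fraction of the signal's maximum amplitude rather than a fixed number.
    """
    # Frame-wise RMS energy
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop_size : i * hop_size + frame_size] ** 2))
        for i in range(n_frames)
    ])

    # Threshold scaled by the signal's peak amplitude (no trial and error)
    threshold = rel_threshold * np.max(np.abs(signal))

    # An onset is a frame whose RMS rises above the threshold
    # while the previous frame was below it.
    above = rms > threshold
    onset_frames = np.flatnonzero(above[1:] & ~above[:-1]) + 1

    # Convert frame indices to onset times in seconds
    return onset_frames * hop_size / sr
```

Scaling the threshold to the recording's own peak level should let the same setting work across recordings made at different gains, which is the main weakness of the current fixed threshold.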
We also started working on pitch detection. To do so, we are using comb filtering and Fourier transforms to analyze the frequencies present in the played note, then using the fundamental frequency to determine the corresponding note. We were able to successfully determine the MIDI notes for Twinkle Twinkle and plan to continue testing this on more audio samples.
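For reference, here is a minimal sketch of the frequency-to-MIDI step. For simplicity it uses a single FFT peak to stand in for the fundamental rather than the full comb-filtering stage we described above; the function name and the 50 Hz low-frequency floor are our own illustrative choices.

```python
import numpy as np

def estimate_midi_note(note_segment, sr):
    """Estimate the MIDI note of one segmented note from its FFT peak."""
    # Window the segment to reduce spectral leakage
    windowed = note_segment * np.hanning(len(note_segment))

    # Magnitude spectrum and corresponding frequency bins
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / sr)

    # Take the strongest peak above a low-frequency floor as the fundamental.
    # (A comb-filter / harmonic step makes this more robust when overtones
    # are stronger than the fundamental.)
    valid = freqs > 50.0
    f0 = freqs[valid][np.argmax(spectrum[valid])]

    # Convert the fundamental frequency to the nearest MIDI note number
    midi = int(round(69 + 12 * np.log2(f0 / 440.0)))
    return midi, f0
```

The last line is the standard conversion from frequency to MIDI pitch (A4 = 440 Hz = MIDI 69), which is how the detected fundamental maps to a note name.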
We are currently on schedule. For the upcoming week, we plan to integrate all of our existing code and test/refine the audio segmentation and pitch detection to ensure that they are more robust to various tempos, rhythms, and frequencies. We are also soliciting the SOM flutists' availability so that we can start some initial testing the week of March 24th. Additionally, after speaking with Professor Chang last week during lab, we have decided to build in some time to add a feature in which users can edit the generated music score (e.g., moving measures around, adjusting notes, adding notation such as trills and crescendos, and more).