This week was dedicated to testing the audio segmentation code that I created last week. From this, I realized that two parameters would need to change with the BPM and audio quality: the peak difference threshold (how steep a change must be to count as a new note) and the max time difference (how long a gap there must be before a note counts as new). After testing with a random audio sample from YouTube of Ten Little Monkeys, we realized that this audio was probably not that usable, at least for early-stage testing, as the eighth notes were not distinct enough for the code, or quite honestly for me manually, to identify.
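To make the two parameters concrete, here is a minimal sketch of RMS-envelope segmentation in the style described above. This is my own reconstruction, not the actual project code: the function and parameter names (`segment_notes`, `peak_diff_threshold`, `max_time_diff`) are hypothetical, and the thresholds would need retuning per BPM and recording quality, which is exactly the issue noted above.

```python
import numpy as np

def segment_notes(signal, frame_size=1024, hop=512,
                  peak_diff_threshold=0.3, max_time_diff=4):
    """Mark a new note wherever the RMS envelope rises sharply.

    peak_diff_threshold: how steep an RMS jump must be to count as a new note.
    max_time_diff: minimum number of frames between consecutive onsets.
    (Both names are illustrative, not from the original code.)
    """
    # Frame-wise RMS envelope of the signal
    n_frames = 1 + (len(signal) - frame_size) // hop
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame_size] ** 2))
        for i in range(n_frames)
    ])
    # First difference of the envelope; a sharp rise is a candidate onset
    diff = np.diff(rms, prepend=0.0)
    onsets, last = [], -max_time_diff
    for i, d in enumerate(diff):
        if d > peak_diff_threshold and i - last >= max_time_diff:
            onsets.append(i)
            last = i
    return rms, onsets

# Synthetic example: two clearly separated bursts with silence between them
sr = 8000
t = np.arange(2048) / sr
burst = 0.8 * np.sin(2 * np.pi * 440 * t)
sig = np.concatenate([burst, np.zeros(2048), burst])
rms, onsets = segment_notes(sig)
print(len(onsets))  # prints 2: one onset per burst
```

On a clean signal like this the dips between notes genuinely approach zero, which is why the threshold works; the Ten Little Monkeys recording fails precisely because its dips never get that low.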
In this image, using the audio recording of Ten Little Monkeys, I have manually identified where the notes change with the green lines. You can see here that the code (the red lines) is not correctly identifying these notes. However, the change is not that significant either (less than a 0.3 RMS difference), and the signal doesn't approach zero between notes the way other recordings do, like Twinkle Twinkle Little Star. To try and fix this, I experimented with different audio amplifications. First I simply scaled the signal, but this scaled the "near zero" values too, so even though the differences were now significant in absolute terms, the dips were no longer near zero and the code still wouldn't pick up the notes. Then I tried scaling exponentially, to keep the near-zero values near zero while boosting the larger values, but since the recording is fairly quiet overall, this pushed everything toward zero and erased the significant differences. This led us to experiment with clearer-cut audio first and come back to this recording when the program is more advanced.
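The two failed amplification attempts can be sketched in a few lines. This is a toy reconstruction under my own assumptions (the envelope values and the exponent of 2 are made up for illustration), not the actual scaling code:

```python
import numpy as np

# Toy RMS envelope: quiet dips (0.05, 0.02) between louder notes
rms = np.array([0.05, 0.30, 0.02, 0.28])

# Attempt 1: linear gain. The dips scale by the same factor as the peaks,
# so they no longer sit near zero even though absolute differences grow.
linear = 3.0 * rms
print(linear)  # [0.15 0.9  0.06 0.84] -- dips are no longer "near zero"

# Attempt 2: power-law ("exponential-style") scaling with exponent > 1.
# Values below 1 shrink, so the dips do stay near zero -- but on a quiet
# recording every value is below 1, so the whole envelope collapses.
power = rms ** 2
print(power)  # [0.0025 0.09   0.0004 0.0784] -- everything is near zero now
```

This illustrates the trade-off described above: linear gain preserves relative shape (so the near-zero test still fails), while the exponential-style curve flattens quiet recordings entirely.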
My next steps are to find more distinct/staccato recordings of faster notes, see how the audio segmentation handles them, and polish off my rhythm detection code.