This week I worked on finalizing the audio alignment algorithm and integrating it into the system. The algorithm uses synctoolbox, a Python library created by groups of researchers working on audio alignment and audio processing. Running the alignment returns a warping matrix that maps points in the reference audio to points in a live audio segment. However, finding the starting point from the warping matrix is still non-trivial because the linearity is noisy. Linearity occurs when the live audio and the reference audio are in sync and therefore progress at the same rate. The warping matrix may show linearity before the true starting point, or the true starting point may be followed shortly by non-linearity that makes it hard to find. Overall, this means we need a threshold that decides which linear region marks the true starting point without mistaking noise for it. In the warping matrix below, note that the linearity begins at ~200 frames in the audio segment, which is due to a pause at the beginning of the recording. It also aligns with ~900 frames in the reference audio, which can be converted into an exact time within the recording.
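To make the thresholding idea concrete, here is a minimal sketch of one way to pick the first sustained linear region out of a warping path. It is not our actual implementation and does not call synctoolbox: the function names, the `window`, `tol`, and `min_run` parameters, the synthetic path, and the 50 Hz feature rate are all assumptions chosen for illustration.

```python
import numpy as np


def find_linear_start(ref_idx, live_idx, window=20, tol=0.2, min_run=100):
    """Return (ref_frame, live_frame) where the first sustained linear
    region of a warping path begins, or None if there is none.

    Hypothetical helper: window, tol and min_run are illustrative values.
    """
    ref_idx = np.asarray(ref_idx, dtype=float)
    live_idx = np.asarray(live_idx, dtype=float)

    # How far each stream advances over a sliding window of path steps.
    d_ref = ref_idx[window:] - ref_idx[:-window]
    d_live = live_idx[window:] - live_idx[:-window]

    # A window counts as linear when both streams advance at about the same rate.
    biggest = np.maximum(d_ref, d_live)
    linear = (biggest > 0) & (np.abs(d_ref - d_live) <= tol * biggest)

    # Require min_run consecutive linear windows so short bursts are ignored.
    run = 0
    for i, ok in enumerate(linear):
        run = run + 1 if ok else 0
        if run >= min_run:
            start = i - min_run + 1
            return int(ref_idx[start]), int(live_idx[start])
    return None


def frame_to_seconds(frame, feature_rate=50.0):
    """Convert a frame index to seconds, assuming feature_rate frames per second."""
    return frame / feature_rate


# Synthetic path: a pause, a short noisy diagonal burst, more pause,
# then the real linear region starting near live frame 200 / reference frame 900.
live_path = np.arange(700)
ref_path = np.concatenate([
    np.full(90, 890),        # live audio is silent, reference does not move
    np.arange(891, 901),     # 10-step diagonal burst (noise)
    np.full(100, 900),       # still paused
    np.arange(901, 1401),    # true linear region
])

start = find_linear_start(ref_path, live_path)
print(start, frame_to_seconds(start[0]))
```

The `min_run` requirement is what plays the role of the threshold here: a brief diagonal burst is rejected, while a long stretch where both streams advance together is accepted. The detected live frame lands within one `window` of the true start, so a smaller window gives a tighter estimate at the cost of more sensitivity to noise.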
This upcoming week, I will be looking at the robustness of both the warping algorithm and the algorithm that finds the true starting position. To do this, we have recorded several segments of audio in which Sanjana performed with wrong notes, skipped bars, and arbitrary tempo changes. Several of these audio segments can be found below. After testing the robustness, I'll look at how to package that starting time and turn it into coordinates on a page to place the cursor.
We are currently on track and will continue working hard to stay that way.
Reference Audio: Reference_Audio
Audio Segment (Recorded by Sanjana): Audio_Segment
Warping Matrix: Warping_Matrix