Anja Status Update #7

This week I demoed the first version of the DTW matching algorithm, and I kept refining it after the demo. I tried out a few more test cases that I hadn’t considered last week, including

prioritizing pattern over pitch (transposition invariance; see the sketch after this list)

prioritizing matching the majority of the pattern over requiring an exactly accurate pattern.
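To illustrate the transposition-invariant case, here is a minimal sketch (illustrative Python, not the actual MATLAB implementation): rather than aligning absolute pitches with DTW, align the successive intervals, so the same melody sung in a different key still produces a near-zero alignment cost.

```python
# Minimal sketch (not the project code): transposition-invariant DTW matching by
# comparing pitch *intervals* instead of absolute pitches, so a query hummed in
# any key can still align with the reference melody.
import numpy as np

def to_intervals(pitches):
    """Convert a sequence of MIDI pitch numbers to successive semitone intervals."""
    return np.diff(np.asarray(pitches, dtype=float))

def dtw_cost(a, b):
    """Plain DTW with absolute-difference local cost; returns the total alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical example: the same contour transposed up 5 semitones.
reference = [60, 62, 64, 65, 67]   # C D E F G
query     = [65, 67, 69, 70, 72]   # F G A Bb C
print(dtw_cost(reference, query))                              # large: absolute pitches differ
print(dtw_cost(to_intervals(reference), to_intervals(query)))  # ~0: interval patterns match
```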

My very next task is MIDI processing so that we can begin library synthesis. I will work on getting these files into a MATLAB-friendly format; I have already contacted Professor Dannenberg’s Master’s student, who does melody contouring into MIDI by hand and with some automated tools. We have a meeting scheduled for Monday to explore the options and figure out how his workflow works.
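As a rough sketch of what the MIDI-to-MATLAB step might look like (the file name and the assumption that the melody lives in the first non-drum track are hypothetical; the real pipeline depends on what we settle on Monday), a MIDI file could be parsed with pretty_midi and its note list exported as a matrix that MATLAB can load:

```python
# Illustrative sketch: read a MIDI file and export its note events as a matrix
# that MATLAB can read with load('melody.mat'). Assumes the melody is in the
# first non-drum instrument track, which won't hold for every file.
import numpy as np
import pretty_midi
from scipy.io import savemat

midi = pretty_midi.PrettyMIDI('example_melody.mid')   # hypothetical file name
melody_track = next(inst for inst in midi.instruments if not inst.is_drum)

# One row per note: [onset_seconds, duration_seconds, MIDI pitch]
notes = np.array([[n.start, n.end - n.start, n.pitch] for n in melody_track.notes])

savemat('melody.mat', {'notes': notes})   # appears in MATLAB as the variable `notes`
```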

My goal is to finish this part and then immediately get things in the right format so my algorithm can start matching against the library. I’m sure that’s when interesting bugs will come up, and I’ll save further refinement for the week after!

I might be about half a week behind schedule, since finding a time to meet about MIDI processing took a bit longer than hoped. But hopefully the extra test cases I explored during the waiting period will make the upcoming debugging go faster.

Nolan: Status Report 7

I’m still trying to work out the preprocessing for chroma feature cross-similarity analysis.

I actually received a very prompt reply from one of the members of the Korean team whose work I’m building off of. He suggests adding a pitch transformation step to transpose the two samples into the same key before comparing them; this, in his words, is critical. I have a general idea of what should be done based on the paper they cited that used this concept, the Optimal Transposition Index (basically, it takes the chroma features, averages them to estimate the key each sample seems to be in, and then shifts one of the two so that the keys match).
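Here is a minimal sketch of the OTI step as I currently understand it, using numpy on chroma matrices that are assumed to already be computed (12 x frames arrays, e.g. from librosa.feature.chroma_stft); the actual preprocessing may end up differing.

```python
# Sketch of the Optimal Transposition Index (OTI) step as I understand it:
# average each chroma matrix over time, find the circular pitch-class shift that
# best aligns the two averages, then roll one chroma matrix by that shift.
import numpy as np

def oti(chroma_a, chroma_b):
    """chroma_a, chroma_b: (12, n_frames) arrays. Returns the shift (0-11) to apply to b."""
    h_a = chroma_a.mean(axis=1)
    h_b = chroma_b.mean(axis=1)
    # Score every possible transposition of b and keep the best one.
    scores = [np.dot(h_a, np.roll(h_b, shift)) for shift in range(12)]
    return int(np.argmax(scores))

def transpose_chroma(chroma, shift):
    """Circularly shift the pitch-class axis by `shift` semitones."""
    return np.roll(chroma, shift, axis=0)

# Usage (chroma matrices assumed to come from something like
# librosa.feature.chroma_stft(y=y, sr=sr) for the query and the reference):
# shift = oti(chroma_query, chroma_reference)
# chroma_reference_aligned = transpose_chroma(chroma_reference, shift)
```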

Obviously, the main potential flaw in this approach is that we’ll be matching the average pitches of ONLY the melody against the average pitches of the ENTIRE song, which introduces a lot of noise; even a perfectly on-pitch singer may not generate very promising matches. I’m looking into OTI features, and I’ve found a good way to benchmark this approach in general: compare a song to its own isolated vocal track. If the original singer, in the same studio recording, can’t produce something that matches against the original song, this method is probably not tenable.
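The benchmark itself could be as simple as building a frame-wise cross-similarity matrix between the isolated vocal’s chroma and the full mix’s chroma (after the OTI shift) and checking whether a clear diagonal shows up; a hedged numpy sketch:

```python
# Sketch of the vocal-vs-full-mix sanity check: a cosine cross-similarity matrix
# between the two chroma sequences. If the isolated vocal really is matchable
# against its own song, a bright diagonal should appear.
import numpy as np

def cross_similarity(chroma_a, chroma_b, eps=1e-8):
    """chroma_a: (12, n), chroma_b: (12, m) -> (n, m) cosine-similarity matrix."""
    a = chroma_a / (np.linalg.norm(chroma_a, axis=0, keepdims=True) + eps)
    b = chroma_b / (np.linalg.norm(chroma_b, axis=0, keepdims=True) + eps)
    return a.T @ b

# sim = cross_similarity(chroma_vocal_aligned, chroma_full_mix)
# e.g. if the two tracks are time-aligned and equal length, compare
# np.mean(np.diag(sim)) against sim.mean() as a crude matchability score.
```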

The researcher I contacted actually suggested the opposite of the approach we originally got from Prof. Dannenberg: analyze the waveform to estimate the melody from the audio directly, then compare melody to melody.

From this, I’ve received some deep-learning-based suggestions and a few that won’t require training, which might make some preliminary testing easier. Next week I’ll be looking at RPCA and YIN for vocal extraction and melody recognition.
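As a head start on the melody side, here is a minimal sketch using librosa’s pYIN implementation (a probabilistic variant of YIN); the file name is hypothetical, and the vocal-extraction step (e.g. RPCA) would ideally come before this and isn’t shown.

```python
# Sketch: estimate a frame-wise melody (f0) from an audio file with pYIN, the
# probabilistic YIN variant shipped with librosa, then convert to MIDI pitch
# numbers so it could be fed to the DTW matcher. Vocal separation would
# ideally happen before this step and is not shown here.
import numpy as np
import librosa

y, sr = librosa.load('query_recording.wav')   # hypothetical input file

f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz('C2'),   # rough lower bound for a singing voice
    fmax=librosa.note_to_hz('C6'),   # rough upper bound
    sr=sr,
)

# Keep only voiced frames and convert Hz -> (fractional) MIDI pitch numbers.
midi_pitches = librosa.hz_to_midi(f0[voiced_flag])
print(np.round(midi_pitches[:20]))
```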