Team Status Report 9

This week we each ran into some obstacles with our parts. Pitch detection and finding patterns in cross-similarity matrices were challenges that we foresaw but that are still formidable, and none of us have much experience with web development.

We are a bit behind schedule but are working hard to catch up!

Anja Status Report #9

This week I processed the MIDI library by hand. The general procedure was to open each MIDI in MuseScore, convert the melodies into a dry piano tone, and then export the clip as a WAV.

When I started processing the MIDIs, I noticed some flaws in my processing of recorded audio too, since the results of comparing recorded audio to preprocessed audio were starkly different.

My algorithm for comparing the series remained the same, namely DTW. However, I realized I had to come up with a better pitch detection algorithm. I went with one based on power spectral density (PSD) functions with a Hamming window of about 80 ms.
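
For concreteness, here is a minimal Python sketch of the kind of frame-wise PSD estimate I mean (the function names are mine, and taking the raw PSD peak can latch onto a harmonic rather than the fundamental, which is part of the cleanup still ahead):

```python
import numpy as np
from scipy.signal import periodogram

def detect_pitch_psd(frame, fs):
    """Estimate the dominant frequency of one frame from its
    Hamming-windowed power spectral density."""
    freqs, psd = periodogram(frame, fs, window="hamming")
    return freqs[np.argmax(psd)]

def pitch_track(signal, fs, frame_ms=80):
    """Slide an ~80 ms window over the signal, yielding one pitch
    estimate per frame."""
    n = int(fs * frame_ms / 1000)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    return np.array([detect_pitch_psd(f, fs) for f in frames])
```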

So far this works okay, but I still have a lot of work left to do cleaning up the signal.

This set me back a little because the pitch detection is rather complex.

I will continue cleaning the signal and trying to extract the melody as well as possible from the recordings in the following week.

Anja Status Report #8

This week I was actually unable to have the meeting with Zheng. Instead, however, I have devised the following plan to create the library:

I will look up MIDIs for all of the songs we want in our library with this link:

http://www.midiworld.com/files/

From this website I can get MuseScore arrangements of all the pieces, from which I can specifically select the melodies of interest.

MuseScore allows me to export mp3s of pure pitch tones of melodies of interest, which I will do for all the tracks/staves within the songs we intend to put in our library.

Finally, I will process the .mp3s or .wavs in MATLAB the same way I process recorded vocal input, so that the list of tones ends up in the same data format as that used for recorded vocal queries.
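
As a rough illustration of that step (in Python rather than MATLAB, with a made-up file path), the library side would reuse the same extraction helper sketched in my report #9 above, so both sides produce the same tone-series format:

```python
import numpy as np
from scipy.io import wavfile

# Illustrative only: run a MuseScore-exported WAV through the same
# extraction path as a recorded vocal query. `pitch_track` is the
# hypothetical estimator sketched in report #9; the real pipeline
# lives in MATLAB.
fs, audio = wavfile.read("library/song1_melody.wav")  # path is made up
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # mix stereo exports down to mono
library_tones = pitch_track(audio.astype(float), fs)
```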

From here, the matching should be the same.


I think I am a little behind; I hope to be able to create a substantial library by the middle of next week.

Wenting Status Report 8 + Team Status

Wenting:

This week I started playing with the histogram idea to show the distribution of distances to the actual melody. I am debating whether to show absolute distances or signed (positive/negative) ones, since signed distances would show whether someone was singing flat or sharp, not just how far off they were.
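
A quick sketch of the signed version, with made-up distances, to show what the flat/sharp distinction would look like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: signed per-note distances (sung minus reference, in
# semitones). Negative = flat, positive = sharp; a histogram of
# absolute distances would lose that direction.
diffs = np.array([-1.2, -0.4, 0.1, 0.3, 0.8, -0.2, 1.5, -0.9])
plt.hist(diffs, bins=8)
plt.axvline(0, color="k", linestyle="--")  # a perfect match sits here
plt.xlabel("distance from reference note (semitones)")
plt.ylabel("count")
plt.show()
```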

Again, I am behind schedule. I will need to actually have the website built by the end of next week, since the clock is ticking until the final demo and I still need to integrate the data visualization into the website and with the other parts.

Team:

As predicted, all of us were very busy with carnival commitments this past week and were not able to do that much. However, we will be majorly kicking into gear this week for each of our parts.

Although we are a little behind, we are confident about finishing what we need to get done.

Wenting: Status Report 7 + Team Status

Wenting:

This week I demoed the data visualization aspects and received helpful feedback. I was plotting the original song against the sung input (although both are still contrived examples) and coloring the space between them with a gradient from red to green, where red meant a poor match and green meant a close match. However, I neglected to include a scale, which made the graph hard to interpret. Since it is a gradient, the colors are not easily comparable, and this will only get harder as the length of the sample grows and more colors appear in the gradient.

Suggestions from course staff included plotting the distances between notes and making a histogram to show the distribution of distances. Also, coloring the actual line of the melody as red/yellow/green is more indicative, and I think with that I could add toggle options to display just the sung input, the sung input and the melody it was matched to, or just the matched melody. This is an option that I have considered and am still tinkering with.
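For the line-coloring suggestion, something like this matplotlib sketch (with contrived data) would color each segment of the sung melody by its distance to the matched melody, and it includes the scale I was missing:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Contrived data: a fake sung contour and fake per-note distances.
t = np.arange(20)
rng = np.random.default_rng(1)
sung = 60 + np.cumsum(rng.normal(0, 1, 20))
dist = np.abs(rng.normal(0, 1, 20))

# Build one colored segment per pair of adjacent notes.
points = np.array([t, sung]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
lc = LineCollection(segments, cmap="RdYlGn_r")  # low distance = green
lc.set_array(dist[:-1])

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()
fig.colorbar(lc, ax=ax, label="distance to matched melody")  # the scale
plt.show()
```
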

While I did most of my work for the demo in Python, I may port it over to MATLAB for easier interfacing with the other parts. Also, I would like to start working on the pipeline from the two branches of audio analysis to data visualization, since that will be a key part of my portion.

I think I am a little behind schedule since I wanted to spin up a basic website by this week but that has not been done, and I would like to be further along in prototyping data visualization techniques.

Team:

Post interim demo, we are all working on pushing forward with the next steps on each of our parts. The suggestions and comments made by course staff were helpful for us to work off of for the next few weeks, and we feel confident that we will have enough time to finish what we have to do.

The risks that we are facing have not changed much, though there may be a big push at the end to put all the parts together into one system, which is to be expected. As this coming week is carnival and we all have commitments to carnival events, we will all be busy but will try to still be on track.

Some parts have not moved as quickly as originally estimated, but overall our team is in a good place schedule-wise – we know what we need to get done and have a good sense of time about it.

Anja Status Update #7

This week I performed a demo of the first version of the DTW matching algorithm, and I continued refining it even past the demo. I tried out a few more test cases that I hadn't considered last week, including:

prioritizing pattern over pitch (transpose invariance; see the sketch after this list)

prioritizing majority pattern over accurate pattern.
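
My implementation details may differ, but the transpose-invariant comparison works roughly like this sketch: center each pitch series on its own median so DTW sees the contour rather than the key.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def transpose_invariant_dtw(a, b):
    """Compare melodic contours: subtracting each series' median
    removes the key, so a query sung a few semitones off can still
    match the pattern."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return dtw_distance(a - np.median(a), b - np.median(b))
```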

My very next task is to do MIDI processing so that we can begin library synthesis. I will work on getting these into a MATLAB format; I have already contacted Professor Dannenberg's Master's student, who does melody contouring into MIDIs by hand and with some automated tools. We have a meeting scheduled for Monday to explore the options and figure out how it works.

My goal is to finish this part and then immediately get things in the right format so my algorithm can start matching against the library. I’m sure that’s when interesting bugs will come up, and I’ll save further refinement for the week after!

I might be about half a week behind schedule, since finding a time to meet for MIDI processing took a bit longer than hoped. But hopefully I can make up for the future debugging with the exploration of other test cases I did in the waiting period.

Nolan: Status Report 7

I’m still trying to work out the preprocessing for chroma feature cross-similarity analysis.

I actually received a very prompt reply from one of the members of the Korean team whose work I'm building off of. He suggests adding a pitch transformation to transpose the two samples into the same key before comparing them; this, in his words, is critical. I have a general idea of what should be done based on the paper they cited that uses this concept, the Optimal Transposition Index (basically, it takes the chroma features, averages them over time to find the most likely key, and then shifts one of the two so that the keys match).
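
As I understand it (a sketch based on my reading of the paper, not their exact code), OTI boils down to a few lines:

```python
import numpy as np

def oti_shift(chroma_query, chroma_ref):
    """Average each chromagram (12 x frames) over time into a global
    pitch profile, then find the circular semitone shift of the query
    profile that best lines up with the reference profile."""
    gq = chroma_query.mean(axis=1)
    gr = chroma_ref.mean(axis=1)
    scores = [np.dot(np.roll(gq, k), gr) for k in range(12)]
    return int(np.argmax(scores))

# Before comparing: cq = np.roll(cq, oti_shift(cq, cr), axis=0)
```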

Obviously, the main potential flaw in this approach from the beginning is that we’ll be matching the average pitches of ONLY the melody with the average pitches of the ENTIRE song, creating a lot of noise. Even a perfectly correct singer may not be able to generate very promising matches. I’m looking into OTI features, and I’ve found a good way to benchmark this approach in general: I can compare a song to its isolated vocal track–if the original singer in the same studio recording can’t generate something that seems matchable vs. the original song, this method is probably not tenable.

The researcher I contacted actually suggested the opposite approach from the idea we originally got from Prof. Dannenberg–analyze the waveform to estimate melody from the audio directly, then compare melody -> melody.

From this, I’ve received some deep-learning-based suggestions and a few that won’t require training, which might make some preliminary testing easier. Next week I’ll be looking at RPCA and YIN for vocal extraction and melody recognition.
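For the no-training route, librosa ships a probabilistic YIN implementation, so a first experiment might look like this sketch (filename made up; the RPCA vocal-separation step would come before this and is not shown):

```python
import librosa

y, sr = librosa.load("sample.wav")
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
# f0 holds one pitch estimate per frame, NaN where unvoiced
```
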

Anja + Team Status Report #6

This week I implemented a version of the algorithm with DTW. The code is able to take in vocal input and distinguish which vocal inputs are more similar to each other. To test it out, I recorded myself singing the same song twice plus a second, totally different song, and the ranked output correctly identified the first two recordings as the most similar pairing among the three. That is:

Recording 1: song 1

Recording 2: song 1

Recording 3: song 2

Pairing 1-2 was deemed more similar than 1-3 or 2-3.

This was great to see because it is a simple working version of the app. Imagining 2 and 3 to be the library and 1 to be the vocal input, we were able to see 1 matched to the right song in the library.
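
In sketch form, the ranking test amounts to sorting the pairwise DTW distances (contrived tone series below; `transpose_invariant_dtw` is the helper sketched in my report #7 update above):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
melody_a = 60 + np.cumsum(rng.integers(-2, 3, 30))   # fake song 1 contour
recordings = {
    1: melody_a + rng.normal(0, 0.3, 30),            # take 1 of song 1
    2: melody_a + rng.normal(0, 0.3, 30),            # take 2 of song 1
    3: 55 + np.cumsum(rng.integers(-2, 3, 30)),      # a different song
}
pairs = sorted(
    combinations(recordings, 2),
    key=lambda p: transpose_invariant_dtw(recordings[p[0]], recordings[p[1]]),
)
print(pairs[0])  # expected: (1, 2), the two takes of the same song
```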

The next steps include:

-Getting the library to be processed MIDIs in the same format as the vocal input I am extracting now

-Seeing how well the ranking system works when the library is expanded and the comparisons grow into the hundreds; we may need to optimize the processing at that point.

I believe that if next week I can begin the debugging step as described above, we should be in great shape.


TEAM UPDATE:

  • What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?
    • The biggest risk is that the system is too slow and not accurate enough. Some naive mitigation plans include keeping the library fairly small and not attempting to do a weighted average with the similarity matrices.
  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?
    • So far no changes have been made to the design; the proof-of-concept code we have seems to be working correctly. Next week, as we begin to debug situations that resemble the actual production environment, we may need to make changes.
  • Provide an updated schedule if changes have occurred.
    • No changes.

Nolan: Status Report 6

This week I prepared for our demo. We will be able to show cross-similarity matrices between various songs and some samples, and we have some evaluation in place for tuning chroma features to produce clearer and more distinct patterns in the cross-similarity matrices. Hopefully, some heuristic evaluations of the similarity will show trends that let us pick a song out based on a sung sample. Failing this, we can look at a classifier that can tell a match from a mismatch between a sung sample and the actual song waveform.
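
In outline, the matrices come from frame-wise chroma comparisons, along these lines (a sketch with made-up filenames; our chroma-parameter tuning is not shown):

```python
import numpy as np
import librosa

y1, sr1 = librosa.load("sample.wav")  # sung sample
y2, sr2 = librosa.load("song.wav")    # candidate song
c1 = librosa.feature.chroma_cqt(y=y1, sr=sr1)  # 12 x n frames
c2 = librosa.feature.chroma_cqt(y=y2, sr=sr2)  # 12 x m frames

# Normalize columns, then take cosine similarity between every pair
# of frames; diagonal streaks in S suggest matching passages.
c1 = c1 / (np.linalg.norm(c1, axis=0, keepdims=True) + 1e-9)
c2 = c2 / (np.linalg.norm(c2, axis=0, keepdims=True) + 1e-9)
S = c1.T @ c2  # n x m cross-similarity matrix
```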

Wenting: Status Report 6

This week I delved more into the implementation of the data visualization techniques. I want it to be visually obvious which parts are a close match and which are not, so there should be a red-to-green scale in some form, where red means a part is not a close match and green means it is. This can be done by shading the differences between the graphs, or simply by coloring the line representing the melody based on how close it is to the closest matched melody, though the latter would probably be harder to see. A rough sketch of the shading idea follows.
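
Here is a sketch of that shading with contrived contours (real data would come from the matching pipeline):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(16)
rng = np.random.default_rng(2)
ref = 60 + 4 * np.sin(t / 2)          # fake reference contour
sung = ref + rng.normal(0, 1.5, 16)   # fake sung contour

plt.plot(t, ref, "k-", label="reference")
plt.plot(t, sung, "b--", label="sung")
for i in range(len(t) - 1):
    # Shade each gap segment, green (small distance) through red (large).
    d = abs(sung[i] - ref[i])
    plt.fill_between(t[i:i + 2], ref[i:i + 2], sung[i:i + 2],
                     color=plt.cm.RdYlGn_r(min(d / 3.0, 1.0)), alpha=0.5)
plt.legend()
plt.show()
```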

A thought to consider is whether the input should be plotted against each song separately, or whether the information can be aggregated into one visual. Using either of the methods mentioned above, the plots would have to be separate to be meaningful. I will continue to explore which methods are effective and most telling of the process.

I have not done much work on visualization for the CNN branch yet because it’s more difficult to do without data. To combat this I may try generating test matrices similar to the ones shown in this paper to explore some options. Perhaps using the similarity features we can pull out the bits of melody that were found to be matching.

Since our computation will largely be in MATLAB, it may be easier to develop a web app instead of a mobile app. With that in mind, my goal for next week will be to spin up a basic frame for the app.