Anja Status Report #11 (Last One)

The pitch contours are looking fairly good. I cleaned them further by removing spurious troughs in the contour: I took the max-series (inverting the contour so that troughs become peaks) and then removed the spurious peaks.
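
A minimal sketch of this cleaning step, assuming the contour is just an array of pitch values (illustrated in Python with numpy/scipy, though the actual pipeline is in MATLAB; the function name and width threshold are my own):

```python
import numpy as np
from scipy.signal import find_peaks

def remove_spurious_troughs(contour, max_width=3):
    """Invert the contour so troughs appear as peaks, flag peaks
    narrower than max_width samples as spurious, and patch the
    flagged samples by linear interpolation from their neighbors."""
    contour = np.asarray(contour, dtype=float)
    inverted = contour.max() - contour            # troughs become peaks
    spurious, _ = find_peaks(inverted, width=(0, max_width))
    cleaned = contour.copy()
    good = np.setdiff1d(np.arange(len(contour)), spurious)
    cleaned[spurious] = np.interp(spurious, good, contour[good])
    return cleaned
```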

For improvement, I have been fine-tuning the analysis and considering different options beyond plain DTW:

For an alignment that looks really good, the autocorrelation would be completely symmetric, since the stretched function would look just like the one it is matched against. So if I could instead get a measure of the symmetry, or rather the skewness, of the autocorrelation, then no matter how much stretching was needed (even if it were a lot), a near-symmetric result would indicate a very close alignment and therefore the optimal match.
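
A sketch of that idea, assuming skewness is computed by treating the correlation curve as a weighted distribution over lags (Python/numpy for illustration; the function name is mine):

```python
import numpy as np
from scipy.signal import correlate

def alignment_skewness(query, candidate):
    """Cross-correlate two contours and return the skewness of the
    correlation curve. A value near zero means the curve is symmetric,
    i.e. the two contours line up closely no matter how much one of
    them had to be stretched beforehand."""
    q = np.asarray(query, dtype=float) - np.mean(query)
    c = np.asarray(candidate, dtype=float) - np.mean(candidate)
    xcorr = correlate(q, c, mode="full")
    weights = xcorr - xcorr.min()            # non-negative weights
    lags = np.arange(len(xcorr), dtype=float)
    mean = np.average(lags, weights=weights)
    var = np.average((lags - mean) ** 2, weights=weights)
    third = np.average((lags - mean) ** 3, weights=weights)
    return third / var ** 1.5
```

A perfectly matching pair gives a symmetric correlation and a skewness near zero; the larger the magnitude, the worse the alignment.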

Besides the analysis, I have considered the complexity as well. Currently, matching takes about 3 minutes on a 5-song database, so this will need to be sped up considerably. I am going to experiment with the hop size for the DTW windowing. I have already bucketized the database contours and the query contours at about 0.1% resolution, which seems to be the coarsest bucketing that still preserves the information.
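
The bucketizing step can be sketched like this (Python for illustration; the real code is MATLAB, and I am assuming "0.1%" means steps of 0.1% of the contour's range):

```python
import numpy as np

def bucketize(contour, step_frac=0.001):
    """Quantize a pitch contour into buckets of step_frac (0.1%) of its
    range, so tiny pitch wobbles collapse into one value and the DTW
    cost matrix sees far fewer distinct symbols."""
    contour = np.asarray(contour, dtype=float)
    lo, span = contour.min(), contour.max() - contour.min()
    if span == 0.0:
        return contour.copy()
    step = step_frac * span
    return np.round((contour - lo) / step) * step + lo
```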

Pictured above are some alignment plots. The blue curve is the sung query; the other is the contour of a database song, taken from the segment of that song that seemed closest to what was sung.

Overall, I’m still going to be pushing accuracy up and runtime down until the demo.

Anja Status Report #10

This week I worked on improving pitch detection. I am going with a Power Spectral Density function and peak detection within it. I have cleaned the signal up a little more, since I noticed many issues with spurious peaks and with the recording picking up harmonics.

I have passed the signal through a band-pass filter covering the human vocal range.

I have also thresholded based on amplitudes.
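
Putting those three pieces together (PSD, band-pass, amplitude threshold), one frame of the detector might look like the following sketch. This is illustrative Python rather than the actual MATLAB code, and the cutoff frequencies and threshold are assumed values, not measured ones:

```python
import numpy as np
from scipy.signal import butter, sosfilt, periodogram

def detect_pitch(frame, fs, f_lo=80.0, f_hi=1100.0, rel_thresh=0.1):
    """Estimate the pitch of a single frame: band-pass to an assumed
    human vocal range, compute the power spectral density with a
    Hamming window, zero out bins below a relative amplitude
    threshold, and return the strongest surviving frequency."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfilt(sos, frame)
    freqs, psd = periodogram(filtered, fs=fs, window="hamming")
    if psd.max() <= 0.0:
        return None                       # silence: nothing to report
    psd = np.where(psd >= rel_thresh * psd.max(), psd, 0.0)
    return freqs[np.argmax(psd)]
```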

I had a meeting with Professor Stern’s team to talk about potentially improving it further. Their approach uses autocorrelation functions and then a histogram of peak differences to pick the best peaks. He suggested modifying which peaks I select.
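
As I understand their suggestion, the peak-difference histogram idea looks roughly like this (Python sketch with my own naming, not their actual code):

```python
import numpy as np
from scipy.signal import correlate, find_peaks

def pitch_from_autocorr(frame, fs):
    """Autocorrelate the frame, find all peaks, histogram the spacings
    between successive peaks, and convert the most common spacing
    (the estimated period in samples) into a frequency."""
    frame = np.asarray(frame, dtype=float) - np.mean(frame)
    ac = correlate(frame, frame, mode="full")[len(frame) - 1:]
    peaks, _ = find_peaks(ac)
    if len(peaks) < 2:
        return None                       # no repeating structure found
    counts = np.bincount(np.diff(peaks))  # histogram of peak spacings
    period = int(np.argmax(counts))
    return fs / period if period > 0 else None
```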

I still need to smooth out the signal; this job has proven more difficult than imagined. I will also attempt better peak selection from the PSD plot.

All I have left to do is to improve the signal smoothing as described and to collect more statistics on the heuristics. We will give our final presentation very soon and also prepare the poster for the poster session. Finally, I will place my scripts in my public AFS space so that our website can interface with them and run queries.

I seem to be in a tight spot, but am hopeful things will be on time.

Anja Status Report #9

This week I processed the MIDI library by hand. The general procedure was to open each MIDI in MuseScore, convert the melodies into a dry piano tone, and then export the clip as a WAV.

When I started processing the MIDIs, I noticed some flaws in my processing of recorded audio too, since the recorded audio and the preprocessed audio came out starkly different when compared.

My algorithm for comparing the series remained the same, that is, DTW. However, I realized I had to come up with a better pitch detection algorithm. I went with one based on Power Spectral Density functions with a Hamming window of about 80 ms.

So far this is OK, but I still have a lot of work left to do on cleaning up the signal.

This set me back a little because the pitch detection is rather complex.

I will continue cleaning the signal and trying to extract the melody as best as possible from the recorded objects in the following week.

Anja Status Report #8

This week I was actually unable to have the meeting with Zheng. Instead, however, I have devised the following plan to create the library:

I will look up MIDIs for all of the songs we want in our library with this link:

http://www.midiworld.com/files/

From this website I can get MuseScore arrangements of all the pieces, from which I can specifically select the melodies of interest.

MuseScore allows me to export mp3s of pure pitch tones for the melodies of interest, which I will do for all the tracks/staves within the songs we intend to put in our library.

Finally, I will process the .mp3s or .wavs in MATLAB the same way I process recorded vocal input, so that the list of tones ends up in the same data format as that used for recorded vocal queries.

From here, the matching should be the same.
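
The key point is that both kinds of audio go through one shared front end, so the matcher never knows whether a contour came from a rendered MIDI or a sung query. A simplified version of such a front end (Python sketch using a plain FFT peak per frame; the real processing is in MATLAB):

```python
import numpy as np

def contour_from_samples(samples, fs, frame_len=0.08):
    """Chop a signal into fixed-length frames and record the dominant
    FFT frequency of each, producing the contour format shared by
    rendered-MIDI WAVs and recorded vocal queries."""
    n = int(frame_len * fs)
    contour = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n] * np.hamming(n)
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(n, 1.0 / fs)
        contour.append(freqs[np.argmax(spectrum)])
    return np.array(contour)
```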


I think I am a little behind; I hope to have a substantial library created by the middle of next week.

Anja Status Update #7

This week I performed a demo of the first version of the DTW matching algorithm, and I then refined the algorithm even after the demo. I tried out a few more test cases that I hadn’t considered last week, including:

prioritizing pattern over pitch (transpose invariance)

prioritizing the majority of the pattern over note-for-note accuracy.
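
The transpose-invariance case can be handled by comparing pitch deltas instead of absolute pitches, so the same tune sung in a different key produces the same sequence. A minimal sketch (Python for illustration; the preprocessing name is mine):

```python
import numpy as np

def transpose_invariant(contour):
    """Convert a contour of absolute pitches into successive pitch
    differences; any constant transposition cancels out, leaving only
    the melodic pattern to feed into DTW."""
    return np.diff(np.asarray(contour, dtype=float))
```

For example, [60, 62, 64, 62] and the same melody five semitones up, [65, 67, 69, 67], yield the identical delta sequence.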

My very next task is MIDI processing, so that we can begin library synthesis. I will work on getting these into a MATLAB format; I have already contacted Professor Dannenberg’s Master’s student, who does melody contouring into MIDIs by hand and with some automated tools. We have a meeting scheduled for Monday to explore the options and figure out how it works.

My goal is to finish this part and then immediately get things in the right format so my algorithm can start matching against the library. I’m sure that’s when interesting bugs will come up, and I’ll save further refinement for the week after!

I might be about half a week behind schedule, since finding a time to meet up for MIDI processing took a bit longer than hoped. But hopefully I can make up for the future debugging time with the exploration of other test cases that I did in the waiting period.

Anja + Team Status Report #6

This week I implemented a version of the algorithm with DTW. The code is able to take in vocal input and distinguish which vocal inputs are more similar. To test it out, I recorded myself singing the same song twice, plus a second, totally different song, and the ranked output showed that the first two recordings were the most similar pairing of any among the 3 available. That is:

1: song 1

2: song 1

3: song 2

1-2 was deemed more similar than 1-3 and 2-3.

This was great to see because it is a simple working version of the app. Imagining recordings 2 and 3 to be the library and recording 1 to be the vocal query, we saw that 1 matched to the right song in the library.
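
That ranking experiment can be reproduced in miniature with a plain DTW (Python sketch; the contours below are made up for illustration, not the real recordings):

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook dynamic-time-warping distance between two contours."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)]

take1 = [60, 62, 64, 65, 64, 62, 60]        # "song 1", first rendition
take2 = [60, 62, 64, 64, 65, 64, 62, 60]    # "song 1", slightly stretched
other = [72, 71, 69, 67, 69, 71, 72]        # "song 2"
```

Here `dtw_distance(take1, take2)` comes out smaller than the distance from either rendition to the other song, which is exactly the 1-2 pairing winning over 1-3 and 2-3.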

The next steps include:

-Getting the library to be processed MIDIs in the same format as the vocal input I am extracting now

-Seeing how well the ranking system works when the library is expanded and the number of comparisons grows into the hundreds. We might need to further process the model at that point.

I believe that if next week I can begin the debugging step as described above, we should be in great shape.


TEAM UPDATE:

  • What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?
    • Biggest risk is that it is too slow and not very accurate. Some naive mitigation plans include keeping the library fairly small and not attempting to do a weighted average with the similarity matrices.
  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?
    • So far no changes have been made to the design; the proof-of-concept code we have seems to be working correctly. Next week, as we begin to debug situations that resemble the actual production environment, we might need to.
  • Provide an updated schedule if changes have occurred.
    • No changes.

Anja Status Report #5

Team C6

Anja Kalaba

This week I looked through a lot of the available libraries for microphone audio processing in C, most notably PortAudio. Additionally, MATLAB was found to have a stable processing scheme.

I was able to obtain user input and represent it in the desired format. I still have to look into whether the current method is compatible with the app our team wants to make. Next, I’ll need to make sure that MIDI files can be placed into this format too, and actually run the time warping on this final data format, which is an array of pitches from the sampled input, bucketed per time stamp.
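
One plausible bucketing for that per-time-stamp pitch array is nearest-semitone (MIDI) numbering; the exact bucket size isn't fixed yet, so this is only an assumed choice (Python for illustration):

```python
import math

def freq_to_semitone(freq_hz):
    """Bucket a detected frequency into the nearest MIDI semitone
    number, using the A4 = 440 Hz = MIDI 69 convention."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))
```

For example, 440 Hz buckets to 69 and middle C (about 261.63 Hz) to 60.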

I would say I am on schedule.

Anja + Team Status Report

Anja Status Report 3

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

I finalized our design concepts, created the design slides, and have been working on the design review. I also had to practice presenting the slides. I was in communication with Professor Dannenberg as well, and we have settled that we cannot contact the team members of Theme Extractor, but we will be in contact with his grad students to borrow melody extraction from MIDI files. We also finalized a division of labor, so next week I can start on the input discretization and time warping algorithm.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Definitely on schedule as long as implementation starts next week!

  • What deliverables do you hope to complete in the next week?

A database of theme-extracted songs, probably about 3 songs to start. Some humming samples from all 3 team members. The ability to discretize vocal input. A sketch in C of how to begin the time warping algorithm on the data.

Anja Status Report 2 + Team Status Report 2

Anja Kalaba Team C6:

  • This week my team and I spoke with Professor Dannenberg and were able to toss out the idea of dynamic library processing. It was settled that the final experiment to attempt, before doing a faithful replication of the Query by Humming paper, would be using similarity matrices that compare instrument samples to ensemble samples in order to verify an instrument’s membership in the ensemble. As a more reliable approach, MIDI file preprocessing was decided to be appropriate. I also worked on the design slides; it was settled that I would present our design on Monday.
  • I would say progress is on schedule.
  • Deliverables for this week will hopefully include a valid small test of the similarity-matrix instrument-membership verification, and, from its results, a final decision about the route to take for the algorithm and the library/database composition.


Team C6 Status:

  • The most significant risk would be deciding to go with the similarity matrix approach and finding that it isn’t applicable to voices. Our contingency plan is a full implementation of the Query by Humming paper, which seems credible and has fine results. If this also proves unstable, our plan is to provide a data visualization scheme to at least display the rigorous algorithmic work done under the hood.
  • No changes were made to the design; we are still in the consideration phase (see above).
  • No Team schedule changes.