Anja + Team Status Report #6

This week I implemented a version of the algorithm with DTW. The code takes in vocal inputs and ranks them by similarity. To test it, I recorded myself singing the same song twice, plus a second, totally different song, and the ranked output showed that the first two recordings were the most similar pairing of any in the three available songs. That is:

1: song 1

2: song 1

3: song 2

Pair 1-2 was deemed more similar than pairs 1-3 and 2-3.

This was great to see because it is a simple working version of the app. Imagining recordings 2 and 3 as the library and recording 1 as the vocal input, we saw input 1 matched to the right song in the library.
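The pairwise ranking above can be sketched with a toy DTW comparison. The pitch sequences and distance function below are illustrative stand-ins (the real implementation works on extracted vocal pitch data), written in Python for brevity:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-time-warping distance between two pitch sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local pitch difference
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Toy pitch contours (MIDI note numbers): two takes of "song 1", one of "song 2".
take1 = [60, 62, 64, 65, 64, 62, 60]
take2 = [60, 62, 62, 64, 65, 64, 62, 60]  # same melody, sung slightly slower
song2 = [72, 69, 67, 72, 69, 67, 65]

pairs = {"1-2": dtw_distance(take1, take2),
         "1-3": dtw_distance(take1, song2),
         "2-3": dtw_distance(take2, song2)}
ranking = sorted(pairs, key=pairs.get)  # most similar pairing first
```

Because DTW tolerates the tempo difference between the two takes, the 1-2 pairing comes out most similar, mirroring the result described above.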

The next steps include:

- Getting the library to be processed MIDIs in the same format as the vocal input I am extracting now

- Seeing how well the ranking system works when the library is expanded and the number of comparisons grows into the hundreds. We may need to optimize the processing at that point.

I believe that if I can begin the debugging step described above next week, we should be in great shape.

 

TEAM UPDATE:

  • What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?
    • The biggest risk is that the system is too slow and not accurate enough. Some naive contingency plans include keeping the library fairly small and not attempting a weighted average over the similarity matrices.
  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?
    • So far no changes have been made to the design; the proof-of-concept code we have seems to be working correctly. Next week, as we begin debugging in situations that resemble the actual production environment, we may need to make changes.
  • Provide an updated schedule if changes have occurred.
    • No changes.

Nolan: Status Report 6

This week I prepared for our demo. We will be able to show cross-similarity matrices between various songs and some samples, and have some evaluation present for tuning chroma features to provide clearer and more distinct patterns in the cross-similarity matrices. Hopefully, some heuristic evaluations of the similarity will show trends that could allow us to pick a song out based on a sung sample. Failing this, we can look at a classifier that can distinguish a match from a mismatch between a sung sample and the actual song waveform.
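As a rough illustration of the cross-similarity idea, here is a sketch in Python with idealized one-hot "chroma" vectors standing in for real Chroma Toolbox output (all names and data here are hypothetical):

```python
import numpy as np

def cross_similarity(A, B):
    """Cosine cross-similarity between two chroma sequences.
    A is (n_a, 12), B is (n_b, 12); returns an (n_a, n_b) matrix."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def one_hot_chroma(pitch_classes):
    """Idealized chroma frames: one pitch class active per frame."""
    m = np.zeros((len(pitch_classes), 12))
    m[np.arange(len(pitch_classes)), pitch_classes] = 1.0
    return m

sample = one_hot_chroma([0, 2, 4, 5, 4, 2, 0])        # sung sample
song   = one_hot_chroma([0, 0, 2, 4, 4, 5, 4, 2, 0])  # reference song
S = cross_similarity(sample, song)
# A true match shows up as a diagonal-ish stripe of high values in S,
# which is the pattern a classifier or heuristic would look for.
```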

 

 

Wenting: Status Report 6

This week I delved more into the implementation of the data visualization techniques. I want it to be visually obvious which parts are a close match and which are not, so there should be a red-to-green scale in some form, where red means that part is not a close match and green means it is. This can be done by shading the differences between the graphs, or simply by coloring the line representing the melody based on how close it is to the closest matched melody, though the latter would probably be harder to see.
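A minimal sketch of the per-note coloring idea, in Python for illustration (the two-semitone tolerance is an arbitrary placeholder, not a chosen parameter):

```python
import numpy as np

def match_colors(query, reference, tol=2.0):
    """Map per-note pitch differences onto a red-to-green RGB scale:
    0 semitones off -> pure green, >= tol semitones off -> pure red."""
    diff = np.abs(np.asarray(query, float) - np.asarray(reference, float))
    closeness = np.clip(1.0 - diff / tol, 0.0, 1.0)  # 1.0 = perfect match
    return np.stack([1.0 - closeness,            # red channel
                     closeness,                  # green channel
                     np.zeros_like(closeness)],  # blue channel
                    axis=1)

sung    = [60, 62, 65, 65, 67]
matched = [60, 62, 64, 65, 67]
colors = match_colors(sung, matched)
# colors[i] can be handed to a plotting library to shade segment i
# of the melody line, or the shaded region between the two graphs.
```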

A thought to consider is whether the input should be plotted against each song separately, or whether the information can be aggregated into one visual. With either of the methods mentioned above, the plots would have to be separate to be meaningful. I will continue to explore which methods are effective and most telling of the process.

I have not done much work on visualization for the CNN branch yet because it's more difficult to do without data. To work around this, I may try generating test matrices similar to the ones shown in this paper to explore some options. Perhaps, using the similarity features, we can pull out the bits of melody that were found to be matching.

Since our computation will largely be in MATLAB, it may be easier to develop a web app instead of a mobile app. With that in mind, my goal for next week will be to spin up a basic framework for the app.

Nolan: Status Report 5

This week I began using the Chroma Toolbox in MATLAB. I'm hoping to get some practice using it and soon have a demo showing a similarity matrix between a sung sample and a reference song. Looking at a few of these and comparing them to similarity matrices of unrelated samples should tell us whether the CNN algorithm will be viable.

Since most of our algorithms will probably be implemented in MATLAB, it might not be feasible to publish our project as a mobile app, but a desktop version might still be doable. Luckily, the problem has less to do with the computing power required and more with ease of development: there's no reason our algorithm COULDN'T be ported to a mobile device, and whatever we produce could be redeveloped if it seems commercially viable.

Anja Status Report #5

Team C6

Anja Kalaba

This week I looked through a number of available libraries for microphone audio processing in C; the most notable is PortAudio. Additionally, MATLAB was found to have a stable processing scheme.

I was able to obtain user input and represent it in the desired format. I still have to look into whether the current method is compatible with the app our team wants to build. Next, I'll need to make sure that MIDI files can be placed in this format too, and actually perform the time warping on this final data format, which is an array of pitches from the sampled input, bucketed per time stamp.
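The bucketing step might look something like this sketch, written in Python for illustration (the 100 ms hop and the median rule are assumptions, not decided parameters):

```python
import numpy as np

def hz_to_midi(f):
    """Convert frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * np.log2(np.asarray(f, dtype=float) / 440.0)

def bucket_pitches(times, freqs, hop=0.1):
    """Collapse raw (time, Hz) pitch estimates into one rounded MIDI pitch
    per fixed-size time bucket, using the median within each bucket."""
    times = np.asarray(times, dtype=float)
    midi = hz_to_midi(freqs)
    n_buckets = int(times.max() / hop) + 1
    out = np.full(n_buckets, np.nan)  # NaN marks empty/silent buckets
    idx = (times / hop).astype(int)
    for b in range(n_buckets):
        in_bucket = midi[idx == b]
        if in_bucket.size:
            out[b] = np.round(np.median(in_bucket))
    return out

# Toy pitch-tracker output: ~A4 (440 Hz) for two buckets, then ~C5 (523.25 Hz).
t = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
f = [440.0, 441.0, 439.0, 440.0, 523.25, 523.0]
buckets = bucket_pitches(t, f)  # one MIDI pitch per 100 ms time stamp
```

The same conversion would be applied to the MIDI side of the library so both representations line up before time warping.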

I would say I am on schedule.

Team Status Report

We are all in the beginning implementation stages for our respective parts. Nolan has begun working with the Chroma Toolbox in MATLAB, Wenting is beginning test data visualizations, and Anja is working on input discretization.

We originally wanted to do audio processing in C, but the libraries have not been conducive to productivity and work efficiency. Instead, we plan to use MATLAB for the audio processing.

There are no new risks, but we are still concerned about the performance of our system in both accuracy and speed. We may have to lower the threshold of what is considered a “match,” though that may also result in false matches. Our original goal was to match in under 1 minute, and we hope to still meet that goal, though we may prioritize accuracy over speed if it comes down to it.

Wenting: Status Report 5

This week I developed a very basic dummy implementation of how the melodic contour analysis part of data visualization will be displayed. An example is shown in the figure below. The actual graphic would show the musical notes on the side (per the example we put in our design document, also attached below), but you can see in this graphic how the melody matches at some points and deviates at others.

I am also (in conjunction with Anja) looking at C libraries for processing audio so that I can try filtering background noise out of the sung input.
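As a first cut at the filtering, a band-pass around the vocal fundamental range could work. The sketch below is a crude FFT brick-wall filter in Python with NumPy; the 80 Hz to 1 kHz cutoffs are placeholder guesses, and a real implementation would use a proper filter design in MATLAB or C:

```python
import numpy as np

def vocal_bandpass(x, fs, lo=80.0, hi=1000.0):
    """Crude brick-wall band-pass: zero out FFT bins outside [lo, hi] Hz.
    Keeps the rough fundamental range of singing, drops rumble and hum."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

# Toy check: a 220 Hz "sung" tone plus 50 Hz mains hum; the filter should
# keep the tone and remove the hum.
fs = 8000
t = np.arange(fs) / fs            # one second of samples
tone = np.sin(2 * np.pi * 220 * t)
noisy = tone + np.sin(2 * np.pi * 50 * t)
filtered = vocal_bandpass(noisy, fs)
```

A brick-wall filter rings on real recordings, so this is only a placeholder for whichever windowed or IIR filter we end up choosing.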

As mentioned in a previous status report, there are cool ways of visualizing the convolution and filter layers in a CNN. Now that I have seen the background on it, I will be looking more into how to actually implement it.

Finally, I began UI design for the final app – sketches shown below.

Anja + Team Status Report

Nolan: Status Report 4

This week I spent the first half of the week working on the design document. Our team fleshed out a lot of technical and planning details we hadn’t considered yet, so it was useful to realize decisions we would need to make and have some conversations about them in advance.

 

I spent the second half of the week preparing for a midterm and working on booth, but I hope to have a deliverable for the chroma-feature cross-similarity path by the end of spring break. I'll be working on converting either a waveform in mp3 format or a sung sample to a chroma feature matrix and creating a cross-similarity matrix between them. This gets us to the next step, in which we'll evaluate the matrices we see and start to think about whether pattern matching on them for classification is feasible.

Wenting: Status Report 4

I spent the first part of this week working on the design document. This involved putting the technical details of the project in a more concrete, detailed, and professional form. It was a significant undertaking that took a lot of time and thought, and hopefully that will pay off come the final report!

Unfortunately I had a very busy week in other classes and was not able to dedicate as much time to capstone as I would have wanted. I did spend more time looking into existing visualizations of CNNs and possible implementations for visualizing melodic contour analysis. Anja drew an example for the design review slides that is very viable: displaying the sung melody on top of the melody of the match (or of the top possible matches). The considerations are whether the visualization can be done in real time, and whether a timeline-like graphic will be possible. The idea is a video/GIF that shows the melody from start to end: at the beginning the input is matched against all songs, and with each note, songs that are not the match are eliminated (i.e. they disappear from the graphic) until only the final match(es) remain.

In this article, there are examples of the intermediate convolutions, the ReLU activation filters, and the convolution layer filters. They also show examples of class activation heat maps, which I have some experience with from my previous research. I don't think the heat-map-type visual will be relevant to us, but the others are intriguing. Another visual I saw was this one, which shows what I assume is a simplified, distilled version of what their CNN is doing: the input is shown, along with some granularity of the layers in between, down to the final classification. These are inspiration for the final visualization system.

My limited time kept me from beginning implementation, but I plan to get started as soon as possible on building a test visualizer for melodic contour and on tinkering more with CNN layer visualization. Additionally, I need to look into filtering background noise out of input audio.

*Note: Since it is spring break, this is being written earlier than usual.