Wenting: Status Report 2

This week our team met with Professor Roger Dannenberg to get his insight into our ideas and which of them would be most feasible for our project. He told us more about the Query by Humming project and how its poor performance kept it from becoming a product that could go on the market. Breaking music down into separate lines is actually quite difficult, since it is hard to tell which notes belong to which instrument. He mentioned F0 (fundamental frequency) estimation, which tries to find the predominant pitch at each instant and is essentially melody tracking, but it also has mixed results. His suggestions have shifted us toward using similarity matrices to match songs. A similarity matrix compares every moment of one song against every moment of another, and searching it for diagonal lines locates the places where the two songs match up.
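To make the diagonal idea concrete, here is a minimal sketch, assuming the similarity matrix has already been computed as a list of rows where larger values mean two frames are more alike. The function name `best_diagonal` and the fixed window `length` are hypothetical choices for illustration, not a settled design.

```python
def best_diagonal(sim, length):
    """Find the diagonal run of `length` cells with the highest mean similarity.

    sim[i][j] is the similarity between frame i of song A and frame j of
    song B. A high-scoring diagonal means consecutive frames of A line up
    with consecutive frames of B, i.e. the two songs match over that span.
    Returns (mean_score, start_row, start_col).
    """
    rows, cols = len(sim), len(sim[0])
    best = (float("-inf"), -1, -1)
    for i in range(rows - length + 1):
        for j in range(cols - length + 1):
            # Average the cells along the diagonal that starts at (i, j).
            score = sum(sim[i + k][j + k] for k in range(length)) / length
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy 4x5 matrix: frames 1-3 of song A match frames 2-4 of song B.
sim = [[0.0] * 5 for _ in range(4)]
for k in range(3):
    sim[1 + k][2 + k] = 1.0
print(best_diagonal(sim, 3))  # (1.0, 1, 2)
```

A brute-force scan like this is fine for short test clips; a real sung query would also have to tolerate gaps and tempo drift, which is where smarter pattern recognition would come in.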

The similarity matrix idea also plays into the data visualization aspect of our project. Matrices are inherently visual, so if we end up using this technique, we can show the work the matching algorithm has done through the matrix itself.

A point brought up during our weekly meeting with Professor Savvides and the TAs was that including a data visualization portion would also help with debugging. I read an article covering some of the existing software that people use for similar purposes, such as the TensorFlow Graph Visualizer. We will probably end up using those tools, or at the very least borrowing techniques from them.

Unfortunately, I have technically fallen behind schedule. I did not anticipate how much more research and planning we would have to do before moving on to other tasks, such as designing the data format. However, by deepening my understanding of the problem now rather than later, I have a better idea of what will work, instead of having to turn back and try a different solution when it is too late.

For next week, I would like to have a more concrete plan for how we will implement both the audio analysis and the matching algorithm, and to explore how they will interact with the data visualization aspect of the project.

Anja Status Report 2 + Team Status Report 2

Anja Kalaba Team C6:

  • This week my team and I spoke with Professor Dannenberg and were able to rule out the idea of dynamic library processing. We settled on one final experiment to attempt before doing a full replication of the Query by Humming paper: using similarity matrices that compare instrument samples to ensemble samples, to verify whether the instrument is present in the ensemble. We also decided that preprocessing MIDI files is a more reliable approach. I also worked on the design slides, and it was settled that I will present our design on Monday.
  • I would say progress is on schedule.
  • Deliverables for this week should include a small, valid test of the similarity matrix instrument membership check and, based on its results, a final decision on which route to take for the algorithm and for the library/database composition.
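One way the instrument membership test could be scored, sketched under two assumptions: both samples have already been converted to 12-bin chroma frames, and the instrument sample is cut from the same time span as the ensemble sample, so frame i of one lines up with frame i of the other. The helper names and the "on-diagonal minus off-diagonal" score are hypothetical, not something we have validated.

```python
import math


def cosine(u, v):
    """Cosine similarity of two chroma frames (0.0 if either is silent)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def membership_contrast(instrument, ensemble):
    """Mean similarity on the main diagonal minus mean similarity off it.

    ASSUMPTION: both inputs are equal-length, time-aligned lists of 12-bin
    chroma frames (at least 2 frames). A clearly positive value suggests the
    instrument's notes appear in the ensemble at the same moments.
    """
    n = len(instrument)
    sim = [[cosine(a, b) for b in ensemble] for a in instrument]
    on_diag = sum(sim[i][i] for i in range(n)) / n
    off_diag = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return on_diag - sum(off_diag) / len(off_diag)


def frame(*pitch_weights):
    """Hypothetical helper: build a chroma frame from (pitch_class, weight) pairs."""
    v = [0.0] * 12
    for pc, w in pitch_weights:
        v[pc] = w
    return v


# The instrument plays C, E, G; the ensemble plays the same notes plus a quieter D.
instrument = [frame((0, 1.0)), frame((4, 1.0)), frame((7, 1.0))]
ensemble = [frame((pc, 1.0), (2, 0.5)) for pc in (0, 4, 7)]
print(membership_contrast(instrument, ensemble) > 0.5)  # True
```

Shifting the ensemble frames out of alignment drives the score negative, which is the contrast the test is meant to pick up.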

 

Team C6 Status:

  • The most significant risk is committing to the similarity matrix approach and then finding that it does not work for sung (vocal) queries. Our contingency plan is a full implementation of the Query by Humming paper, which seems credible and reports acceptable results. If this also proves unreliable, we plan to provide a data visualization scheme that at least displays the algorithmic work done under the hood.
  • No changes were made to the design; the similarity matrix experiment (described above) is still under consideration.
  • No team schedule changes.

Nolan: Status Report 2

Nolan Hiehle

Capstone Status Report 2

 

This week we had a very helpful meeting with Professor Dannenberg. In it, we described our research up to this point and some different strategies we were looking at. Professor Dannenberg suggested we use chroma features, an audio processing method that turns a spectrogram into a vector over the 12 pitch classes of the chromatic scale (C, C#, D, and so on up to B), with all octaves folded together. While this obscures things like instrumentation, it’s a pretty solid way to turn raw audio (mp3 files, for example) into an informative musical representation to play around with.
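A simplified sketch of how one spectrogram frame folds into a chroma vector, assuming the magnitude spectrum has already been computed (decoding the mp3 and running the FFT are left out) and that tuning is standard A4 = 440 Hz. Library implementations such as librosa's `chroma_stft` are more careful than this; the function name and the `fmin`/`fmax` cutoffs here are hypothetical.

```python
import math


def chroma_from_spectrum(magnitudes, sample_rate, n_fft, fmin=55.0, fmax=4000.0):
    """Fold one spectrogram frame into a 12-bin chroma vector.

    magnitudes[k] is the magnitude of FFT bin k (frequency k * sample_rate / n_fft).
    Each bin's energy goes to its nearest equal-tempered pitch class, assuming
    A4 = 440 Hz; index 0 is C, 1 is C#, ..., 9 is A, 11 is B.
    """
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        freq = k * sample_rate / n_fft
        if freq < fmin or freq > fmax:
            continue  # skip DC, low rumble, and very high bins
        midi = 69 + 12 * math.log2(freq / 440.0)  # MIDI note number (A4 = 69)
        chroma[round(midi) % 12] += mag ** 2
    total = sum(chroma)
    # Normalize so frames of different loudness are comparable.
    return [c / total for c in chroma] if total else chroma


# 8 kHz audio, 800-point FFT -> 10 Hz per bin. Bin 44 = 440 Hz (A), bin 26 = 260 Hz (about C4).
spectrum = [0.0] * 400
spectrum[44] = spectrum[26] = 1.0
chroma = chroma_from_spectrum(spectrum, sample_rate=8000, n_fft=800)
print(chroma[9], chroma[0])  # 0.5 0.5
```

Rounding each bin to the nearest semitone is crude at low frequencies, where FFT bins are wider than a semitone; that is one reason real implementations use dedicated filter banks.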

It seems that most melody identification research is pretty lacking (at least with regard to extracting a melody from a raw audio file as opposed to a MIDI file), so we’re currently pursuing an algorithm that matches a sung query against an entire song with all its instruments, instead of extracting a melody and matching against that.

Professor Dannenberg suggests computing chroma features for the songs to be matched against, computing chroma features for the sung query sample, and building a similarity matrix that compares every time frame of the query with every time frame of the song. Some sort of pattern recognition could then be applied to this matrix to look for a match somewhere between the sung query and the original song.
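A minimal sketch of that similarity matrix, assuming both the song and the query are already lists of 12-bin chroma frames and using cosine similarity as the frame-to-frame measure (other measures are possible). The names `normalize`, `cross_similarity`, and `one_hot` are hypothetical.

```python
import math


def normalize(frame):
    """Scale a chroma frame to unit length so loudness doesn't affect similarity."""
    n = math.sqrt(sum(x * x for x in frame))
    return [x / n for x in frame] if n else list(frame)


def cross_similarity(song, query):
    """sim[i][j] = cosine similarity between song frame i and query frame j."""
    song_n = [normalize(f) for f in song]
    query_n = [normalize(f) for f in query]
    return [[sum(a * b for a, b in zip(s, q)) for q in query_n] for s in song_n]


def one_hot(pc):
    """Hypothetical test helper: a chroma frame containing a single pitch class."""
    return [1.0 if i == pc else 0.0 for i in range(12)]


song = [one_hot(pc) for pc in (0, 2, 4, 5)]  # C D E F
query = [one_hot(pc) for pc in (2, 4)]       # D E
sim = cross_similarity(song, query)
# The match shows up as a diagonal: sim[1][0] and sim[2][1] are 1.0.
```

Real song frames contain many notes at once, so matching cells will be well below 1.0; the hope is only that the true match stands out as a brighter diagonal.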

One drawback is that this method is neither key-invariant nor tempo-invariant. For example, a singer who sings all the correct intervals between notes but starts on the wrong note (very easy to do for someone who does not have perfect pitch and does not know a song well) would not generate a match, since we’re matching pitches directly against each other. We do have the option of rotating the sung chroma vectors through all 12 transpositions and comparing against each, but this could generate a lot of noise.
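The rotation idea can be sketched as follows, assuming chroma frames as lists of 12 floats and a deliberately simple matcher (best mean frame-by-frame cosine over every placement of the query). All function names are hypothetical; only the "try all 12 shifts and keep the best" structure comes from the plan above.

```python
import math


def transpose(frame, k):
    """Shift a 12-bin chroma frame up by k semitones (bin i moves to bin i + k)."""
    return [frame[(i - k) % 12] for i in range(12)]


def cosine(u, v):
    """Cosine similarity of two chroma frames (0.0 if either is silent)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_score(song, query):
    """Best mean frame-by-frame cosine over every placement of the query in the song."""
    best = 0.0
    for start in range(len(song) - len(query) + 1):
        s = sum(cosine(song[start + j], q) for j, q in enumerate(query)) / len(query)
        best = max(best, s)
    return best


def best_transposition(song, query):
    """Try all 12 transpositions of the sung query; return (score, shift)."""
    return max((match_score(song, [transpose(f, k) for f in query]), k) for k in range(12))


def one_hot(pc):
    """Hypothetical helper: a chroma frame containing a single pitch class."""
    return [1.0 if i == pc else 0.0 for i in range(12)]


song = [one_hot(pc) for pc in (0, 4, 7, 0)]  # C E G C
query = [one_hot(pc) for pc in (10, 2, 5)]   # Bb D F: right intervals, a whole step low
print(best_transposition(song, query))       # (1.0, 2)
```

Because every candidate song gets 12 chances to score well, wrong songs also get a boost, which is the noise concern mentioned above.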

Similarly, this method is sensitive to tempo. Someone singing the right notes, but a little too fast or too slow, could very easily fail to get a match. This is partly because of the way chroma features work: we get a sequence of 12-bin pitch-class vectors at some sampling rate we choose (maybe a few times per second?), but each vector is just a snapshot of ALL notes present in the song at that moment, with no notion of a “line of melody” or of a note changing. Because of this, a sped-up version of a melody could look pretty different from a slower version. As with the key problem, we could try to brute-force a fix by taking our melody queries and ALSO running our search method on several sped-up and several slowed-down versions of their chroma features.
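The brute-force tempo fix could start from something as simple as nearest-neighbour resampling of the query's chroma sequence. This sketch assumes the frames are a plain list; `time_stretch`, `tempo_variants`, and the particular set of rates are hypothetical, and each variant would then be fed to whatever matcher we settle on.

```python
def time_stretch(frames, rate):
    """Nearest-neighbour resampling of a chroma sequence.

    rate > 1 speeds the sequence up (fewer frames); rate < 1 slows it down
    (frames are repeated). Frames themselves are not modified.
    """
    n_out = max(1, round(len(frames) / rate))
    return [frames[min(len(frames) - 1, int(i * rate))] for i in range(n_out)]


def tempo_variants(frames, rates=(0.8, 0.9, 1.0, 1.1, 1.25)):
    """One stretched copy of the query per candidate rate (rates are a guess)."""
    return {rate: time_stretch(frames, rate) for rate in rates}


frames = ["f0", "f1", "f2", "f3"]  # stand-ins for 4 chroma frames
print(time_stretch(frames, 2.0))   # ['f0', 'f2']
print(time_stretch(frames, 0.5))   # ['f0', 'f0', 'f1', 'f1', 'f2', 'f2', 'f3', 'f3']
```

The cost grows linearly with the number of rates tried, and like the 12 rotations, every extra variant is another chance for a wrong song to match.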

The sung sample’s chroma vectors will also pick up energy from the voice’s harmonics, so it may be a good idea to remove these; this is something we should determine with testing.
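A quick illustration of why harmonics matter for chroma: harmonic k of a note sits 12·log2(k) semitones above the fundamental, so its energy folds into other pitch classes. This sketch rounds to the nearest equal-tempered semitone and ignores how strong each harmonic actually is, which depends on the singer; the function name is hypothetical.

```python
import math


def harmonic_pitch_classes(root_pc, n_harmonics=6):
    """Pitch class (0 = C, ..., 11 = B) that each of the first n harmonics falls into.

    Harmonic k is k times the fundamental frequency, i.e. 12 * log2(k)
    semitones higher, rounded here to the nearest semitone.
    """
    return [(root_pc + round(12 * math.log2(k))) % 12 for k in range(1, n_harmonics + 1)]


print(harmonic_pitch_classes(0))  # [0, 0, 7, 0, 4, 7]: a sung C also lights up G and E
```

Octave harmonics land back on the sung note, but the 3rd, 5th, and 6th harmonics add a fifth and a major third, which could make a single sung note look like a chord.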

 

At this point, we need to build this system and test it. We’ll start by inspecting similarity matrices manually to see if there are promising results before we build a pattern recognition system. Some researchers have achieved very good results using convolutional neural networks on similarity matrices for cover song identification, but this could be challenging to tune to an acceptable standard (and might just be overkill for a demo-based capstone).

 

Team Status Report 1

  • What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

    The biggest risk to our project is the matching algorithm performing very poorly. As elaborated further in the next point, existing papers have shown that Query by Humming, a similar project, performs poorly. We will also have a very limited library of songs to match against. Our contingency plan is to add a data visualization component that shows the work the matching algorithm has done, even if it cannot match the query to a song. For example, it could highlight the portions of the melody that it found to be a match or a close match.

  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

    We might not do dynamic processing on a flexible library anymore. Some papers indicate that preprocessing with the help of MIDI file information (a broken-down record of the pitches and rhythms for each instrument in a piece) is widely used and much simpler. The cost is that the focus of our project shifts to either analyzing and processing mp3 files or developing better data representations and search algorithms.

    This cost is mitigated because our learning will be maximal: we can try out innovations to the existing algorithms since there isn’t much to lose.

    Realizing the potential lack of accuracy with the project, we decided to add the visualization component to the project, so that users can at least see the inner workings of our algorithms.

  •  Provide an updated schedule if changes have occurred.

    Our schedule is mostly the same, but we have updated our Gantt chart below with the changes highlighted. After further evaluation, machine learning may not be needed, so we have generalized the ML-related tasks into a broader matching algorithm task, and we have also added data visualization to the task list.
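The MIDI preprocessing mentioned under the design changes above could produce chroma frames directly from note events, which would let the library side skip audio analysis entirely. This sketch assumes the notes have already been pulled out of a MIDI file as (start_sec, end_sec, midi_pitch) tuples (a parsing library such as `mido` or `pretty_midi` would handle that step, not shown); the function name and the 0.25 s hop are hypothetical.

```python
import math


def midi_notes_to_chroma(notes, hop=0.25):
    """Turn (start_sec, end_sec, midi_pitch) note events into chroma frames.

    Frame f covers [f * hop, (f + 1) * hop) seconds; each note adds 1.0 to
    its pitch class (midi_pitch % 12, so 0 = C) in every frame it overlaps.
    """
    if not notes:
        return []
    n_frames = math.ceil(max(end for _, end, _ in notes) / hop)
    frames = [[0.0] * 12 for _ in range(n_frames)]
    for start, end, pitch in notes:
        for f in range(math.floor(start / hop), min(math.ceil(end / hop), n_frames)):
            frames[f][pitch % 12] += 1.0
    return frames


# C4 then E4 over a sustained G2 bass note (MIDI 60, 64, 43).
notes = [(0.0, 0.5, 60), (0.5, 1.0, 64), (0.0, 1.0, 43)]
chroma = midi_notes_to_chroma(notes)
```

Because MIDI keeps each instrument's notes separate, the same helper could also be run on just the melody track, which raw audio does not allow.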

Wenting: Status Report 1

*Backlog from before websites were set up*

This week I did more thorough research into similar existing projects, as well as related research that could help us develop our solution. Roger Dannenberg, a CS professor whose primary field is computer music, has done a lot of work that is of interest for our project, including the Query by Humming project that we noted in our project proposal. We have been in contact with him and intend to meet with him soon.

The projects I looked into were the MUSART Query by Humming project and his work in structural analysis. From studying the Query by Humming project, I found that our intended method for analyzing songs was more difficult and less likely to succeed than we previously thought. Our most ambitious idea was to analyze songs by breaking them down into multiple voices using concepts from polyphonic pitch tracking, but MUSART simply used pre-existing MIDI files for that instead of analyzing the raw audio of the songs. Also, judging by the performance of their system, our goal of 75% accuracy in recognizing songs may not be achievable. To counteract the possibility of our system not being able to match a song, I came up with the idea of adding some sort of data visualization. Even if we are unable to find a match, the algorithm will have done some work trying to match the query, and I would like to include that in the results to demonstrate that it did, in fact, try. For example, it might show a highlighted portion of melody that it matched between the input query and an existing song.

While browsing Professor Dannenberg’s work, I stumbled upon his work in structural analysis. The purpose of these projects was to build models that could analyze a song and describe its structure, e.g. “the song is in AABA form.” That is not our project’s intent, but I figured the analysis methods from these projects were relevant to what we are trying to do. Much of this work involved looking for patterns in the music to discover the structure of the song, including ways to transcribe melody using monophonic pitch tracking, chroma representation, and polyphonic transcription. We will likely be doing something similar to extract information from the input audio.
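Since a sung query is monophonic, the simplest of these pitch-tracking ideas is easy to sketch: an autocorrelation F0 estimator for a single frame of audio. This is a textbook technique chosen for illustration, not necessarily what MUSART or Professor Dannenberg's structural analysis work used; it is known to make octave errors on real voices, and the `fmin`/`fmax` bounds and function names are hypothetical.

```python
import math


def estimate_f0(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Autocorrelation F0 estimate (in Hz) for one frame of monophonic audio.

    Tries every lag (period in samples) between sample_rate/fmax and
    sample_rate/fmin and keeps the one where the signal best matches a
    shifted copy of itself. Returns None if nothing correlates positively.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else None


def freq_to_midi(freq):
    """Convert a frequency to a (fractional) MIDI note number, assuming A4 = 440 Hz."""
    return 69 + 12 * math.log2(freq / 440.0)


sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 0.1 s of a 200 Hz tone
f0 = estimate_f0(frame, sr)
print(round(f0), round(freq_to_midi(f0)))  # 200 55
```

Running this frame by frame over a sung query would give a rough note sequence; on a full polyphonic mix it would not, which matches the mixed results mentioned for F0 estimation.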

I am currently on schedule, but the research phase will probably continue into next week as we explore all our options and discuss further with professors such as Roger Dannenberg.

I’d like to have a rough version of our data format design, and I intend to research data visualization, though that will depend on the data format and matching method.