Nolan: Status Report 5

This week I began using the Chroma Toolbox in MATLAB. I'm hoping to get some practice with it and soon have a demo showing a similarity matrix between a sung sample and a reference song. Looking at a few of these and comparing them to the similarity matrices of unrelated samples should tell us whether the CNN approach will be viable.
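As a rough sketch of the demo's core computation (a minimal example; the random matrices here are placeholders for real chromagrams, which would come from something like the Chroma Toolbox), the cross-similarity matrix is just the frame-by-frame cosine similarity:

    % Placeholders: columns are 12-element chroma vectors, one per frame.
    A = rand(12, 150);               % chromagram of the sung sample
    B = rand(12, 400);               % chromagram of the reference song
    A = A ./ max(vecnorm(A), eps);   % normalize each frame to unit length
    B = B ./ max(vecnorm(B), eps);
    S = A' * B;                      % S(i,j) = similarity of sung frame i to reference frame j
    imagesc(S); axis xy; colorbar;
    xlabel('reference frame'); ylabel('sung frame');

A real match should show up as a bright diagonal stripe in S, which is the kind of structure we'd hope a CNN could learn to recognize.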

Since most of our algorithms will probably be implemented in MATLAB, it might not be feasible to publish our project as a mobile app, but a desktop version should still be doable. Luckily, the problem has less to do with the computing power required and more to do with ease of development: there's no reason our algorithm COULDN'T be ported to a mobile device, and whatever we produce could be redeveloped if it seems commercially viable.

Anja: Status Report 5

Team C6

Anja Kalaba

This week I looked through a lot of the available libraries for microphone audio processing in C, the most notable being PortAudio. Additionally, MATLAB turned out to have a stable audio-processing scheme.

I was able to obtain user input and represent it in the desired format. I still have to look into whether the current method is compatible with the app our team wants to build. Next, I'll need to make sure that MIDI files can be placed in this format too, and actually perform the time warping on this final data format, which is an array of pitches, bucketed per time stamp, derived from the sampled input.
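A minimal sketch of the capture-and-bucket step in MATLAB (assuming the Audio Toolbox pitch() function is available; the recording length and quantization scheme are placeholders, not final choices):

    % Record 5 seconds of mono audio from the default microphone.
    fs = 16000;
    rec = audiorecorder(fs, 16, 1);
    recordblocking(rec, 5);
    x = getaudiodata(rec);

    % Estimate one fundamental frequency per frame (Audio Toolbox).
    [f0, loc] = pitch(x, fs);

    % Bucket: quantize each f0 estimate to the nearest MIDI note number,
    % time-stamped by its frame location.
    notes = round(69 + 12*log2(f0/440));
    t = loc / fs;                    % time stamp (seconds) per bucket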

I would say I am on schedule.

Team Status Report

We are all in the beginning stages of implementation for our respective parts. Nolan has begun working with the Chroma Toolbox in MATLAB, Wenting is beginning test data visualizations, and Anja is working on input discretization.

We originally wanted to do audio processing in C, but the libraries have not been conducive to productivity and work efficiency. Instead, we plan to use MATLAB for the audio processing.

There are no new risks, but we are still concerned about the performance of our system in both accuracy and speed. We may have to lower the threshold of what is considered a “match,” though that may also result in false matches. Our original goal was to match in under 1 minute, and we hope to still meet that goal, though we may prioritize accuracy over speed if it comes down to it.

Wenting: Status Report 5

This week I developed a very basic dummy implementation of how the melodic contour analysis part of the data visualization will be displayed. An example is shown in the figure below. The actual graphic would show the musical notes along the side (per the example we put in our design document, also attached below), but you can see in this graphic how the melody matches at some points and deviates at others.
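Stripped down, the dummy implementation amounts to overlaying two pitch contours (the arrays here are made-up placeholders standing in for a reference melody and a sung input):

    t = 0:0.1:5;                                       % time stamps (s)
    ref  = 60 + 2*round(2*sin(2*pi*0.3*t));            % dummy reference melody (MIDI numbers)
    sung = ref + [zeros(1, 20), randi([-2 2], 1, 31)]; % matches early, deviates later
    plot(t, ref, 'b-', t, sung, 'r--');
    legend('reference melody', 'sung input');
    xlabel('time (s)'); ylabel('MIDI note');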

I am also (in conjunction with Anja) looking at C libraries for processing audio so that I can try filtering background noise out of the sung input.

As mentioned in a previous status report, there are cool ways of visualizing the convolution and filter layers in a CNN. Now that I have seen the background on it, I will be looking more into how to actually implement it.

Finally, I began UI design for the final app – sketches shown below.


Nolan: Status Report 4

I spent the first half of this week working on the design document. Our team fleshed out a lot of technical and planning details we hadn't considered yet, so it was useful to identify decisions we would need to make and have some conversations about them in advance.


I spent the second half of the week preparing for a midterm and working on booth, but I hope to have a deliverable for the chroma feature cross-similarity path by the end of spring break. I'll be working on converting either a waveform in mp3 format or a sung sample to a chroma feature matrix and creating a cross-similarity matrix between them. This will get us to the next step, in which we'll evaluate the matrices we see and start to think about whether pattern matching on them for classification is feasible.
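As a starting point, the conversion can be sketched in plain MATLAB as a naive STFT-and-binning chromagram (this is not the Chroma Toolbox's filter-bank method; the file name, window size, and frequency range are placeholders):

    [x, fs] = audioread('reference.mp3');    % placeholder file name
    x = mean(x, 2);                          % mix down to mono
    win = 4096; hop = 2048;
    w = 0.5 - 0.5*cos(2*pi*(0:win-1)'/win);  % Hann window
    nFrames = floor((length(x) - win)/hop) + 1;
    C = zeros(12, nFrames);
    f = (0:win/2-1)' * fs / win;             % FFT bin center frequencies
    valid = f > 27.5 & f < 4200;             % roughly the piano range
    pc = mod(round(69 + 12*log2(f(valid)/440)), 12) + 1;  % pitch class per bin
    for i = 1:nFrames
        seg = x((i-1)*hop+1 : (i-1)*hop+win) .* w;
        mag = abs(fft(seg));
        mag = mag(1:win/2);                  % keep positive frequencies
        C(:, i) = accumarray(pc, mag(valid), [12 1]);  % sum energy per pitch class
    end
    imagesc(C); axis xy;                     % 12 x nFrames chroma feature matrix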

Wenting: Status Report 4

I spent the first part of this week working on the design document. This involved putting the technical details of the project in a more concrete, detailed, and professional form. It was a significant undertaking that took a lot of time and thought, and hopefully that will pay off come the final report!

Unfortunately I had a very busy week in other classes and was not able to dedicate as much time to capstone as I would have wanted. I did spend more time looking into existing visualizations of CNNs and possible implementations for visualizing melodic contour analysis. Anja drew an example for the design review slides that is very viable: displaying the sung melody on top of the melody of the match (or the top possible match(es)). The consideration is whether the visualization can be done in real time, and whether a timeline-like graphic will be possible. The idea is that it would be a video/GIF showing the melody from start to end: at the beginning the input is matched to all songs, and then with each note, songs that are not the match are eliminated (i.e., they disappear from the graphic) until only the final match(es) remain.

In this article, there are examples of the intermediate convolutions, the ReLU activation filters, and the convolution layer filters. They also show examples of class activation heat maps, which I have some experience with from my previous research. I don't think the heat map-type visual will be relevant to us, but the others are intriguing. Another visual I saw was this one, which shows what I assume is a simplified, distilled version of what their CNN is doing. The input is shown, along with some granularity of the layers in between, down to the final classification. These are inspiration for what the final visualization system will be.
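To make the idea concrete, here is a minimal sketch of tiling a layer's activation maps, assuming we can export them as an H x W x K array (random data stands in for the real activations):

    acts = rand(28, 28, 16);        % placeholder: 16 feature maps from one conv layer
    for k = 1:16
        subplot(4, 4, k);
        imagesc(acts(:, :, k));     % one panel per learned filter's response
        axis off;
    end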

My limited time kept me from beginning implementation, but I plan to get started as soon as possible on building a test visualizer for melodic contour and tinkering more with CNN layer visualization. Additionally, I need to look into filtering background noise out of the input audio.

*Note: Since it is spring break, this is being written earlier than usual.


Anja: Status Report 3

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

I finalized our design concepts, created the design slides, and have been working on the design review. I also rehearsed my presentation and practiced the slides. I was in communication with Professor Dannenberg, and we have settled that we cannot contact the team members of Theme Extractor, but we will be in contact with his grad students to borrow melody extraction from MIDI files. We also finalized a division of labor, so next week I can start on the input discretization and time warping algorithm.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Definitely on schedule as long as implementation starts next week!

  • What deliverables do you hope to complete in the next week?

A database of theme-extracted songs, probably about 3 songs to start. Some humming samples from all 3 team members. The ability to discretize vocal input. A sketch in C of how to begin the time warping algorithm on the data.
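Although that sketch will be in C, the core recurrence is the same in any language; here is a minimal MATLAB version of dynamic time warping over two discretized pitch arrays (the local cost function is a placeholder):

    function d = dtw_pitch(a, b)
    % Minimal DTW: total alignment cost between pitch sequences a and b.
    n = numel(a); m = numel(b);
    D = inf(n+1, m+1);
    D(1,1) = 0;
    for i = 1:n
        for j = 1:m
            cost = abs(a(i) - b(j));   % placeholder local cost (semitone distance)
            D(i+1, j+1) = cost + min([D(i, j), D(i, j+1), D(i+1, j)]);
        end
    end
    d = D(n+1, m+1);
    end

For example, dtw_pitch([60 62 64], [60 60 62 64]) should return 0, since the second sequence is just a time-stretched version of the first.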

Nolan: Status Report 3

This week I began working on my arm of the project, the chroma feature similarity matrix analysis. Since the first step is building chroma features (also known as chromagrams), I've started looking into available toolboxes/code for creating them. Most of the existing work seems to be in MATLAB, so if I want to use an existing chromagram library I'll have to decide between working in MATLAB and compiling to C++, or simply drawing inspiration from the libraries and building my own implementation. Even within chroma feature extraction, there are lots of design parameters to consider. There is a choice in how the chroma vector is constructed (a series of filters with different cutoffs, or Fourier analysis and binning, are both viable options). On top of this, pre- and post-processing can dramatically alter the features of a chroma vector. The feature rate is also a relevant consideration: how many times per second do we want to record a chromagram?

Some relevant pre- and post-processing tricks to consider (a sketch of a few of these follows the list):

  • Accounting for different tunings. The toolbox tries several offsets of less than a semitone and picks whichever one is 'most suitable.' If we simply use the same bins for all recordings, we may not need to worry about this; on the other hand, a variation of this technique could be used to provide some key-invariance.

  • Normalization to remove dynamics. Dynamics might actually be useful in identifying a song, so we should probably test with and without this processing variant.

  • "Flattening" the vectors using logarithmic features. This accounts for the fact that sound intensity is experienced logarithmically, and it changes the relative intensity of notes in a given sample.

  • Logarithmic compression and a discrete cosine transform to discard timbre information and attempt to keep only the pitch information.

  • Windowing different samples together and downsampling to smooth the chroma feature in the time dimension. This could help obscure some local tempo variations, but it's unclear right now whether that's something we want for this project. It does offer a way to change the tempo of a chroma feature, so we may want to use it if we try to build in tempo-invariance.
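A few of these tricks are simple enough to sketch directly (here C is a 12 x N chroma matrix; the compression constant, window length, and downsampling factor are placeholders, not tuned values):

    C = rand(12, 200);                  % placeholder chromagram

    % Normalization: scale each frame to unit length (removes dynamics).
    Cn = C ./ max(vecnorm(C), eps);

    % Logarithmic compression: flatten large intensity differences.
    eta = 100;                          % placeholder compression constant
    Cl = log(1 + eta*Cn);

    % Windowing + downsampling: smooth over a sliding window in time,
    % then keep every 4th frame (obscures local tempo variations).
    Cs = movmean(Cl, 9, 2);
    Cs = Cs(:, 1:4:end);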

As it turns out, these researchers have done some work in audio matching (essentially what we're doing) using chroma features, and they suggest some settings for their Chroma Toolbox that should lead to better performance, so that's a great place for us to start.

Important links from this week:

https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/03-publications/2011_MuellerEwert_ChromaToolbox_ISMIR.pdf

http://resources.mpi-inf.mpg.de/MIR/chromatoolbox/

http://resources.mpi-inf.mpg.de/MIR/chromatoolbox/2005_MuellerKurthClausen_AudioMatching_ISMIR.pdf