I spent the first part of this week working on the design document. This involved putting the technical details of the project in a more concrete, detailed, and professional form. It was a significant undertaking that took a lot of time and thought, and hopefully that will pay off come the final report!
Unfortunately, I had a very busy week in other classes and was not able to dedicate as much time to capstone as I would have liked. I did spend more time looking into existing visualizations of CNNs and possible implementations for visualizing melodic contour analysis. Anja drew an example for the design review slides that is very viable: displaying the sung melody on top of the melody of the match (or the top possible matches). The open questions are whether the visualization can be done in real time and whether a timeline-like graphic will be possible. The idea for the timeline is a video/GIF that shows the melody from start to end. At the beginning, the melody is matched to all songs; then, with each note, songs that are not the match are eliminated (i.e. they disappear from the graphic) until only the final match(es) remain.
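To make the elimination timeline concrete, here is a minimal Python sketch of how the frames could be computed. It is not our actual matcher: it assumes each melody is a list of MIDI pitch numbers, reduces it to an up/down/repeat contour, and keeps only the songs whose contour matches the sung contour exactly so far. The names `contour`, `elimination_frames`, and the toy `library` are hypothetical and only for illustration.

```python
def contour(pitches):
    """Reduce a pitch sequence to steps: U (up), D (down), R (repeat)."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return steps


def elimination_frames(sung_pitches, library):
    """For each sung note, list the songs still consistent with the contour so far."""
    sung = contour(sung_pitches)
    reference = {name: contour(p) for name, p in library.items()}
    surviving = sorted(library)  # the first note alone matches every song
    frames = [list(surviving)]
    for i, step in enumerate(sung):
        # Drop songs that are too short or whose i-th step differs (exact match assumed).
        surviving = [
            name for name in surviving
            if i < len(reference[name]) and reference[name][i] == step
        ]
        frames.append(list(surviving))
    return frames


# Hypothetical toy library of MIDI pitch numbers, for illustration only.
library = {
    "song_a": [60, 62, 64, 62, 60],
    "song_b": [60, 62, 60, 59, 57],
    "song_c": [60, 58, 57, 58, 60],
}

# The sung melody is in a different key; only its contour (U, U, D) matters here.
frames = elimination_frames([48, 50, 52, 50], library)
for note_index, names in enumerate(frames):
    print(note_index, names)
```

Each entry in `frames` would correspond to one frame of the video/GIF, with songs missing from the list being the ones that disappear. Real sung input would probably need tolerance for wrong or missed notes, which this exact-match sketch does not handle.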
In this article, there are examples of the intermediate convolutions, the ReLU activation filters, and the convolution layer filters. They also showed examples of class activation heat maps, which I have some experience with from my previous research. I don’t think the heat map-type visual will be relevant to us, but the others are intriguing. Another visual I saw was this one, which shows what I assume is a simplified, distilled version of what their CNN is doing. The input is shown, along with some granularity of the layers in between, down to the final classification. These are inspiration for what the final visualization system will be.
My limited time kept me from beginning implementation, but I plan to get started as soon as possible on building a test visualizer for melodic contour and tinkering more with CNN layer visualization. Additionally, I need to look into filtering background noise out of input audio.
*Note: Since it is spring break, this is being written earlier than usual.