Shivi’s Status Report for 3/8/25

Last week, I mainly focused on working on the design review document with Deeya and Grace. Incorporating the feedback we received during the design presentation, I worked mostly on the preprocessing/calibration, pitch detection, and design trade studies sections of the design document. Additionally, Professor Dueck connected us with Professor Almarza from the School of Music, and Deeya and I met with him and the flutists from his studio. This helped us confirm our use case requirements, get their opinion on our current user workflow, and ask about their availability for testing our pipeline in a few weeks. The flutists were excited about the project, as a composition tool like the one we are developing would greatly aid them in writing new pieces. Grace and I also discussed how to implement the audio segmentation; as of now, we are planning to compute the RMS over 10 ms windows of the signal and use spikes in amplitude to determine where a new note begins. Based on our research, similar approaches have been used in open-source implementations for segmenting vocal audio by note, so we are optimistic that this approach will work for flute audio as well. We are currently on schedule, but I anticipate challenges with audio segmentation this week, so we plan to hit the ground running on this part of the project on Monday so that we can have segmentation working, at least for recordings of a few quarter notes, by the end of the week.
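A rough sketch of that segmentation idea (assuming mono audio loaded as a floating-point NumPy array, e.g. via librosa; the spike ratio and silence floor below are placeholder values we would still need to tune):

import numpy as np

def detect_note_onsets(signal, sr=44100, window_ms=10, spike_ratio=1.8):
    # Short-time RMS over consecutive 10 ms windows.
    win = int(sr * window_ms / 1000)
    n_windows = len(signal) // win
    rms = np.array([
        np.sqrt(np.mean(signal[i * win:(i + 1) * win] ** 2))
        for i in range(n_windows)
    ])

    onsets = []
    for i in range(1, n_windows):
        # A sudden jump in RMS relative to the previous window, above a small
        # silence floor, is treated as the start of a new note.
        if rms[i] > spike_ratio * rms[i - 1] and rms[i] > 1e-3:
            onsets.append(i * win / sr)  # onset time in seconds
    return onsets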

Shivi’s Status Report for 2/22/25

This week, I spent most of my time working on the design review presentation and design review document. I also thought more about our current noise suppression method, for which we are using a Butterworth filter, spectral subtraction, and adaptive noise filtering. However, based on Professor Sullivan’s advice and my own experimentation with hyperparameters and various notes, the latter two methods do not make a significant improvement in the resulting signal. To avoid redundancy and inefficiency, I removed the spectral subtraction and adaptive noise filtering for now. Additionally, I looked more into how we can perform audio segmentation to make it easier to detect pitch and rhythm, and found that we may be able to detect note onsets by examining spikes in the signal’s amplitude, though this might not work for different volumes without some form of normalization. I will be working with Grace this week to combine our noise suppression and amplitude thresholding code and, more importantly, to implement the note segmentation. Some of the risks with audio segmentation are as follows: background noise (so we may need to go back and adjust our noise suppression/filtering based on the segmentation results), detecting unintentional extra notes in the transition from one note to another (which can be mitigated by requiring that consecutive onsets be at least, say, 100 ms apart, as sketched below), and variations in volume (which will be mitigated by Grace’s script for applying dynamic thresholding and normalizing the volume). This week, we are also meeting with Professor Almarza from the School of Music to solicit flutists to test our transcription pipeline within the next few weeks.
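To illustrate that minimum-spacing rule, here is a minimal sketch (the 100 ms gap is the placeholder value mentioned above, and the list of onset times would come from whatever onset detector we settle on):

def enforce_min_spacing(onset_times, min_gap_s=0.1):
    # Drop onsets that fall within min_gap_s of the previously kept onset,
    # so transition artifacts between notes are not counted as new notes.
    kept = []
    for t in sorted(onset_times):
        if not kept or t - kept[-1] >= min_gap_s:
            kept.append(t)
    return kept

# Example: a spurious onset at 0.53 s (during a note transition) is discarded.
print(enforce_min_spacing([0.00, 0.50, 0.53, 1.02]))  # [0.0, 0.5, 1.02]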

We are currently on schedule, but we might need to build in extra time for the note segmentation, as detecting note onset and offset is one of the most challenging parts of the project. 

Shivi’s Status Report for 2/15/25

This week, I worked on setting up our codebase/environment, running experiments with various noise suppression/filtering methods, and working out implementation details for our final design.

With regard to noise suppression, Grace and I recorded some clean and noisy flute audio. I then experimented with several filtering/noise reduction techniques as well as Demucs, an open-source deep learning model from Facebook that separates a music recording into individual stems.

In my experiments, I used a combination of Butterworth bandpass filtering, adaptive noise reduction, and spectral gating. The Butterworth bandpass filter was applied first to ensure that only frequencies within the flute's range (261.6-2093.0 Hz) were kept in the audio. Then, I applied spectral gating, which first estimates the noise profile from a specific part of the audio and then subtracts it from the magnitude spectrum of the signal:

denoised = magnitude - reduction_factor * noise_stft
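A minimal sketch of this chain (assuming SciPy and librosa; the helper name, filter order, and reduction factor are placeholders, and the noise profile is estimated from the first second of the recording, as described below):

import librosa
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_and_gate(path, low_hz=261.6, high_hz=2093.0, reduction_factor=1.0):
    y, sr = librosa.load(path, sr=None, mono=True)

    # 1. Butterworth bandpass: keep only the flute's frequency range.
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    y = sosfiltfilt(sos, y)

    # 2. Spectral gating: average the first ~1 s of STFT frames as the noise
    #    profile (default hop length of 512 samples) and subtract it from the
    #    magnitude of every frame.
    stft = librosa.stft(y)
    magnitude, phase = np.abs(stft), np.angle(stft)
    noise_stft = magnitude[:, : int(sr / 512)].mean(axis=1, keepdims=True)
    denoised = np.maximum(magnitude - reduction_factor * noise_stft, 0.0)

    # 3. Rebuild the time-domain signal using the original phase.
    return librosa.istft(denoised * np.exp(1j * phase)), sr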

Currently, my script treats the first second of audio as noise, but this assumption does not hold in all cases. This is why we introduced a calibration step into our pipeline: it gives us a more accurate estimate of the noise in a particular environment and also lets us check that the user is playing loudly enough (the signal can always be attenuated later).

Then, the signal undergoes adaptive noise reduction to account for unpredictable fluctuations in background noise. I also experimented with various parameters such as prop_decrease (a value between 0 and 1 that determines the degree of noise suppression) and found that 0.5 produced the best result. Below is a graph comparing the original and denoised signals:

Although this noise suppression module eliminated much of the noise in the original signal, some muffled speech was still audible in the background; however, it did not seem to interfere much with detecting the harmonics of the D note being played. My experimental code for the noise suppression and Fast Fourier Transform for harmonic detection is linked in the following repo.
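For reference, a minimal sketch of that experiment (assuming the noisereduce package, whose reduce_noise function exposes the prop_decrease parameter mentioned above, plus a simple FFT peak check; the filename is hypothetical and the actual code in the repo may differ):

import numpy as np
import librosa
import noisereduce as nr

y, sr = librosa.load("flute_D_noisy.wav", sr=None, mono=True)  # hypothetical recording

# Adaptive noise reduction; prop_decrease=0.5 gave the best-sounding result.
y_denoised = nr.reduce_noise(y=y, sr=sr, prop_decrease=0.5)

# FFT to inspect the harmonic content of the played note (e.g. D5 is ~587 Hz,
# with harmonics near 1174 Hz, 1761 Hz, ...).
spectrum = np.abs(np.fft.rfft(y_denoised))
freqs = np.fft.rfftfreq(len(y_denoised), d=1 / sr)
print(f"Strongest frequency component: {freqs[np.argmax(spectrum)]:.1f} Hz")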

The second approach I tried was Demucs, a deep learning model open-sourced by Facebook that performs music source separation. However, since it is trained mainly to separate vocals and percussion from a mix, it did a great job of filtering out everything except the metronome clicks, rather than keeping only the flute.
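For completeness, this is roughly how Demucs can be invoked from Python (a sketch based on the package's documented entry point; the two-stems setting and the filename are assumptions, not necessarily the exact configuration I used):

import demucs.separate

# Separate the recording (hypothetical filename) into "vocals" and "no_vocals"
# stems; results are written under ./separated/<model name>/ by default.
demucs.separate.main(["--two-stems", "vocals", "flute_take.wav"])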

Given these results, I think the best route is to experiment more with a calibration step that lets the pipeline take in both a noise-only recording and a flute+noise recording so that it can perform spectral gating more effectively. My current progress is on schedule. Next week, my plan is to run more experiments with the calibration and work with Grace to figure out the best way to segment the audio before performing rhythm/pitch detection.
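One possible shape for that calibration step (again a sketch, assuming the noisereduce package, which accepts a separate noise clip via its y_noise argument; filenames are hypothetical):

import librosa
import noisereduce as nr

# Noise-only clip captured during calibration, plus the actual take.
noise, sr = librosa.load("calibration_noise.wav", sr=None, mono=True)
flute, _ = librosa.load("flute_with_noise.wav", sr=sr, mono=True)

# Use the calibration clip as the noise profile instead of assuming that the
# first second of the performance is silence.
cleaned = nr.reduce_noise(y=flute, sr=sr, y_noise=noise, prop_decrease=0.5)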