This week, I worked on setting up our codebase and environment, running experiments with various noise suppression/filtering methods, and working out implementation details for our final design.
With regards to noise suppression, Grace and I recorded some clean and noisy flute audio. I then experimented with several filtering/noise reduction techniques as well as Demucs, an open-source deep learning model from Facebook that separates music tracks.
In my experiments, I used a combination of Butterworth bandpass filtering, adaptive noise reduction, and spectral gating. The Butterworth bandpass filter was applied first to ensure that only frequencies within the flute's range (261.6-2093.0 Hz) were kept in the audio. Then, I used spectral gating, which first estimates a noise profile from a specific part of the audio and subtracts it from the signal's magnitude spectrum.
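For reference, here is roughly what the bandpass step looks like (a minimal sketch using scipy; the filter order and filename are my own assumptions rather than settled design choices):

import soundfile as sf
from scipy.signal import butter, sosfiltfilt

# Load the recorded flute audio (hypothetical filename).
audio, sr = sf.read("flute_noisy.wav")

# 4th-order Butterworth bandpass covering the flute's range (C4 to C7).
sos = butter(4, [261.6, 2093.0], btype="bandpass", fs=sr, output="sos")

# Zero-phase filtering (axis=0 so multichannel audio is filtered per channel).
filtered = sosfiltfilt(sos, audio, axis=0)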
The spectral gating subtraction itself is:

denoised = magnitude - reduction_factor * noise_stft
Currently, my script treats the first second of audio as noise, but this assumption does not hold in all cases. This is why we introduced a calibration step into our pipeline: it gives us a more accurate estimate of the noise in a particular environment and lets us confirm that the user is playing loudly enough (the signal can always be attenuated later).
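Below is a minimal sketch of the spectral gating step, assuming the noise profile comes from a separate calibration clip (or, as my script currently does, from the first second of the recording); the function and variable names here are illustrative:

import numpy as np
import librosa

def spectral_gate(signal, noise_clip, reduction_factor=1.0):
    # Average magnitude spectrum of the calibration/noise clip.
    noise_stft = np.abs(librosa.stft(noise_clip)).mean(axis=1, keepdims=True)
    # STFT of the noisy flute recording.
    sig_stft = librosa.stft(signal)
    magnitude, phase = np.abs(sig_stft), np.angle(sig_stft)
    # Subtract the noise profile from every frame, clipping at zero.
    denoised = np.maximum(magnitude - reduction_factor * noise_stft, 0.0)
    # Rebuild the waveform using the original phase.
    return librosa.istft(denoised * np.exp(1j * phase))

# e.g., treating the first second of the (mono) bandpassed audio as noise:
# cleaned = spectral_gate(filtered, filtered[:sr])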
Then, the signal undergoes adaptive noise reduction to account for unpredictable fluctuations in background noise. I also experimented with parameters such as prop_decrease (a value between 0 and 1 that determines the degree of noise suppression) and found that 0.5 produced the best result. Below is a graph comparing the original and denoised signals:

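For reference, the adaptive noise reduction step is roughly the following (a sketch using the noisereduce library; the filenames are placeholders and the recording is assumed to be mono):

import noisereduce as nr
import soundfile as sf

# Load the bandpassed/gated signal (hypothetical filename).
audio, sr = sf.read("flute_filtered.wav")

# Non-stationary (adaptive) noise reduction; prop_decrease=0.5 sounded best in my tests.
reduced = nr.reduce_noise(y=audio, sr=sr, stationary=False, prop_decrease=0.5)

sf.write("flute_denoised.wav", reduced, sr)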
Though this noise suppression module eliminated much of the noise in the original signal, you could still hear some muffled speech in the background; however, this didn't seem to interfere much with detecting the harmonics of the D note being played. My experimental code for the noise suppression and the Fast Fourier Transform used for harmonic detection is linked in the following repo.
The second approach I tried was running the audio through Demucs. However, since it is used mainly to separate vocals and percussion, it did a great job of filtering out everything except the metronome, as opposed to keeping only the flute.
Given these results, I think the best route is to experiment more with a calibration step that lets the pipeline take in both a noise-only signal and a flute+noise signal, so that spectral gating can be performed more effectively. My current progress is on schedule. Next week, my plan is to run more experiments with the calibration and work with Grace to figure out the best way to segment the audio before performing rhythm/pitch detection.