Team Status Report for 4/12/25

Last week, we had a successful interim demo where our transcription pipeline worked on a recording of Twinkle Twinkle Little Star. We also met with a flutist from the School of Music to get her feedback on our pipeline and obtain some sample recordings. She found the interface intuitive and easy to use, though we ran into some bugs with audio file formats and found that our note segmentation struggled a little with slurred notes.

This week, we focused on the following items:

  1. Fixing the denoising step
  2. Setting up websockets for real-time BPM adjustment
  3. Completing the remaining frontend-backend integration for the web app (e.g., earlier we had some bugs with audio file formats and with recording audio directly via the web app)
  4. Switching from RMS to a Short-Time Energy (STE) approach for note segmentation, which better accounts for rests/slurs in the music (see the sketch below)
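
For reference, the STE idea in item 4 boils down to summing squared samples over short frames and flagging jumps in that energy. Below is a minimal sketch of the approach (not our actual seg.py; the 10 ms frame size and 0.1 relative threshold are placeholder values):

  import numpy as np

  def short_time_energy(signal, sr, frame_ms=10):
      """Sum of squared samples over fixed-length frames."""
      frame_len = int(sr * frame_ms / 1000)
      n_frames = len(signal) // frame_len
      frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
      return np.sum(frames ** 2, axis=1)

  def detect_onsets(signal, sr, frame_ms=10, rel_thresh=0.1):
      """Flag an onset wherever the energy crosses a threshold scaled to the
      loudest frame, so the threshold adapts to the recording level."""
      ste = short_time_energy(signal, sr, frame_ms)
      thresh = rel_thresh * ste.max()
      onsets = []
      for i in range(1, len(ste)):
          if ste[i] >= thresh > ste[i - 1]:
              onsets.append(i * frame_ms / 1000.0)  # onset time in seconds
      return onsets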

Later this weekend, we are meeting again with another flutist from the SoM to obtain more audio and to see if our note segmentation performs better this time. Our new audio interface and XLR cable also arrived this week, so we will hopefully be able to collect better audio samples as well. In the upcoming week, we will focus on:

  1. Polishing our STE/note segmentation
  2. Fixing the issues with making our sheet music editable via the Flat API
  3. Collecting user metrics such as their transcription history
  4. Deploying our web app
  5. Preparing our final presentation/demo
  6. Thoroughly testing our pipeline

Below is our plan for verification, which we already started last week.

Shivi’s Status Report for 4/12/25


This week, I first worked on fixing the denoising step so that the note octaves would be accurate. Earlier, notes would sometimes come out an octave higher because the bandpass filter was cutting out some of the lower frequencies, so I adjusted the frequency range to prevent this. I also set up the websocket for real-time adjustment of the metronome, so the user can now adjust the tempo of the composition. Deeya and I integrated all of the web app code and have been trying to figure out how to make the generated composition editable via the Flat API; unfortunately, we have been running into a lot of issues with it but will continue debugging it this week. I am also adding inputs for the user to specify a key signature and time signature. Overall, my progress is on track. Pitch detection and MIDI encoding are largely done, and in the upcoming week, I will focus on resolving the issues with editing the sheet music directly through our web app using the Flat API and on adding the key/time signatures.
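
As a rough, standalone illustration of the websocket piece (our web app code is structured differently; this sketch assumes the Python websockets package and a made-up {"bpm": ...} message format):

  import asyncio
  import json

  import websockets  # assumed dependency: pip install websockets

  connected = set()

  async def metronome_handler(ws):  # single-argument handler (websockets >= 10.1)
      """Each client sends {"bpm": <int>}; the new tempo is broadcast to everyone."""
      connected.add(ws)
      try:
          async for message in ws:
              data = json.loads(message)
              if "bpm" in data:
                  update = json.dumps({"bpm": int(data["bpm"])})
                  await asyncio.gather(*(client.send(update) for client in connected))
      finally:
          connected.discard(ws)

  async def main():
      async with websockets.serve(metronome_handler, "localhost", 8765):
          await asyncio.Future()  # run until cancelled

  if __name__ == "__main__":
      asyncio.run(main())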

Shivi’s Status Report for 3/29/25

This week, I worked on preparing for the interim demo. I refined my pitch detection to account for rests and to ensure that the generated notes were accurate (earlier, some notes were incorrectly marked as flat/sharp instead of natural). Then, I worked with Deeya to set up the Flat.io API, as we were running into several errors with authorization and with formatting requests and responses. However, we were able to figure out how to send our generated MIDI files to the API for processing into sheet music. Finally, Grace and I worked on ensuring our code was compatible, and I finished modularizing all our existing code and integrating it into a single pipeline that is triggered from the web app and runs in the backend. Pitch detection is mostly done, and for next steps, I will be working on:

  1. Tempo detection
  2. Setting up websockets for our webapp for real-time adjustment of the metronome + assisting Deeya with making the displayed sheet music editable
  3. Working with Grace to refine audio segmentation (ex: rests and incorporating Short-Time Energy for more accurate note duration detection)

I am also finding that when I incorporate the denoising step into the pipeline, the detected pitches are thrown off a bit, so I’ll have to look more into ensuring that the denoising step does not impact the pitch detection.

Shivi’s Status Report for 3/22/25

This week, I focused on writing the detected rhythm/pitch information to a MIDI file and also looked into APIs for displaying the generated MIDI information as sheet music. Using the pitch detection I did last week, I wrote another script that takes in the MIDI note numbers and note types and creates MIDI messages. Each note is associated with messages that encode its pitch, duration, and loudness, and the script generates a .mid file with all the notes and their corresponding attributes. I tested this on a small clip of Twinkle, Twinkle Little Star and uploaded the generated .mid file to the music notation platform flat.io to check that it contained the correct notation. Below is the generated sheet music. All of the note pitches were generated by my pitch detection script, but the notes are hard-coded as quarter notes for now, since our rhythm detection is still in progress. The note segmentation → pitch detection → MIDI generation pipeline seems to be generating mostly correct notes for basic rhymes like Twinkle Twinkle.
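
As a rough sketch of that encoding step (assuming the mido library; the note numbers and durations below are placeholders for the opening of Twinkle Twinkle rather than the script's actual output):

  from mido import Message, MetaMessage, MidiFile, MidiTrack, bpm2tempo

  mid = MidiFile(ticks_per_beat=480)
  track = MidiTrack()
  mid.tracks.append(track)
  track.append(MetaMessage("set_tempo", tempo=bpm2tempo(60)))  # 60 BPM

  # (MIDI note number, length in beats); 1 beat = quarter note
  notes = [(60, 1), (60, 1), (67, 1), (67, 1), (69, 1), (69, 1), (67, 2)]
  for note, beats in notes:
      track.append(Message("note_on", note=note, velocity=64, time=0))
      track.append(Message("note_off", note=note, velocity=64,
                           time=int(beats * mid.ticks_per_beat)))

  mid.save("twinkle.mid")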

Earlier this week, I also did some research into APIs that we could use to display the generated sheet music on our web application in a way that is similar to MuseScore, a popular music notation application. While MuseScore doesn’t have an API that we can use, flat.io has a developer guide that will allow us to display the generated sheet music. Next week, I will be looking more into the developer guide and working with Deeya to set up/integrate the Flat API onto our web app. I will also work with Grace to refine/test our note segmentation more and ensure it is accurate for other notes and rests. We will also potentially be meeting one of the flutists this week so that we can collect more audio samples as well. Overall, my progress is on schedule, and hopefully we will have our transcription pipeline working on simple audio samples for our interim demo.
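
For reference, a sketch of what a programmatic upload to flat.io might eventually look like; the endpoint and field names here are assumptions about Flat's REST API that we still need to verify against the developer guide, and FLAT_TOKEN is a placeholder:

  import base64

  import requests

  FLAT_API = "https://api.flat.io/v2"  # assumed base URL; check the developer guide
  FLAT_TOKEN = "..."                   # personal access token placeholder

  def upload_midi(path, title):
      """Create a new Flat score from a local .mid file (field names assumed)."""
      with open(path, "rb") as f:
          payload = {
              "title": title,
              "privacy": "private",
              "dataEncoding": "base64",
              "data": base64.b64encode(f.read()).decode("ascii"),
          }
      resp = requests.post(
          f"{FLAT_API}/scores",
          json=payload,
          headers={"Authorization": f"Bearer {FLAT_TOKEN}"},
      )
      resp.raise_for_status()
      return resp.json()  # should include the new score's id for embedding/editing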

Team Status Report for 3/15/25

This week, we made significant progress on the web app setup, audio segmentation, and pitch detection components of our project. We also received our microphone, and Professor Sullivan lent us an audio interface that we can use to record some audio.

Below is an image of what our web app currently looks like. Here, a user can upload flute audio and a recording of their background. They can also adjust the tempo of the metronome (at least for MVP, we are not performing tempo detection, and the user needs to set their tempo/metronome).

Additionally, we now have a basic implementation of audio segmentation (using RMS) working. Below is a graph showing a flute signal of Twinkle Twinkle Little Star, where the red lines mark the start of a new note as detected by our algorithm, and the blue dotted lines represent the actual note onsets. Our algorithm's detected onsets were within 0.1 ms of the actual onsets.

We achieved similar results with Ten Little Monkeys at regular and 2x speed, though we still need to add a way to dynamically adjust the RMS threshold based on the signal’s max amplitude, rather than using trial and error.

We also started performing pitch detection. To do so, we are using comb filtering and Fourier transforms to analyze the frequencies present in the played note. We then use the fundamental frequency to determine the corresponding note. We were able to successfully determine the MIDI notes for Twinkle Twinkle and plan to continue testing this out on more audio samples. 
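
A minimal sketch of the FFT half of that step (the comb filtering we use to avoid locking onto a harmonic is omitted; the frequency limits are the flute range from our design):

  import numpy as np

  def segment_to_midi(segment, sr):
      """Pick the strongest spectral peak in the flute's range and convert it
      to the nearest MIDI note number."""
      windowed = segment * np.hanning(len(segment))
      spectrum = np.abs(np.fft.rfft(windowed))
      freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)

      in_range = (freqs >= 261.6) & (freqs <= 2093.0)  # C4 to C7
      f0 = freqs[in_range][np.argmax(spectrum[in_range])]

      midi = int(round(69 + 12 * np.log2(f0 / 440.0)))  # A4 = 440 Hz = MIDI 69
      return midi, f0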

We are currently on schedule with our progress. For the upcoming week, we plan to integrate all of our existing code together and test/refine the audio segmentation and pitch detection to ensure that they are more robust to various tempos, rhythms, and frequencies. We are also soliciting the SoM flutists' availability so that we can start some initial testing the week of March 24th. Additionally, after speaking with Professor Chang last week during lab, we have decided to build in some time to add a feature in which users can edit the generated music score (e.g., move measures around, adjust notes, and add notation such as trills and crescendos).

Shivi’s Status Report for 3/15/25

This week, I met with Grace to test the audio segmentation algorithm she wrote. We tested it on a sample of Twinkle Twinkle Little Star, as well as Ten Little Monkeys. We found that for each of the two samples, we needed to adjust the RMS threshold to account for differences in the maximum amplitude of the signal; as a result, we realized that we will need to add some way to either standardize the amplitude of our signal or dynamically change the RMS threshold based on the signal’s amplitude. 

I also worked to integrate our preprocessing and audio segmentation code all together. Our current pipeline can be found on this GitHub (Segmentation/seg.py for note segmentation, and Pitch Detection/pitch.py for pitch detection) along with some of our past experimentation code.

Furthermore, now that we have audio segmentation, I was able to get pitch detection to work, at least on Twinkle Twinkle. To do so, I used an FFT and comb filtering to find the fundamental frequency and map it to the corresponding MIDI note. I plan to test the pitch detection on more audio samples next week and work with Grace and Deeya to integrate all the stages of our project that we have implemented so far (the web app and triggering the preprocessing/audio segmentation/pitch detection pipeline).

Shivi’s Status Report for 3/8/25

Last week, I mainly focused on working on the design review document with Deeya and Grace. Incorporating the feedback we received during the design presentation, I worked mostly on the preprocessing/calibration, pitch detection, and design trade studies aspects of the design document. Additionally, Professor Dueck connected us with Professor Almarza from the School of Music, and Deeya and I met with him and the flutists from his studio. This helped us confirm our use case requirements, get their opinion on our current user workflow, and solicit their availability for testing out our pipeline in a few weeks. The flutists were excited about the project as a composition tool such as the one we are developing would greatly aid them in writing new compositions. Grace and I also discussed how to implement the audio segmentation; as of now, we are planning to apply RMS over 10 ms windows of the signal and use spikes in amplitude to determine where the new note begins. Based on our research, similar approaches have been used in open-source implementations for segmenting vocal audio by note, so we are optimistic about this approach for flute audio as well. We are currently on schedule with our progress, but I anticipate issues with audio segmentation this week, so we plan to hit the ground running for this aspect of our project on Monday so that we can have the segmentation working, at least for recordings of a few quarter notes, by the end of the week.

Shivi’s Status Report for 02/22/2025

This week, I spent most of my time working on the design review presentation and design review document. I also thought more about our current noise suppression method, for which we are using a Butterworth filter, spectral subtraction, and adaptive noise filtering. However, based on Professor Sullivan's advice and my own experimentation with hyperparameters and various notes, the latter two methods do not make a significant improvement in the resulting signal. To avoid any redundancy and inefficiencies, I removed the spectral subtraction and adaptive noise filtering for now. Additionally, I looked more into how we can perform audio segmentation to make it easier to detect pitch and rhythm and found that we may be able to detect note onsets by examining spikes in the signal's amplitude, though this might not work for different volumes without some form of normalization. I will be working with Grace this week to combine our noise suppression and amplitude thresholding code, and more importantly, to work on implementing the note segmentation. Some of the risks with audio segmentation are as follows: noise (so we may need to go back and adjust noise suppression/filtering based on our segmentation results), detecting unintentional extra notes in the transition from one note to another (which can be mitigated by requiring that consecutive notes be, say, 100 ms apart), and variations in volume (which will be mitigated by Grace's script for applying dynamic thresholding and normalizing the volume). This week, we are also visiting Professor Almarza from the School of Music to solicit flutists to test our transcription pipeline within the next few weeks.

We are currently on schedule, but we might need to build in extra time for the note segmentation, as detecting note onset and offset is one of the most challenging parts of the project. 

Shivi’s Status Report for 2/15/25

This week, I worked on setting up our codebase/environment, running experiments with various noise suppression/filtering methods, and working out implementation details for our final design.

With regards to noise suppression, Grace and I recorded some clear and noisy flute audio. I then experimented with some filtering/noise reduction techniques as well as the open-source Demucs deep learning model by Facebook, which separates music tracks. 

In my experiments, I used a combination of Butterworth bandpass filtering, adaptive noise reduction, and spectral gating. The Butterworth bandpass filter was applied first to ensure that only frequencies within the frequency range of the flute (261.6-2093.0 Hz) were captured in the audio. Then, I used spectral gating, which first estimates the noise profile from a specific part of the audio and subtracts it from the audio signal.

denoised = magnitude - reduction_factor * noise_stft

Currently, my script estimates the first second of audio to be noise, but this is not entirely accurate/true for all cases. This is why we introduced a calibration step into our pipeline, so that we can get a more accurate estimate of the noise in a particular environment as well as ensure that the user is playing loud enough (as the signal can always be reduced later). 
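
A minimal sketch of this bandpass + spectral gating combination, assuming SciPy and (as described above) the first second of the file as the noise estimate; the filter order and FFT size are placeholders rather than my script's exact parameters:

  import numpy as np
  from scipy.signal import butter, sosfiltfilt, stft, istft

  def denoise(audio, sr, reduction_factor=1.0):
      # 1) Keep only the flute's range (261.6-2093.0 Hz) with a Butterworth bandpass
      sos = butter(4, [261.6, 2093.0], btype="bandpass", fs=sr, output="sos")
      band = sosfiltfilt(sos, audio)

      # 2) Spectral gating: subtract a noise magnitude profile estimated from
      #    the first second of the recording
      f, t, Z = stft(band, fs=sr, nperseg=2048)
      magnitude, phase = np.abs(Z), np.angle(Z)
      noise_profile = magnitude[:, t <= 1.0].mean(axis=1, keepdims=True)

      gated = np.maximum(magnitude - reduction_factor * noise_profile, 0.0)
      _, cleaned = istft(gated * np.exp(1j * phase), fs=sr, nperseg=2048)
      return cleaned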

Then, the signal undergoes adaptive noise reduction to account for unpredictable fluctuations in background noise. I also experimented with various parameters such as prop_decrease (a value between 0 and 1 that determines the degree of noise suppression), finding that 0.5 produced the best result. Below is a graph comparing the original and denoised signal:

Though this noise suppression module did eliminate much of the noise in the original signal, you could still hear some muffled speaking in the background, though this didn’t seem to interfere much with the detection of harmonics in the D note that was being played. My experimental code for the noise suppression and Fast Fourier Transform for harmonic detection is linked in the following repo.
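
For context, prop_decrease is the parameter name exposed by the open-source noisereduce package; assuming that is the library behind the adaptive step, the call is roughly:

  import numpy as np
  import noisereduce as nr  # assumed library, based on the prop_decrease parameter

  sr = 44100
  flute = np.random.randn(5 * sr)  # stand-in for the band-passed flute recording
  noise = np.random.randn(1 * sr)  # stand-in for a noise-only calibration clip

  # prop_decrease in [0, 1] controls how aggressively the noise is attenuated
  cleaned = nr.reduce_noise(y=flute, sr=sr, y_noise=noise, prop_decrease=0.5)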

The second approach I tried was Demucs, a deep learning model open-sourced by Facebook that performs music track separation. However, since it is mainly trained to separate vocals and percussion, it did a great job of filtering out everything except the metronome, as opposed to isolating only the flute.

Given these results, I think the best route is to experiment more with a calibration step that allows the pipeline to take in both a noise signal and a flute+noise signal to be able to perform spectral gating more effectively. My current progress is on schedule. Next week, my plan is to run more experiments with the calibration and work with Grace to figure out the best way to segment the audio before performing the rhythm/pitch detection.

Team Status Report for 2/15/25

This week, we focused on our high-level design for the design presentation. After discussing with Ankit and Professor Sullivan, we placed an order for some hardware to begin working with: a Behringer CB 100 Gooseneck Condenser Instrument Microphone and an XLR to USB-C adapter. This will allow us to improve our current experiments, as we will be able to obtain clearer audio recordings. Based on our discussions this past week, we also decided to move our entire implementation into software. Additionally, we determined that it would be best to provide users with a metronome (a sound outside the frequency range of the flute so that it can be filtered out later) set to a default 60 BPM, which the user will be able to adjust in real-time using a slider on our web app. Previously, we had recorded single notes from the B flat major scale to experiment with harmonics, but this week we met up to record some noisy signals for noise-reduction experiments and to work on encoding information for single notes into a MIDI file and uploading it to MuseScore to see if we could translate it into sheet music (see individual team member reports). After a lot of discussion, we also concluded that real-time transcription is not relevant for our use case, since a user only needs to see the transcribed output once they are done playing.

Our new pipeline will work as follows:

  1. User logs into their account.
  2. User calibration: Ensure the user is playing at some minimum threshold before they upload a recording.
  3. User is prompted to record and upload an audio file of their background noise.
  4. User is prompted to record flute audio. (Consideration: add a play/pause control in case the user needs to pause mid-recording?) To do so, they turn on the metronome on the web app. The metronome is set to 60 BPM by default, but they can adjust it in real-time using a slider.
  5. The audio is saved in the website’s database, and the pitch/rhythm detection pipeline is triggered in the backend. 
    1. Noise suppression via Butterworth filter and adaptive noise filtering. 
    2. Audio segmentation: Spectral Flux (how much the spectrum changes over time) and Short-Time Energy (STE) (detect sudden amplitude increases) to determine onset of a note
    3. For each segment (we can parallelize this with threading so multiple segments can be processed at once):
      1. Use note length to determine its type (eighth, quarter, half, whole, etc.)
      2. Use FFT to determine frequency/pitch and classify which note it is
  6. Encode the info from all the segments into a MIDI file
  7. MIDI file gets uploaded to the web database and MuseScore API converts MIDI into sheet music
  8. Newly generated file is stored along with that user’s previous transcriptions that they can view
  9. If time remains: we can add an editing feature where the user can adjust transcribed notes and add additional notation like crescendos, etc.

The biggest risk/challenge right now is verifying whether the methods we plan to use for noise suppression and pitch/rhythm detection will actually work. For instance, in this week's noise suppression experiments we tried a variety of filters (see Shivi's status report), but found that the flute audio would often get suppressed along with the noise, or that the background noise would not be suppressed enough. We would like to run more experiments, and our contingency for this is a calibration step that gives us a noise sample that we can then subtract from the flute audio signal. Similarly, note onset detection will probably be quite challenging as well, because it may be difficult to determine the exact moment a note ends. This is why we are deciding to segment the audio as our initial processing step and then "round" the duration of each segment to the nearest eighth of a beat based on the BPM.
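
A small sketch of that rounding step, assuming segment durations in seconds and a user-set BPM (the note-type mapping is simplified to the values listed in the pipeline above):

  def quantize_to_beats(duration_s, bpm, resolution=0.125):
      """Round a segment's length to the nearest eighth of a beat."""
      beats = duration_s * bpm / 60.0
      return max(resolution, round(beats / resolution) * resolution)

  def note_type(beats):
      """Snap a beat count to the closest standard note value (quarter = 1 beat)."""
      types = {0.5: "eighth", 1.0: "quarter", 2.0: "half", 4.0: "whole"}
      return types[min(types, key=lambda b: abs(b - beats))]

  # e.g., a 0.47 s segment at 120 BPM -> 0.94 beats -> rounds to 1.0 -> "quarter"
  print(note_type(quantize_to_beats(0.47, 120)))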

Despite these challenges, we are on-track with our schedule; over the next week, we plan to have an even more detailed design while simultaneously working on setting up our web app, experimenting more with signal calibration/noise suppression, and starting on audio segmentation.

Week-specific status report questions:

Part A (Shivi): Our flute transcription system enhances public health by supporting creative expression through music learning. With an attachable microphone and a software-only pipeline, it is affordable and safe to use. Our system also promotes welfare by lowering barriers to music production, as it can be made accessible online for musicians, students, and educators to use.

Part B (Grace): Write on Cue aims to make music transcription more accessible to diverse musical communities, including amateur musicians, educators, composers, and students from various cultural and social backgrounds. This benefits people who may not have the technical skills or resources to manually transcribe music and allows individuals to better engage with music across a variety of cultural contexts. For example, in communities where formal music education is less accessible, our project can provide a more equitable way for musicians to preserve and share traditional flute music, irrespective of whether they are classically trained. Additionally, socially, this allows musicians from different backgrounds to contribute their musical expressions and allows for easier preservation of musical heritage. 

Part C (Deeya): Traditional methods of music transcription are time-consuming and require specialized knowledge, creating a barrier for learners who want to review their performances or for educators who need to provide detailed feedback. By streamlining the transcription process, our project reduces the dependency on costly manual transcription services, which lowers the overall cost of producing sheet music. Also, we are designing our project on a web app, which maximizes accessibility and encourages a cheaper and more widespread music education.