This week I finished setting up the user authentication process for our website so that each user will have a profile associated with their account. This will help keep track of which transcriptions belong to which user and which transcriptions to display on their Past Transcriptions page. I also started looking into how to record live audio through the website and store it in our database so that it can be used by the pitch and rhythm algorithms being designed by Grace and Shivi. Overall, I am on track with the website and should be done with its overall functionality this week. One thing I still want to figure out is how to take the most recently stored audio file in our database, whether uploaded or recorded live, and automatically run it through the pitch and rhythm algorithms so that integration will go smoothly. For the Gen AI portion of the project, it looks like I might need to create a labelled dataset myself, which I will have time to focus on once I finish the website this week. This week I will also be working on my portions of the design review report.
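To make the hand-off concrete, below is a minimal sketch of how the backend could pull a user's most recent audio entry from the database and hand it to the pitch and rhythm code. The model and function names (AudioRecording, run_transcription) are placeholders for illustration, not our actual schema.

# Hypothetical sketch: AudioRecording and run_transcription are placeholder names.
from django.contrib.auth.models import User
from django.db import models

class AudioRecording(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    audio_file = models.FileField(upload_to="recordings/")   # uploaded or live-recorded audio
    uploaded_at = models.DateTimeField(auto_now_add=True)

def run_transcription(wav_path):
    # Stub standing in for Grace and Shivi's pitch/rhythm pipeline.
    raise NotImplementedError

def transcribe_latest(user):
    # Fetch the user's most recent recording and run it through the pipeline.
    latest = (AudioRecording.objects
              .filter(user=user)
              .order_by("-uploaded_at")
              .first())
    return run_transcription(latest.audio_file.path) if latest else None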
Shivi’s Status Report for 02/22/2025
This week, I spent most of my time working on the design review presentation and design review document. I also thought more about our current noise suppression method, which uses a Butterworth filter, spectral subtraction, and adaptive noise filtering. Based on Professor Sullivan’s advice and my own experimentation with hyperparameters and various notes, the latter two methods do not significantly improve the resulting signal, so to avoid redundancy and inefficiency, I removed the spectral subtraction and adaptive noise filtering for now. Additionally, I looked more into how we can perform audio segmentation to make it easier to detect pitch and rhythm, and found that we may be able to detect note onsets by examining sudden changes in the signal’s energy/amplitude, though this might not work for different volumes without some form of normalization. I will be working with Grace this week to combine our noise suppression and amplitude thresholding code and, more importantly, to work on implementing the note segmentation. Some of the risks with audio segmentation are: noise (so we may need to go back and adjust noise suppression/filtering based on our segmentation results), detecting unintentional extra notes in the transition from one note to another (which can be mitigated by requiring consecutive notes to be at least, say, 100 ms apart), and variations in volume (which will be mitigated by Grace’s script for applying dynamic thresholding and normalizing the volume). This week, we are also visiting Professor Almarza from the School of Music to recruit flutists to test our transcription pipeline within the next few weeks.
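As a starting point for the segmentation, here is a minimal sketch (assuming librosa and NumPy) of energy-based onset detection with the 100 ms spacing rule mentioned above; the frame sizes and relative threshold are placeholder values rather than tuned parameters.

import numpy as np
import librosa

def detect_onsets(path, frame=1024, hop=512, min_gap_s=0.10, rel_thresh=0.2):
    # Flag frames where short-time energy rises above a fraction of the maximum,
    # then enforce a minimum spacing (100 ms) between consecutive onsets.
    y, sr = librosa.load(path, sr=None, mono=True)
    energy = np.array([np.sum(np.square(y[i:i + frame]))
                       for i in range(0, len(y) - frame, hop)])
    threshold = rel_thresh * energy.max()
    rising = np.flatnonzero((energy[1:] >= threshold) & (energy[:-1] < threshold)) + 1

    onsets, last_t = [], -np.inf
    for f in rising:
        t = f * hop / sr
        if t - last_t >= min_gap_s:   # drop spurious onsets during note transitions
            onsets.append(t)
            last_t = t
    return onsets                     # onset times in seconds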
We are currently on schedule, but we might need to build in extra time for the note segmentation, as detecting note onset and offset is one of the most challenging parts of the project.
Grace’s Status Report for 2/22/25
This week, I mostly focused on our design review presentation and our design proposal. Since I was the one presenting, I made sure our slides were ready and rehearsed, ensured that I could also present the information on my teammates’ slides, and trimmed the text to make the presentation less word-heavy. In class, we listened to the other presentations and gave feedback. This was helpful because Professor Sullivan mentioned that some of our filtering for noise suppression might be excessive, which would slow down our processing time and lengthen our latency, so we will continue experimenting with just the bandpass filter and see whether the other filtering methods are necessary. I also experimented further with different thresholds and filters to better isolate the rhythm of single notes.
This week, I will be working on my sections of the design review paper as well as doing further research on audio segmentation, since how the segmentation works will determine how I implement rhythm detection. I will be meeting with Shivi to work on this aspect. Our project is currently on schedule, but other tasks might have to be pushed back with the introduction of audio segmentation. I hope to have audio segmentation and rhythm detection working by the week after spring break at the latest.
Deeya’s Status Report for 2/15/25
This week, I made progress on our project’s website by setting up a Django application that closely follows the UI design from our mockup using HTML and CSS. I am finishing up the user authentication process using OAuth, which will allow users to easily register and log in with their email addresses. User profile information is stored in a SQL database. I am currently on track with the website development timeline and will focus next on uploading files and storing them in our database. I will also begin working on the “Past Transcriptions” page, which will show the user’s transcription history along with the date each transcription was created.
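As a rough sketch of how the Past Transcriptions page could be backed by the database, here is a hypothetical Django model and view; the model fields and template name are placeholders, not our final schema.

# Hypothetical sketch: Transcription and past_transcriptions are placeholder names.
from django.contrib.auth.decorators import login_required
from django.contrib.auth.models import User
from django.db import models
from django.shortcuts import render

class Transcription(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    sheet_music = models.FileField(upload_to="transcriptions/")
    created_at = models.DateTimeField(auto_now_add=True)     # date shown on the page

@login_required
def past_transcriptions(request):
    # List the logged-in user's transcriptions, newest first.
    items = Transcription.objects.filter(user=request.user).order_by("-created_at")
    return render(request, "past_transcriptions.html", {"transcriptions": items})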
Regarding the generative AI component of the project, I am still searching for a large enough labeled dataset for training our model. I found the MAESTRO dataset of piano MIDI files, which would be ideal if a similar dataset existed for the flute. If I am unable to find a large labeled dataset within the next few days, I am planning on creating a small dataset myself as a starting point. This will allow me to start experimenting with model training and fine-tuning while continuing to look for a better dataset.
Shivi’s Status Report for 2/15/25
This week, I worked on setting up our codebase/environment, running experiments with various noise suppression/filtering methods, and working out implementation details for our final design.
With regards to noise suppression, Grace and I recorded some clear and noisy flute audio. I then experimented with some filtering/noise reduction techniques as well as the open-source Demucs deep learning model by Facebook, which separates music tracks.
In my experiments, I used a combination of Butterworth bandpass filtering, adaptive noise reduction, and spectral gating. The Butterworth bandpass filter was applied first to ensure that only frequencies within the flute’s range (261.6-2093.0 Hz) were kept in the audio. Then I used spectral gating, which first estimates the noise profile from a specific part of the audio and then subtracts it from the audio signal:
denoised = magnitude - reduction_factor * noise_stft
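Below is a minimal sketch of these two steps, assuming SciPy and librosa; the STFT parameters and file names are illustrative, and the noise clip could come from the first second of audio or from a separate calibration recording.

import numpy as np
import librosa
from scipy.signal import butter, sosfiltfilt

def bandpass(y, sr, low=261.6, high=2093.0, order=4):
    # Butterworth bandpass limited to the flute's frequency range.
    sos = butter(order, [low, high], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, y)

def spectral_gate(y, noise, sr, reduction_factor=1.0, n_fft=2048, hop=512):
    # denoised = magnitude - reduction_factor * noise_stft, floored at zero.
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    magnitude, phase = np.abs(stft), np.angle(stft)
    # Average noise magnitude per frequency bin, estimated from a noise-only clip.
    noise_stft = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop)).mean(axis=1, keepdims=True)
    denoised = np.maximum(magnitude - reduction_factor * noise_stft, 0.0)
    return librosa.istft(denoised * np.exp(1j * phase), hop_length=hop)

# Example usage (file names are illustrative):
# y, sr = librosa.load("flute_plus_noise.wav", sr=None)
# noise, _ = librosa.load("noise_only.wav", sr=sr)
# clean = spectral_gate(bandpass(y, sr), bandpass(noise, sr), sr)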
Currently, my script assumes the first second of audio is noise, but this is not accurate in all cases. This is why we introduced a calibration step into our pipeline, so that we can get a more accurate estimate of the noise in a particular environment and also ensure that the user is playing loudly enough (the signal can always be attenuated later).
Then, the signal undergoes adaptive noise reduction to account for unpredictable fluctuations in background noise. I also experimented with various parameters such as prop_decrease (a value between 0 and 1 that determines the degree of noise suppression), finding that 0.5 produced the best result. Below is a graph comparing the original and denoised signals:
Though this noise suppression module eliminated much of the noise in the original signal, some muffled speech was still audible in the background, though this did not seem to interfere much with detecting the harmonics of the D note being played. My experimental code for the noise suppression and the Fast Fourier Transform for harmonic detection is linked in the following repo.
The second approach I tried was Demucs, a deep learning model open-sourced by Facebook for music source separation. However, since it is mainly trained to separate vocals and percussion, it ended up filtering out everything except the metronome clicks, rather than keeping only the flute.
Given these results, I think the best route is to experiment more with a calibration step that allows the pipeline to take in both a noise signal and a flute+noise signal to be able to perform spectral gating more effectively. My current progress is on schedule. Next week, my plan is to run more experiments with the calibration and work with Grace to figure out the best way to segment the audio before performing the rhythm/pitch detection.
Team Status Report for 2/15/25
This week, we focused on the high-level design for our design presentation. After discussing with Ankit and Professor Sullivan, we placed an order for some hardware to begin working with: a Behringer CB 100 Gooseneck Condenser Instrument Microphone and an XLR to USB-C adapter. This will improve our current experiments, as we will be able to obtain clearer audio recordings. Based on our discussions this past week, we also decided to move our entire implementation into software. Additionally, we determined that it would be best to provide users with a metronome (a sound outside the frequency range of the flute so that it can be filtered out later) set to a default of 60 BPM, which the user will be able to adjust in real time using a slider on our web app. Previously, we had recorded single notes from the B flat major scale to experiment with harmonics; this week we met up to also record some noisy signals to experiment with noise reduction, and to work on encoding single notes into a MIDI file and uploading it to MuseScore to see if we could translate it into sheet music (see individual team member reports). After a lot of discussion, we also concluded that real-time transcription is not relevant for our use case, since a user only needs to see the transcribed output once they are done playing.
Our new pipeline will work as follows:
- User logs into their account.
- User calibration: Ensure the user is playing at some minimum threshold before they upload a recording.
- User is prompted to record and upload an audio file of their background noise.
- User is prompted to record flute audio. (Consideration: add a play/pause control in case the user needs to pause mid-recording?) To do so, they turn on the metronome on the web app. The metronome is set to 60 BPM by default, but they can adjust it in real time using a slider.
- The audio is saved in the website’s database, and the pitch/rhythm detection pipeline is triggered in the backend.
- Noise suppression via Butterworth filter and adaptive noise filtering.
- Audio segmentation: use Spectral Flux (how much the spectrum changes over time) and Short-Time Energy (STE) (sudden amplitude increases) to determine the onset of each note
- For each segment (we can parallelize this with threading so multiple segments can be processed at once):
  - Use the note length to determine its type (eighth, quarter, half, whole, etc.)
  - Use FFT to determine the frequency/pitch and classify which note it is (see the sketch after this list)
- Encode the info from all the segments into a MIDI file
- MIDI file gets uploaded to the web database and MuseScore API converts MIDI into sheet music
- Newly generated file is stored along with that user’s previous transcriptions that they can view
- IF time remains: we can add an editing feature where the user can adjust transcribed notes and add additional notation like crescendos, etc.
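As referenced in the pipeline above, here is a rough sketch of the per-segment FFT pitch step, assuming NumPy; the band limits match the flute range we use for filtering, and the helper name is illustrative.

import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def classify_pitch(segment, sr):
    # Estimate the dominant frequency of one note segment via FFT and map it
    # to the nearest equal-tempered note name.
    windowed = segment * np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    band = (freqs >= 261.6) & (freqs <= 2093.0)       # ignore bins outside the flute range
    f0 = freqs[band][np.argmax(spectrum[band])]
    midi = int(round(69 + 12 * np.log2(f0 / 440.0)))  # frequency -> MIDI note number
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1), f0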
The biggest risk/challenge as of now is verifying whether the methods we plan to use for noise suppression and pitch/rhythm detection will actually work. For instance, in this week’s noise suppression experiments we tried a variety of filters (see Shivi’s status report), but found that many times the flute audio would also get suppressed, or the background noise would not be suppressed enough. We would like to run more experiments, and our contingency for this is a calibration step that gives us a noise sample we can then subtract from the flute audio signal. Similarly, note onset detection will probably be quite challenging as well, because it may be difficult to determine the exact moment a note ends. This is why we are deciding to segment the audio as our initial processing step, and then “round” the duration of each segment to the nearest eighth of a beat based on the BPM.
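To illustrate the rounding idea, here is a small sketch that quantizes a segment’s duration to the nearest eighth of a beat and maps it to a note type, assuming a quarter note gets one beat; the mapping table is illustrative.

def quantize_duration(duration_s, bpm=60):
    # Round a segment's duration to the nearest eighth of a beat for the given BPM,
    # returning the length in beats (minimum of one eighth of a beat).
    beat_s = 60.0 / bpm
    eighths_of_beat = max(1, round(duration_s / (beat_s / 8)))
    return eighths_of_beat / 8.0

def note_type(beats):
    # Map a beat count to a note type, assuming a quarter note gets one beat.
    names = {0.5: "eighth", 1.0: "quarter", 2.0: "half", 4.0: "whole"}
    return names.get(beats, f"{beats}-beat note")   # dotted/odd lengths fall through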
Despite these challenges, we are on-track with our schedule; over the next week, we plan to have an even more detailed design while simultaneously working on setting up our web app, experimenting more with signal calibration/noise suppression, and starting on audio segmentation.
Week-specific status report questions:
Part A (Shivi): Our flute transcription system enhances public health by supporting creative expression through music learning. With an attachable microphone and a software-only pipeline, it is affordable and safe to use. Our system also promotes welfare by lowering barriers to music production, as it can be made accessible online for musicians, students, and educators to use.
Part B (Grace): Write on Cue aims to make music transcription more accessible to diverse musical communities, including amateur musicians, educators, composers, and students from various cultural and social backgrounds. This benefits people who may not have the technical skills or resources to manually transcribe music and allows individuals to better engage with music across a variety of cultural contexts. For example, in communities where formal music education is less accessible, our project can provide a more equitable way for musicians to preserve and share traditional flute music, irrespective of whether they are classically trained. Additionally, socially, this allows musicians from different backgrounds to contribute their musical expressions and allows for easier preservation of musical heritage.
Part C (Deeya): Traditional methods of music transcription are time-consuming and require specialized knowledge, creating a barrier for learners who want to review their performances or for educators who need to provide detailed feedback. By streamlining the transcription process, our project reduces the dependency on costly manual transcription services, which lowers the overall cost of producing sheet music. Also, we are designing our project on a web app, which maximizes accessibility and encourages a cheaper and more widespread music education.
Grace’s Status Report for 2/15/25
This week, I worked on creating the rhythm detection algorithm. We first practiced by simply writing notes into a MIDI file using the mido library in Python, and then uploading the output into MuseScore so we could see what the sheet music generation looked like.
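For reference, here is a minimal mido sketch along the lines of what we practiced with: it writes a single note to a MIDI file that can be imported into MuseScore (the default note, length, and tempo values are illustrative).

import mido

def write_single_note(path="test_note.mid", midi_note=62, beats=1.0, bpm=60):
    # Write one note (default: D above middle C, one beat at 60 BPM) to a MIDI file.
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    track.append(mido.MetaMessage("set_tempo", tempo=mido.bpm2tempo(bpm)))
    ticks = int(beats * mid.ticks_per_beat)           # note length in MIDI ticks
    track.append(mido.Message("note_on", note=midi_note, velocity=64, time=0))
    track.append(mido.Message("note_off", note=midi_note, velocity=64, time=ticks))
    mid.save(path)                                    # import this file into MuseScore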
We are trying to get the bare-bones aspect of the project working, so we made a few different recordings: the metronome alone, someone playing a D on the flute with the metronome in the background and no other sound, and someone playing a D on the flute with some background noise (people talking). This lets us test detecting a note from the clean recording while also experimenting with noise suppression, which Shivi is working on.
(what the isolated signals for the metronome and the flute note look like)
After analyzing these frequencies with a Fourier transform, I isolated the signal by filtering out all frequencies other than the pitch of the note played, and calculated the note durations using the inputted BPM. However, audio recordings tend to have a lot of variation in sound quality, creating many peaks within the wave. This originally made my code think multiple different notes were being played, since I was computing durations from the peaks. After analyzing the signal further, I switched to using 20% of the maximum amplitude as a threshold for calculating the duration of a note. I then transcribed this into a MIDI file and uploaded it to MuseScore to look at the results. Though the rhythm is still not accurate, I am hopeful that this will be working soon, and I plan on using a sliding-window filter in future testing to reduce the number of peaks and noise.
(what is currently being transcribed to MuseScore; it should just be a single note, so I will need to reassess my threshold value)
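Here is a simplified sketch of the thresholding approach described above, assuming librosa and NumPy, including the sliding-window (moving-average) smoothing I plan to test; the window size is a placeholder.

import numpy as np
import librosa

def note_duration(path, rel_thresh=0.2, win=1024):
    # Smooth the rectified signal with a sliding-window (moving-average) filter,
    # then measure how long the envelope stays above 20% of its maximum.
    y, sr = librosa.load(path, sr=None, mono=True)
    envelope = np.convolve(np.abs(y), np.ones(win) / win, mode="same")
    above = np.flatnonzero(envelope >= rel_thresh * envelope.max())
    if above.size == 0:
        return 0.0
    return (above[-1] - above[0]) / sr    # seconds between first and last crossing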
My current progress is on schedule. This next week, I hope to get rhythm detection working accurately for at least a whole note, half note, and quarter note of the same pitch. Hopefully, I will be able to detect switches between notes soon as well.
Deeya’s Status Report for 02/08/25
- I am tasked with working on the web/mobile application part of our project as well as implementing its Gen AI aspect
- We first assessed whether a web application or a mobile app would work better for our project and its use cases. We decided on a web app because it is easier to access, upload and store files, and authenticate users, and because we have more experience working with Python, JavaScript, HTML, and CSS than with Swift for iOS apps.
- I designed a very basic UI for the website and will be starting a Django project that implements the UI and basic functionality, such as uploading/saving files to a database and user profiles that allow users to log in and out.
- For the Gen AI component, the first step is to find a large enough dataset of flute music across different genres. I spoke with Professor Dueck to ask whether there is a CMU archive of flute music or any resources she recommends looking through. She recommended classicalarchives.com, specifically solo or duet flute sonatas or anything unaccompanied. Looking through this website, there are a lot of flute compositions that could be useful for this project. However, I still need to figure out the best way to compile a large dataset and categorize/label each piece by its genre, tone, and pace. This will be a time-consuming process, so I will continue searching for labelled flute datasets.
Shivi’s Status Report for 02/08/25
This week, I worked with Grace and Deeya to finish our proposal slides, where we included a very high-level workflow of our design. Completing the proposal slides gave us a better idea of the amount of work we need to do, and the three of us met up to generate some flute audio recordings. Since I am tasked with pitch detection, as an experiment, I wrote a basic Python script that performs a fast Fourier Transform on singular notes so that we could examine the frequencies associated with a few notes from the B flat major scale:
Here, we can see the fundamental frequencies/harmonics associated with each note, a property that we will leverage to determine which note is being played in the audio. After proposal presentations, we thought about some feedback from our TA (Ankit) and realized that we need to think more about software-hardware tradeoffs in our design. Initially, we were keen on having a hardware component in our project (having taken/taking 18341 and 18349 as well as seeing similar projects from the past doing this), but it seems that it may be cleaner/more efficient to simply perform certain tasks purely in software. For instance, our initial design included performing FFT using the microcontroller, but it will definitely be more efficient to perform it on a laptop CPU. These are some of my thoughts for a revised design (at least on the signal processing side) based on some independent research:
- Signal Processing
  - Use a microphone to capture flute audio
    - Suggested mic: InvenSense ICS-43434, a MEMS microphone with digital output. It can be mounted close to the flute’s embouchure hole and does not require any PCB soldering. We also have the option to 3D print a custom clip to attach it to the flute for optimal placement.
  - Send audio to the microcontroller via I2S (Inter-IC Sound interface)
  - Microcontroller converts PDM (Pulse Density Modulation) to PCM (Pulse Code Modulation). Some suggested microcontrollers with built-in PDM support: RPi RP2040, STM32 (more suited for high-end, higher-performance tasks, so it might not be necessary)
  - In software, do pitch detection:
    - Apply additional digital filtering to the PCM signal: noise suppression, bandpass filtering, adaptive filtering
    - Apply a Fast Fourier Transform to detect flute frequencies and map them to flute notes
    - Use a smoothing filter (e.g., a moving average or Kalman filter) to smooth out pitch detection
  - In software, do note length detection:
    - Use peak tracking in the frequency domain (more computationally expensive than methods like time-domain envelope filtering and requires harmonic filtering to avoid detecting overtones, but less sensitive to volume variations and more accurate in noisy environments)
    - Detect note length: a note is considered ongoing as long as the peak frequency remains stable; if the peak disappears or shifts significantly, the note has ended (see the sketch after this list)
  - MIDI: store the note frequencies and durations in MIDI format. Generate a MIDI Note On message (0x90) when a note starts and a Note Off message (0x80) when it ends. Use the duration to determine the note type (eighth, quarter, half, whole, etc.)
  - Use the MuseScore API to upload the MIDI file and display sheet music on the web app
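To make the peak-tracking idea more concrete, here is a rough sketch assuming librosa and NumPy; the pitch tolerance and frame parameters are illustrative.

import numpy as np
import librosa

def track_note_lengths(y, sr, n_fft=2048, hop=512, cents_tol=50, min_frames=3):
    # Follow the dominant spectral peak frame by frame: a note is "ongoing" while
    # the peak stays within a small pitch tolerance, and ends when it shifts.
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    band = (freqs >= 261.6) & (freqs <= 2093.0)       # restrict to the flute range
    peaks = freqs[band][np.argmax(S[band], axis=0)]   # dominant frequency per frame
    if peaks.size == 0:
        return []
    notes, start, current = [], 0, peaks[0]
    for i in range(1, len(peaks)):
        if abs(1200 * np.log2(peaks[i] / current)) > cents_tol:   # peak shifted -> new note
            if i - start >= min_frames:
                notes.append((current, (i - start) * hop / sr))
            start, current = i, peaks[i]
    if len(peaks) - start >= min_frames:
        notes.append((current, (len(peaks) - start) * hop / sr))
    return notes   # list of (frequency_hz, duration_s)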
For the coming week, we plan to flesh out the design more and work on our low-level design with other important details such as BPM detection, the metronome, and integration with the web app. We also aim to make a list of any inventory/purchase items we will need.
Team Status Report for 02/08/25
As we had proposal presentations this week, we worked hard on finishing our slides, ensuring they were done far enough in advance that Ankit, our TA, could give us feedback on our schedule. Ankit mentioned the possibility of moving what we had planned for hardware (like a microcontroller or an Arduino) entirely into software, as it would run a lot faster. We are currently considering this option: since we would ideally like to make the system real-time, faster processing would be best. However, this could change how we approach tasks like rhythm detection. We are planning on reaching out to Ankit again to talk this over further.
Last week, we also met with Professor Dueck and other musicians to discuss what our project looks like and how the music department could contribute, such as allowing us to work in her studio to test the flutes in a relatively noiseless environment, which would be best for getting a bare-bones version of the project working. Additionally, she connected us with Professor Almarza, who will be helping us find flutists to test our project.
After this, we experimented with looking at some straight-tone flute signals and seeing how the pitch appears in MATLAB, to get more insight toward getting a bare-bones project up and working.
Currently, our most significant risk is that switching the project to software has unforeseen consequences and we have to backtrack to the hardware idea, which is a little more fleshed out thanks to past project references. These risks can be managed by discussing the switch further with our TA and course staff, like Professor Sullivan. This might change the existing design, specifically the system spec, to help with speed. Overall, we feel that we are on track and are excited to see where our project takes us, as well as to work collaboratively with CMU musicians and get their feedback throughout the process.