Grace's Status Report for 3/22/25

This week was dedicated to testing the audio segmentation code that I created last week. From this testing, I realized that both the peak difference threshold (how steep a change must be to be considered a new note) and the max time difference (how long must pass before a change can be considered a new note) need to change with the BPM and audio quality. After testing with a random YouTube audio sample of Ten Little Monkeys, we realized that this recording was probably not that usable, at least for early-stage testing, as the eighth notes were not distinct enough for the code, or quite honestly for me manually, to identify.
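As a rough sketch of how these two parameters might be tied to tempo (the function name, the half-beat spacing rule, and the inverse scaling of the threshold below are all assumptions for illustration, not values we have settled on):

```python
def tempo_scaled_params(bpm, base_bpm=60, base_peak_diff=0.08):
    """Hypothetical sketch of tempo-dependent segmentation parameters.

    base_peak_diff mirrors the 0.08 RMS threshold we are currently
    experimenting with; the scaling rules themselves are illustrative.
    """
    beat_sec = 60.0 / bpm
    max_time_diff = 0.5 * beat_sec                           # minimum spacing between onsets
    peak_diff_threshold = base_peak_diff * (base_bpm / bpm)  # faster tempo -> shallower dips
    return peak_diff_threshold, max_time_diff
```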

In this image, using the recording of Ten Little Monkeys, I have manually identified where the notes change with green lines. You can see that the code (red lines) is not correctly identifying these notes. However, the change between notes is not that significant either (less than a 0.3 RMS difference), and the signal doesn't approach zero as closely as other recordings do, like Twinkle Twinkle Little Star. To try to fix this, I experimented with different audio amplifications. First I simply scaled the signal linearly, but this scaled the "near zero" values as well, so the code still wouldn't pick up the note even though the overall difference was now larger. Then I tried scaling it exponentially, to keep the near-zero values near zero while boosting the larger values, but since the recording is fairly quiet overall, this left everything near zero without a significant difference. This led us to experiment with clearer-cut recordings first and come back to this one once the program is more advanced.
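For reference, this is roughly what the two amplification experiments looked like (a minimal NumPy sketch of my own for illustration, not the exact code; the gain and exponent values are arbitrary):

```python
import numpy as np

def scale_linear(rms, gain=4.0):
    # Linear gain: the near-zero dips are amplified too, so the
    # "close to silence" cue before a note is lost.
    return gain * rms

def scale_power(rms, exponent=2.0):
    # Power-style scaling: values near zero stay near zero, but on a
    # quiet recording almost everything is near zero, so the contrast
    # between dips and peaks disappears as well.
    return np.power(rms, exponent)
```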

My next steps are to find more distinct/staccato recordings of faster notes, see how the audio segmentation handles them, and polish off my rhythm detection code.

Team Status Report for 3/15/25

This week, we made significant progress on the web app setup, audio segmentation, and pitch detection components of our project. We also received our microphone, and Professor Sullivan lent us an audio interface that we can use to record some audio.

Below is an image of what our web app currently looks like. Here, a user can upload flute audio and a recording of their background. They can also adjust the tempo of the metronome (for the MVP, we are not performing tempo detection, so the user needs to set their own tempo/metronome).

Additionally, we now have a basic implementation of audio segmentation (using RMS) working. Below is a graph showing a flute signal of Twinkle Twinkle Little Star, where the red lines mark the start of a new note as detected by our algorithm, and the blue dotted lines represent the actual note onsets. Our algorithm's detected onsets were within 0.1 ms of the actual note onsets.

We achieved similar results with Ten Little Monkeys at regular and 2x speed, though we still need to add a way to dynamically adjust the RMS threshold based on the signal’s max amplitude, rather than using trial and error.
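One simple way to do that might look like the following (a sketch under the assumption that scaling both thresholds by the recording's maximum RMS is enough; the 5% and 8% ratios are placeholders):

```python
import numpy as np

def adaptive_thresholds(rms, near_zero_ratio=0.05, peak_diff_ratio=0.08):
    """Hypothetical sketch: derive the RMS thresholds from the signal
    itself instead of hand-tuning them for each recording."""
    max_rms = np.max(rms)
    near_zero_threshold = near_zero_ratio * max_rms   # "close to silence" level
    peak_diff_threshold = peak_diff_ratio * max_rms   # required rise to call a new note
    return near_zero_threshold, peak_diff_threshold
```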

We also started performing pitch detection. To do so, we are using comb filtering and Fourier transforms to analyze the frequencies present in the played note. We then use the fundamental frequency to determine the corresponding note. We were able to successfully determine the MIDI notes for Twinkle Twinkle and plan to continue testing this out on more audio samples. 
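Once the fundamental frequency is found, mapping it to a MIDI note is a standard conversion (a minimal sketch using the usual A4 = 440 Hz reference):

```python
import math

def freq_to_midi(freq_hz, a4_hz=440.0):
    # Equal-temperament mapping: A4 (440 Hz) is MIDI note 69, and each
    # semitone is a factor of 2**(1/12) in frequency.
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))

# Example: a note played as C5 (~523.25 Hz) maps to MIDI note 72.
```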

We are on schedule with our progress currently. For the upcoming week, we plan to integrate all of our existing code together and test/refine the audio segmentation and pitch detection to make them more robust to various tempos, rhythms, and frequencies. We are also soliciting the SOM flutists' availability so that we can start some initial testing the week of March 24th. Additionally, after speaking with Professor Chang last week during lab, we have decided to build in some time for a feature in which users can edit the generated music score (e.g., move measures around, adjust notes, and add notation such as trills and crescendos).

Shivi’s Status Report for 3/15/25

This week, I met with Grace to test the audio segmentation algorithm she wrote. We tested it on a sample of Twinkle Twinkle Little Star, as well as Ten Little Monkeys. We found that for each of the two samples, we needed to adjust the RMS threshold to account for differences in the maximum amplitude of the signal; as a result, we realized that we will need to add some way to either standardize the amplitude of our signal or dynamically change the RMS threshold based on the signal’s amplitude. 

I also worked to integrate our preprocessing and audio segmentation code all together. Our current pipeline can be found on this GitHub (Segmentation/seg.py for note segmentation, and Pitch Detection/pitch.py for pitch detection) along with some of our past experimentation code.

Furthermore, now that we have audio segmentation, I was able to get pitch detection to work, at least on Twinkle Twinkle.  To do so, I used FFT and comb filtering to find and map the fundamental frequency to the MIDI note. I plan to test the pitch detection on more audio samples next week and work with Grace and Deeya to integrate all the stages of our project that we have implemented so far (web app and triggering the preprocessing/audio segmentation/pitch detection pipeline). 
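A simplified version of this kind of frequency-domain comb approach is sketched below (this is for illustration and not necessarily the exact logic in pitch.py; the frequency range, harmonic count, and windowing are assumptions):

```python
import numpy as np

def detect_fundamental(segment, sample_rate, f_min=240.0, f_max=2100.0, n_harmonics=5):
    """Sketch of FFT + harmonic-comb pitch detection: for each candidate
    fundamental, sum the FFT magnitude at its first few harmonics and keep
    the candidate with the highest score. The range roughly covers the flute."""
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(len(segment))))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)

    best_f0, best_score = None, -np.inf
    for f0 in freqs[(freqs >= f_min) & (freqs <= f_max)]:
        harmonics = np.arange(1, n_harmonics + 1) * f0
        bins = np.searchsorted(freqs, harmonics)
        score = spectrum[bins[bins < len(spectrum)]].sum()
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0
```

The fundamental returned here would then feed into the frequency-to-MIDI mapping shown in the team report above.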

Deeya's Status Report for 3/15/25

This week I finished up the UI on our website for recording the flute and background separately, and I added the ability to hear a playback of the recorded audio. If a user decides to upload a recording, the ‘Start Recording’ button gets disabled, and if a user presses ‘Start Recording’, the ‘Upload’ button gets disabled. Once the user starts recording, they can stop the recording and then replay or redo it. The user can also adjust and hear the tempo of the metronome, which plays at a default of 60 BPM. There are still two modifications I need to figure out:

1. The metronome can't be heard in the recording because the recording only captures sound coming externally from the computer, so I am playing around with some APIs I found that can capture both internal and external audio from a computer.
2. I need to adjust the pitch of the metronome to a value outside the range a flute can be played at.

For the Gen AI part, there is a Music Transformer implementation available online that uses the MAESTRO dataset and focuses on piano music. I am thinking of using this instead of creating the process from scratch. I downloaded the code and worked through its different parts, and I was able to take a flute MIDI file and convert it into a format that the transformer can use. I want to continue learning and experimenting with this and see if I can fine-tune the model on flute MIDI files.
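As a rough illustration of that MIDI-to-model-input step (a sketch assuming the pretty_midi library and a simple note-level event list; the actual Music Transformer pipeline defines its own event vocabulary of note-on/off, time-shift, and velocity tokens):

```python
import pretty_midi

def midi_to_note_events(path):
    """Hypothetical sketch: flatten a flute MIDI file into
    (pitch, start, duration, velocity) tuples, the kind of note-level
    events a transformer tokenizer would then map to integer tokens."""
    midi = pretty_midi.PrettyMIDI(path)
    events = []
    for instrument in midi.instruments:
        for note in instrument.notes:
            events.append((note.pitch, note.start, note.end - note.start, note.velocity))
    events.sort(key=lambda e: e[1])   # order events by onset time
    return events
```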

Grace's Status Report for 3/15/25

This week, I got audio segmentation up and working. After our previous conversation with Professor Sullivan, I first converted the audio signal into RMS values. 
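For context, computing the RMS envelope over short windows looks roughly like this (a minimal sketch; the 10 ms window size comes from our discussion with Professor Sullivan, while the hop size is an assumption):

```python
import numpy as np

def rms_envelope(signal, sample_rate, window_ms=10, hop_ms=5):
    """Sketch of a sliding-window RMS envelope (window/hop sizes illustrative)."""
    window = int(sample_rate * window_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    rms = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        rms.append(np.sqrt(np.mean(frame ** 2)))
    return np.array(rms)
```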

My first approach was to check for a sharp increase in the RMS. However, this caused me to incorrectly identify some spikes multiple times, and increasing the amount of time required since the last identified point often caused me to miss the beginnings of some notes.

(Image: the dots mark where the code identified the start of a note; as you can see, there were far too many.)

I then realized that the RMS often gets near zero right before a note. So my next approach was to identify when the RMS is near zero, but during a moment of silence (like a rest) this incorrectly split the silence into many different segments, which wasted a lot of time. I then tried a combination of the two: look for when the RMS is near zero, then look for the nearest peak. If that nearest peak's RMS minus the starting (near-zero) RMS was greater than a specific threshold (currently 0.08, though this is still being experimented with), I would mark it as the start of a note. While this was the most accurate approach so far, I still ran into a bug where even in moments of silence the code would find the nearest peak, a couple of seconds away, and again identify the silence as multiple note beginnings. I fixed this by checking how far away the peak was and adding a maximum distance threshold.
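Put together, the logic looks roughly like this (a sketch of the approach described above, operating on the RMS envelope; all threshold values here are illustrative and still being tuned):

```python
def detect_onsets(rms, hop_sec, near_zero=0.02, peak_diff=0.08,
                  max_peak_dist_sec=0.3, min_gap_sec=0.1):
    """Sketch of the near-zero + nearest-peak onset rule."""
    onsets = []
    last_onset_time = float("-inf")
    for i in range(1, len(rms) - 1):
        # only consider frames near silence, spaced apart from the previous onset
        if rms[i] > near_zero or i * hop_sec - last_onset_time < min_gap_sec:
            continue
        # look for the nearest local peak after this near-zero frame
        for j in range(i + 1, len(rms) - 1):
            if rms[j] >= rms[j - 1] and rms[j] >= rms[j + 1]:
                close_enough = (j - i) * hop_sec <= max_peak_dist_sec
                if close_enough and rms[j] - rms[i] > peak_diff:
                    onsets.append(i)              # mark the near-zero frame as a note start
                    last_onset_time = i * hop_sec
                break
    return onsets
```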

(Image: the dotted blue lines are where the code identified the nearest peak, and the red lines are where the code marks the near-zero RMS values.)

Currently this works for a sample of Twinkle Twinkle Little Star. When testing it with a recording of Ten Little Monkeys, it works if we lower the RMS threshold, which indicates that we will need to standardize our signal somehow in the future. We also noticed that with quicker notes, the RMS values don't get as close to zero as they do for quarter or half notes, so we might need to raise the threshold for what counts as near zero.

(Image: the red lines are where the code identified the beginnings of notes, and the blue dotted lines are where I manually identified them.)

Deeya’s Status Report for 3/8/25

I mainly focused on my parts of the design review document and editing it with Grace and Shivi. Shivi and I also had the opportunity to speak to flutists in Professor Almarza's class about our project, and we were able to recruit a few of them to help us with recording samples and providing feedback throughout the project. It was a cool experience to hear their thoughts and understand how this could be helpful during their practice sessions. For my parts of the project, I continued working on the website and learned how to record audio and store it in our database to be used later. I will now start putting more of my effort into the Gen AI part. I am thinking of using a Transformer-based generative model trained on MIDI sequences, so I will need to learn how to take MIDI files and convert them into a series of token encodings of musical notes, timing, and dynamics that the Transformer model can process. I will also start compiling a dataset of flute MIDI files.


Team Status Report for 3/8/25

This past week, we mostly focused on finishing up our design review by ensuring that the language was specific and concise. We also focused on adding clear graphics, like pseudocode and diagrams, to help convey the information. In addition, we met with Professor Almarza, confirmed the use case requirements, and got his opinion on the current workflow. From this, we also got three flutists to sign up for testing our project and will now be working on getting them signed up for the conjoined mini course as well. In terms of the project, we have a clearer understanding of how to implement audio segmentation after some research and a discussion with Professor Sullivan, and we look to really finish this portion up next week.

Overall, we are currently on track, though we may run into some issues with the audio segmentation, as this will be the most difficult aspect of our project.

Part A (Grace): Our product solution will meet global factors for those who are not as technologically savvy by making the project as easy to understand as possible. This project already significantly decreases the strain of composing your own music by eliminating the need for individuals to know the exact lengths of notes, pitches, etc. when they are composing, and by decreasing the amount of time it takes to transcribe. In addition, the individual will only have to upload an mp4 recording of themselves playing before getting transcribed music, as we will be handling all of the backend aspects. As such, even the technologically unsavvy should be able to use this application. Furthermore, we aim to make the UI user-friendly and easy to read.

In addition, we aim to make this usable in other environments, not just academic ones, by filtering out outside noise so that users can use the application even in noisy settings. As mentioned in our design reviews, we will be testing this application in multiple different settings to hopefully encompass the different environments in which this website would be used globally.

Part B (Shivi): Our project can make a cultural impact by allowing people to pass down music that lacks standardized notation. For instance, traditional/folk tunes (such as those played on the Indian bansuri or Native American flute) are often played by ear and likely to be lost over time, but our project can help transcribe such performances, allowing them to be preserved over multiple generations. This would also help increase access to music for people from different cultures, promoting cross-cultural collaboration.

Part C (Deeya): Write on Cue addresses environmental factors by encouraging a more sustainable system of digital transcription over printed notation, reducing paper usage. Digital transcription also allows musicians to learn, compose, and practice remotely, reducing the need for physical travel to lessons, rehearsals, or recording sessions. By reducing transportation energy and paper consumption, it helps make our product more environmentally friendly. Additionally, instead of relying on large, energy-intensive AI models, we are going to use smaller, more efficient models trained specifically for flute music, which will help reduce computation time and power consumption. We will look into techniques like quantization to help speed up inference.

Grace’s Status Report for 3/8/25

Last week, I primarily worked on the design review document, refining the finer details. The conversation we had during the design presentation was also useful, as we decided that the noise filtering features might be excessive, especially with the microphone being so close to where the signal will be coming from. Since our microphone has just come in, we are excited to experiment with this process and test it in environments that flutists commonly compose in, like their personal rooms (where there might be slight background noise) and studios (virtually no noise). Hopefully, with the calibration step, we can eliminate excessive noise filtering and decrease the amount of time it takes to get the sheet music to the user. Furthermore, after meeting with Professor Sullivan last week, we have a better idea of how to implement audio segmentation: we decided to focus on the RMS rather than just the peaks in amplitude for note beginnings, so we plan on implementing a sliding RMS window of around 10 ms and looking for peaks there. After creating these segmentations, I plan to implement my rhythm detection module, since we cannot just use the length of a segment as the note length, as there might be a rest within the segment. Overall, we are currently on track, but this week we expect to run into more issues, as audio segmentation will most likely be the hardest aspect of our project.
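As a very rough sketch of what that rhythm module might do (entirely illustrative and not yet implemented: it assumes the RMS envelope and onset indices from the segmentation step plus a user-set BPM, and quantizes durations to eighth-note multiples):

```python
def estimate_durations(rms, onsets, hop_sec, bpm, near_zero=0.02):
    """Hypothetical sketch: measure how long each segment actually sounds
    (so a trailing rest isn't counted) and quantize to eighth-note multiples."""
    beat_sec = 60.0 / bpm
    durations_in_beats = []
    for k, start in enumerate(onsets):
        end = onsets[k + 1] if k + 1 < len(onsets) else len(rms)
        i = start
        while i < end and rms[i] <= near_zero:    # skip the near-zero frame(s) at the onset
            i += 1
        note_start = i
        while i < end and rms[i] > near_zero:     # advance until the note decays to silence
            i += 1
        sounding_sec = (i - note_start) * hop_sec
        eighths = max(1, round(sounding_sec / (beat_sec / 2)))
        durations_in_beats.append(eighths * 0.5)  # 0.5 beats = eighth note
    return durations_in_beats
```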

Finally, we are excited to see how our collaboration with the music department develops, as many seem interested. We will be reaching out to the flutists this week to get them registered for the mini as well.

Shivi’s Status Report for 3/8/25

Last week, I mainly focused on working on the design review document with Deeya and Grace. Incorporating the feedback we received during the design presentation, I worked mostly on the preprocessing/calibration, pitch detection, and design trade studies aspects of the design document. Additionally, Professor Dueck connected us with Professor Almarza from the School of Music, and Deeya and I met with him and the flutists from his studio. This helped us confirm our use case requirements, get their opinion on our current user workflow, and solicit their availability for testing out our pipeline in a few weeks. The flutists were excited about the project as a composition tool such as the one we are developing would greatly aid them in writing new compositions. Grace and I also discussed how to implement the audio segmentation; as of now, we are planning to apply RMS over 10 ms windows of the signal and use spikes in amplitude to determine where the new note begins. Based on our research, similar approaches have been used in open-source implementations for segmenting vocal audio by note, so we are optimistic about this approach for flute audio as well. We are currently on schedule with our progress, but I anticipate issues with audio segmentation this week, so we plan to hit the ground running for this aspect of our project on Monday so that we can have the segmentation working, at least for recordings of a few quarter notes, by the end of the week.

Team Status Report for 2/22/25

This past week we presented our Design Review slides in class and individually spent time working on our respective parts of the project. Deeya is almost done with the basic functionality of the website and has so far completed the UI, user profiles and authentication, and navigation between pages. Grace and Shivi are planning to meet up to combine their preprocessing code and figure out the best way to handle audio segmentation. They want to make sure their approach is efficient and works well with the overall pipeline that they are each building on their own.

This week we will focus on completing our design report, with each of us working on assigned sections independently before integrating everything into a cohesive report. We are planning to finish the report a few days before the deadline so that we can send it to Ankit and get his feedback. This Wednesday we will also be meeting with Professor Almarza from the School of Music and his flutists to explain our project and to come up with a plan on how we would like to integrate their expertise and knowledge into our project.  

Overall, we each feel that we are on track with our respective parts of the project, and we are excited to meet with the flutists this week. We haven't changed much about our overall design plan, and there aren't any new risks we are considering besides the ones we laid out in our summary report last week.