Grace's Status Report for 3/15/25

This week, I got audio segmentation up and working. Following our earlier conversation with Professor Sullivan, my first step was to convert the audio signal into RMS values.
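As a rough illustration of this step, the sketch below computes an RMS envelope over short windows; the filename and the 10 ms window size are placeholder choices for this example, not final values.

```python
import numpy as np
import librosa  # assumed here for loading audio; any loader that yields a mono float array works

# Load a mono recording (placeholder filename).
signal, sr = librosa.load("flute_sample.wav", sr=None, mono=True)

# Convert the raw signal into an RMS envelope over short (~10 ms) windows.
win = int(0.010 * sr)
n_frames = len(signal) // win
rms = np.array([
    np.sqrt(np.mean(signal[k * win:(k + 1) * win] ** 2))
    for k in range(n_frames)
])
# rms[k] is the energy of the k-th window; all of the onset logic operates on this envelope.
```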

My first approach was to detect sharp increases in the RMS. However, this caused some spikes to be identified as note beginnings multiple times, and increasing the required amount of time since the last identified point often caused me to miss the beginnings of some notes.

(Image: the dots mark where the code identified the start of a note; as shown, far too many points were flagged.)
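Continuing from the RMS envelope above, the first approach looked roughly like this sketch; the jump threshold and 100 ms lock-out period are illustrative values, not the exact ones from my code.

```python
import numpy as np

def onsets_from_rms_jumps(rms, sr, win, jump_thresh=0.05, min_gap_s=0.1):
    """Flag a note start wherever the RMS jumps sharply from one window to the next,
    skipping frames that fall within min_gap_s of the last detected onset."""
    onsets, last_onset = [], -np.inf
    for k in range(1, len(rms)):
        t = k * win / sr  # time of this window in seconds
        if rms[k] - rms[k - 1] > jump_thresh and t - last_onset > min_gap_s:
            onsets.append(t)
            last_onset = t
    return onsets
# A small min_gap_s double-counts spikes; a large one misses closely spaced note beginnings.
```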

I then realized that the RMS often dips near zero right before a note begins. So my next approach was to identify points where the RMS is near zero. However, during a stretch of silence (like a rest), this incorrectly tried to split the silence into many different segments, which wasted a lot of time. I then tried a combination of the two approaches: I look for points where the RMS is near zero and then find the nearest peak after that point. If the difference between that peak's RMS and the starting (near-zero) RMS exceeds a threshold (currently 0.08, though this is still being tuned), I mark the point as the start of a note. While this was the most accurate approach so far, I still ran into a bug where, even during silence, the code would find the nearest peak a couple of seconds away and again mark the silence as several note beginnings. I fixed this by checking how far away the peak was and imposing a maximum distance threshold.

(Image: the dotted blue lines mark where the code identified the nearest peak, and the red lines mark the near-zero RMS points.)
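A sketch of this combined approach is below. The 0.08 rise threshold matches the value currently being tuned; the near-zero level and maximum peak distance are illustrative placeholders.

```python
import numpy as np
from scipy.signal import find_peaks

def onsets_near_zero_then_peak(rms, sr, win,
                               zero_thresh=0.01,   # what counts as "near zero" (illustrative)
                               rise_thresh=0.08,   # required peak-minus-start rise (still being tuned)
                               max_gap_s=0.25):    # peak must occur within this window (illustrative)
    """Mark an onset where the RMS dips near zero and the nearest following peak
    rises by at least rise_thresh within max_gap_s of the dip."""
    peaks, _ = find_peaks(rms)
    onsets, last_peak_used = [], -1
    for k, val in enumerate(rms):
        if val >= zero_thresh:
            continue
        later = peaks[peaks > k]
        if len(later) == 0:
            continue
        p = later[0]                        # nearest peak after the near-zero frame
        if (p - k) * win / sr > max_gap_s:  # peak too far away: we are sitting in a rest
            continue
        if rms[p] - val > rise_thresh and p != last_peak_used:
            onsets.append(k * win / sr)
            last_peak_used = p              # avoid flagging the same note once per silent frame
    return onsets
```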

Currently, this works for a sample of Twinkle, Twinkle, Little Star. When testing it with a recording of Ten Little Monkeys, it works only if we lower the RMS threshold, which indicates that we will need to standardize our signal somehow in the future. We also noticed that with quicker notes, the RMS does not get as close to zero as it does for quarter or half notes, so we may need to raise the threshold for what counts as near zero.

(Image: the red lines mark where the code identified note beginnings, and the dotted blue lines mark where I manually identified them.)
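One simple option for that standardization (not a decided design, just a sketch) is to peak-normalize each recording before computing RMS, so a single threshold behaves similarly across samples:

```python
import numpy as np

def peak_normalize(signal, target_peak=0.9):
    """Scale the recording so its loudest sample has amplitude target_peak,
    making RMS thresholds more comparable across different recordings."""
    peak = np.max(np.abs(signal))
    return signal if peak == 0 else signal * (target_peak / peak)
```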

Deeya’s Status Report for 3/8/25

I mainly focused on my parts of the design review document and on editing it with Grace and Shivi. Shivi and I also had the opportunity to speak with flutists in Professor Almarza's class about our project, and we recruited a few of them to help us record samples and provide feedback throughout the project. It was a great experience to hear their thoughts and to understand how this could help them during their practice sessions. For my parts of the project, I continued working on the website and learned how to record audio and store it in our database for later use. I will now start putting more of my effort into the Gen AI part. I am considering a Transformer-based generative model trained on MIDI sequences, so I will need to learn how to convert MIDI files into a sequence of token encodings of musical notes, timing, and dynamics that the Transformer model can process. I will also start compiling a dataset of flute MIDI files.
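As a starting point, the MIDI-to-token conversion could look something like the sketch below, using the pretty_midi library; the token scheme (pitch, quantized duration, coarse dynamics) is a placeholder I am still thinking through, not a final design.

```python
import pretty_midi

def midi_to_tokens(path, time_step=0.05):
    """Flatten a (monophonic) flute MIDI file into string tokens encoding
    pitch, quantized duration, and a coarse dynamic level."""
    midi = pretty_midi.PrettyMIDI(path)
    notes = sorted(
        (n for inst in midi.instruments if not inst.is_drum for n in inst.notes),
        key=lambda n: n.start,
    )
    tokens, prev_end = [], 0.0
    for n in notes:
        rest_steps = round((n.start - prev_end) / time_step)
        if rest_steps > 0:
            tokens.append(f"REST_{rest_steps}")                    # quantized rest length
        dur_steps = max(1, round((n.end - n.start) / time_step))   # quantized note length
        dyn = "LOUD" if n.velocity > 80 else "SOFT"                # coarse dynamics from velocity
        tokens += [f"PITCH_{n.pitch}", f"DUR_{dur_steps}", f"DYN_{dyn}"]
        prev_end = n.end
    return tokens
```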


Team Status Report for 3/8/25

This past week, we mostly focused on finishing our design review document by making sure the language was specific and concise. We also focused on adding clear graphics, such as diagrams and pseudocode, to help convey the information. In addition, we met with Professor Almarza, confirmed the use case requirements, and got his opinion on the current workflow. From this meeting, we also got three flutists to sign up to test our project, and we will now work on getting them to also register for the associated mini-course. In terms of the project itself, we have a clearer understanding of how to implement audio segmentation after some research and after discussing the concept with Professor Sullivan, and we aim to finish this portion next week.

Overall, we are currently on track, though we may run into some issues with audio segmentation, as it will be the most difficult aspect of our project.

Part A (Grace): Our product will address global factors for users who are not technologically savvy by keeping the workflow as easy to understand as possible. The project already significantly reduces the strain of composing your own music: it eliminates the need to know the exact lengths and pitches of notes while composing and shortens the time it takes to transcribe. In addition, the user only has to upload an mp4 recording of themselves playing to receive transcribed music, since we handle all of the backend processing. As such, even users with little technical background should be able to use this application. Furthermore, we aim to make the UI user-friendly and easy to read.

In addition, we aim to make the application usable beyond academic environments by filtering out background noise so that users can record even in noisy settings. As mentioned in our design review, we will be testing the application in several different settings to cover the range of environments in which the website might be used globally.

Part B (Shivi): Our project can make a cultural impact by helping people pass down music that lacks standardized notation. For instance, traditional and folk tunes (such as those played on the Indian bansuri or the Native American flute) are often learned by ear and are likely to be lost over time, but our project can transcribe such performances, allowing them to be preserved across generations. This would also increase access to music from different cultures, promoting cross-cultural collaboration.

Part C (Deeya): Write on Cue addresses environmental factors by encouraging a more sustainable system of digital transcription over printed notation, reducing paper usage. Digital transcription also allows musicians to learn, compose, and practice remotely, reducing the need for physical travel to lessons, rehearsals, or recording sessions. By cutting transportation energy and paper consumption, this helps make our product more environmentally friendly. Additionally, instead of relying on large, energy-intensive AI models, we plan to use smaller, more efficient models trained specifically for flute music, which will help reduce computation time and power consumption. We will also look into techniques like quantization to speed up inference.
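For reference, post-training dynamic quantization in PyTorch is one lightweight technique we could try; the sketch below uses a tiny stand-in network since our actual Transformer does not exist yet.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in practice this would be our trained MIDI Transformer (placeholder).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

# Convert the Linear layers (which dominate Transformer compute) to 8-bit integer weights.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
# The quantized model typically uses less memory and runs faster on CPU at inference time.
```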

Grace’s Status Report for 3/8/25

Last week, I primarily worked on the design review document, refining the finer details. The conversation we had during the design presentation was also useful: we decided that some of the noise filtering features might be excessive, especially since the microphone will be so close to the source of the signal. Now that our microphone has arrived, we are excited to experiment with this and to test in the environments flutists commonly compose in, like their personal rooms (where there might be slight background noise) and studios (virtually no noise). Hopefully, with the calibration step, we can eliminate excessive noise filtering and decrease the time it takes to get the sheet music to the user. Furthermore, after meeting with Professor Sullivan last week, we have a better idea of how to implement audio segmentation: we decided to focus on the RMS rather than just the peaks in amplitude for note beginnings, so we plan on computing a sliding RMS window of around 10 ms and looking for peaks there. After creating these segmentations, I plan to implement my rhythm detection module, since we cannot simply use the length of each segment as the note length, as a segment might contain a rest. Overall, we are currently on track, but we expect to run into more issues this week, as audio segmentation will most likely be the hardest aspect of our project.
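A quick sketch of that windowed-RMS plan, assuming librosa for the envelope and SciPy for peak picking (the filename, hop size, and prominence value are placeholders):

```python
import librosa
from scipy.signal import find_peaks

signal, sr = librosa.load("flute_take.wav", sr=None, mono=True)   # placeholder filename

frame = int(0.010 * sr)       # ~10 ms sliding RMS window
hop = max(1, frame // 2)      # 50% overlap (illustrative choice)
rms = librosa.feature.rms(y=signal, frame_length=frame, hop_length=hop)[0]

# Candidate note beginnings are peaks in the RMS envelope that stand out from their surroundings.
peaks, _ = find_peaks(rms, prominence=0.05)   # prominence value is illustrative
onset_times = peaks * hop / sr
```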

Finally, we are excited about how our collaboration with the music department will take shape, as many of the flutists seem interested. We will be reaching out to them this week to get them registered for the mini-course as well.

Shivi’s Status Report for 3/8/25

Last week, I mainly focused on working on the design review document with Deeya and Grace. Incorporating the feedback we received during the design presentation, I worked mostly on the preprocessing/calibration, pitch detection, and design trade studies aspects of the design document. Additionally, Professor Dueck connected us with Professor Almarza from the School of Music, and Deeya and I met with him and the flutists from his studio. This helped us confirm our use case requirements, get their opinion on our current user workflow, and solicit their availability for testing out our pipeline in a few weeks. The flutists were excited about the project as a composition tool such as the one we are developing would greatly aid them in writing new compositions. Grace and I also discussed how to implement the audio segmentation; as of now, we are planning to apply RMS over 10 ms windows of the signal and use spikes in amplitude to determine where the new note begins. Based on our research, similar approaches have been used in open-source implementations for segmenting vocal audio by note, so we are optimistic about this approach for flute audio as well. We are currently on schedule with our progress, but I anticipate issues with audio segmentation this week, so we plan to hit the ground running for this aspect of our project on Monday so that we can have the segmentation working, at least for recordings of a few quarter notes, by the end of the week.
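For comparison while we build our own segmentation, one of those open-source routes is librosa's built-in onset detector, which works on a similar energy-envelope principle (sketch only; the filename is a placeholder):

```python
import librosa

signal, sr = librosa.load("quarter_notes.wav", sr=None, mono=True)   # placeholder filename
onset_times = librosa.onset.onset_detect(y=signal, sr=sr, units="time")
# Comparing these estimates against our RMS-spike segmentation should show
# where the two methods agree and where ours needs tuning.
```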

Team Status Report for 2/22/25

This past week, we presented our Design Review slides in class and individually spent time working on our respective parts of the project. Deeya is close to completing the basic functionality of the website and has so far finished the UI, user profiles and authentication, and navigation between pages. Grace and Shivi are planning to meet to combine their preprocessing code and figure out the best way to handle audio segmentation. They want to make sure their approach is efficient and works well with the overall pipeline that each of them is building.

This week we will focus on completing our design report, with each of us working on assigned sections independently before integrating everything into a cohesive report. We are planning to finish the report a few days before the deadline so that we can send it to Ankit and get his feedback. This Wednesday we will also meet with Professor Almarza from the School of Music and his flutists to explain our project and to come up with a plan for how to integrate their expertise and knowledge.

Overall, we each feel that we are on track with our respective parts of the project, and we are excited to meet with the flutists this week. We haven't changed much in our overall design plan, and there aren't any new risks beyond the ones we laid out in last week's summary report.

Deeya's Status Report for 2/22/25

This week I finished setting up the user authentication process for our website so that each user will have a profile associated with their account. This will help keep track of which transcriptions belong to which user and which ones to display on their respective Past Transcriptions page. I also started looking into how to record live audio through the website and store it in our database so that it can be used by the pitch and rhythm algorithms being designed by Grace and Shivi. Overall, I am on track with the website and should be done with its overall functionality this week. One thing I still want to figure out is how to take whatever is most recently stored in our database (either an uploaded or a live-recorded audio file) and automatically run it through the pitch and rhythm algorithms, so that the eventual integration goes smoothly; a rough sketch of this hand-off is below. For the Gen AI portion of the project, it looks like I might need to create a labelled dataset myself, which I will have time to focus on once I finish the website this week. This week I will also be working on my portions of the design review report.
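One way that hand-off could look in Django is sketched below; the Recording model, the run_transcription function, and the module names are hypothetical stand-ins, not our actual code.

```python
# models.py (sketch; names are hypothetical)
from django.conf import settings
from django.db import models

class Recording(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    audio = models.FileField(upload_to="recordings/")      # uploaded or live-recorded audio
    created_at = models.DateTimeField(auto_now_add=True)

# views.py (sketch)
from django.http import JsonResponse
from transcription.pipeline import run_transcription      # hypothetical wrapper around Grace and Shivi's code

def transcribe_latest(request):
    """Fetch this user's most recent recording and push it through the pitch/rhythm pipeline."""
    latest = Recording.objects.filter(user=request.user).latest("created_at")
    result = run_transcription(latest.audio.path)
    return JsonResponse({"transcription": result})
```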

Shivi's Status Report for 2/22/25

This week, I spent most of my time working on the design review presentation and design review document. I also thought more about our current noise suppression method, for which we are using a Butterworth filter, spectral subtraction, and adaptive noise filtering. However, based on Professor Sullivan's advice and my own experimentation with hyperparameters and various notes, the latter two methods do not significantly improve the resulting signal. To avoid redundancy and inefficiency, I removed the spectral subtraction and adaptive noise filtering for now. Additionally, I looked more into how we can perform audio segmentation to make it easier to detect pitch and rhythm, and found that we may be able to detect note onsets by examining spikes in the signal's short-time RMS, though this might not work across different volumes without some form of normalization. I will be working with Grace this week to combine our noise suppression and amplitude thresholding code and, more importantly, to work on implementing the note segmentation. Some of the risks with audio segmentation are: noise (so we may need to go back and adjust the noise suppression/filtering based on our segmentation results), detecting unintentional extra notes in the transition from one note to another (which can be mitigated by requiring that consecutive onsets be, say, 100 ms apart), and variations in volume (which will be mitigated by Grace's script for dynamic thresholding and volume normalization). This week, we are also visiting Professor Almarza from the School of Music to recruit flutists to test our transcription pipeline within the next few weeks.
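For reference, the Butterworth bandpass stage that we are keeping can be expressed as a short SciPy sketch; the cutoff frequencies below roughly bracket the flute's range plus a few harmonics and are illustrative, not our tuned values.

```python
from scipy.signal import butter, sosfiltfilt

def bandpass_flute(signal, sr, low_hz=200.0, high_hz=4000.0, order=4):
    """Zero-phase Butterworth bandpass; the cutoffs loosely cover the flute's
    fundamentals and lower harmonics (illustrative values)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)
```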

We are currently on schedule, but we might need to build in extra time for the note segmentation, as detecting note onsets and offsets is one of the most challenging parts of the project.

Grace's Status Report for 2/22/25

This week, I mostly focused on our design review presentation and our design proposal. Since I was the one presenting this week, I made sure our slides were ready and rehearsed, ensured that I could present the information on my teammates' slides, and trimmed down the text to make the presentation less word-heavy. In class, we listened to the different presentations and gave feedback. This was helpful because Professor Sullivan mentioned that some of our noise suppression filtering might be excessive, which would slow down our processing and lengthen our latency, so we will continue experimenting with just the bandpass filter and see whether the other filtering methods are necessary. I also further experimented with different thresholds and filters to better isolate the rhythm of single notes.

This week, I will be working on my sections of the design review paper as well as doing further research on audio segmentation, since how the segmentation turns out will determine how I implement rhythm detection. I will be meeting with Shivi to work on this. Our project is currently on schedule, but other tasks might have to be pushed back with the introduction of audio segmentation. I hope to get audio segmentation and rhythm detection working by the week after spring break at the latest.

Deeya’s Status Report for 2/15/25

This week, I made progress on our project's website by setting up a Django application that closely follows the UI design from our mockup, using HTML and CSS. I am finishing up the user authentication process using OAuth, which will allow users to easily register and log in with their email addresses. User profile information is stored in an SQL database. I am currently on track with the website development timeline and will focus next on uploading files and storing them in our database. I will also begin working on the "Past Transcriptions" page, which will show the user's transcription history along with the date each transcription was created.
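For context, the configuration involved looks roughly like the sketch below. The report only specifies OAuth; django-allauth and the Google provider are assumptions for illustration, and our actual settings may differ.

```python
# settings.py (sketch; django-allauth and the Google provider are assumptions)
INSTALLED_APPS += [
    "django.contrib.sites",
    "allauth",
    "allauth.account",
    "allauth.socialaccount",
    "allauth.socialaccount.providers.google",
]
AUTHENTICATION_BACKENDS = [
    "django.contrib.auth.backends.ModelBackend",
    "allauth.account.auth_backends.AuthenticationBackend",
]
SITE_ID = 1
LOGIN_REDIRECT_URL = "/"   # send users back to the home page after signing in

# urls.py (sketch)
from django.urls import include, path
urlpatterns = [path("accounts/", include("allauth.urls"))]   # login, signup, and OAuth callbacks
```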

Regarding the generative AI component of the project, I am still searching for a large enough labeled dataset for training our model. I found the MAESTRO dataset of piano MIDI files, which would be ideal if a similar dataset existed for the flute. If I am unable to find a large labeled dataset within the next few days, I am planning on creating a small dataset myself as a starting point. This will allow me to start experimenting with model training and fine-tuning while continuing to look for a better dataset.