Team Status Report for 4/26/25

This week, we worked as a team on storing past transcriptions with SQLite on our website and on letting users add, edit, or change anything in the originally generated transcription. We felt this better mirrors the ebbs and flows of composing, which is what we want the tool to support. Beyond that, we are now focused on fine-tuning and testing the program further, as well as on the final deliverables, like the presentation and the poster.

Unit Tests:
Rhythm Detection/Audio Segmentation: We tested this on recordings of Twinkle Twinkle Little Star at different BPMs, on songs with tied notes such as Ten Little Monkeys and compositions from the School of Music, and on songs containing rests such as Hot Cross Buns and additional School of Music pieces.

Overall System Tests: We tested the full pipeline on songs of varying difficulty: easy pieces such as nursery rhymes (Hot Cross Buns, Ten Little Monkeys, Twinkle Twinkle Little Star, etc.) and scales; intermediate pieces from YouTube recordings as well as our own team members' playing (Telemann – 6 Sonatas for Two Flutes, Op. 2, No. 2 in E minor, TWV 40:102 – I. Largo; Mozart – Sonata No. 8 in F major, K. 13 – Minuetto I and II; etc.); and more difficult pieces such as compositions from the School of Music.

Findings and Design Changes: Across these songs, we found that our program struggled most with higher octaves (the filter would accidentally cut off those frequencies), slurred notes (there is no clear onset of a new note), and rests (especially distinguishing a breath from an actual rest). This led us to tweak how we define the start of a new note when segmenting and to adjust the boundaries of our filter, roughly as sketched below.
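As a rough illustration of the filter-boundary tweak described above, here is a minimal sketch, assuming a SciPy Butterworth band-pass and an approximate flute range; the exact cutoffs and filter order are placeholders, not our final values:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_flute(signal, sr, low_hz=240.0, high_hz=4200.0, order=4):
    """Band-pass the recording so low rumble and very high noise are removed.

    The cutoffs are illustrative: the flute's fundamentals span roughly
    B3 (~247 Hz) to C7 (~2093 Hz), and the upper bound is left well above
    that so upper-octave notes and their first harmonics are not cut off.
    """
    nyq = sr / 2.0
    sos = butter(order, [low_hz / nyq, high_hz / nyq], btype="bandpass", output="sos")
    return sosfiltfilt(sos, signal)
```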

Data Obtained:

Test | Latency | Rhythm Accuracy | Pitch Accuracy
Scale 1: F Major Scale | 10.08 secs | 100% | 100%
Scale 2: F Major Scale w/ Ties | 12.57 secs | 97% | 95%
Simple 1: Full Twinkle Twinkle Little Star | 13.09 secs | 100% | 100%
Simple 2: Ten Little Monkeys | 14.46 secs | 93% | 100%
Simple 3: Hot Cross Buns | 11.78 secs | 91% | 100%

Test | Latency | Rhythm Accuracy | Pitch Accuracy
Intermediate 1: Telemann – 6 Sonatas for Two Flutes, Op. 2, No. 2 in E minor, TWV 40:102 – I. Largo | 15.16 secs | 90% | 100%
Intermediate 2: Mozart – Sonata No. 8 in F major, K. 13 – Minuetto I and II | 16.39 secs | 87% | 97%
Hard 1: Phoebe SOM Composition | 10.99 secs | 91.5% | 100%
Hard 2: Olivia SOM Composition | 12.11 secs | 93% | 100%

Grace’s Status Report for 4/26/25

This week I worked on fine-tuning the rest identification within segments and began preparing what we need for the final demo and presentation. Specifically, I worked on my portion of the final presentation, did additional testing of the segmentation on more complex compositions, like pieces from School of Music students, and began filling out the final poster.

I also experimented with other ways of identifying rests, such as librosa's built-in segmentation and onset detection (a rough example is sketched below), and looked into the onset detection algorithms other teams in my section used, but ultimately realized I would not have enough time to fully flesh these out, so I continued fine-tuning my original algorithm to get better results.
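For reference, the librosa-based alternative I experimented with was along these lines; this is only a minimal sketch, and the file name and settings here are illustrative defaults rather than tuned values:

```python
import librosa

# Load the recording (librosa resamples to 22.05 kHz by default).
y, sr = librosa.load("recording.wav")

# librosa's built-in onset detector: returns estimated note-start times in seconds.
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time", backtrack=True)
print(onset_times)
```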

While building the audio segmentation for tied notes, we decided that false positives are preferable to false negatives: users can simply delete an extra detected note, whereas adding a missing note is more work. After further fine-tuning, the algorithm performs noticeably better, especially on compositions like Hot Cross Buns.

Grace’s Status Report for 4/19/25

This week I continued fine-tuning the algorithm for slurred notes and rests. We switched last week from RMS to short-time energy (STE) and saw better results, since STE keeps the near-zero values closer to zero; with slurred notes, however, the values still do not drop close enough to zero.
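To make the STE-vs-RMS comparison concrete, this is roughly what the two framewise measures look like (a sketch with assumed frame and hop sizes): RMS averages the energy and takes a square root, which pulls small values up, while STE just sums the squared samples, keeping quiet frames much closer to zero.

```python
import numpy as np

def framewise_rms_and_ste(y, frame_length=2048, hop_length=512):
    """Compute framewise RMS and short-time energy (STE) for comparison."""
    rms, ste = [], []
    for start in range(0, len(y) - frame_length + 1, hop_length):
        frame = y[start:start + frame_length]
        energy = np.sum(frame ** 2)                  # short-time energy
        ste.append(energy)
        rms.append(np.sqrt(energy / frame_length))   # root-mean-square
    return np.array(rms), np.array(ste)
```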

I created an algorithm that looks through the segments from the original algorithm and, based on a segment's timing, "flags" it as a possible slurred note. I then check the flagged segment for pitch changes. I first used a spectrogram, but found it would miss some notes or misplace where the notes actually change, so I switched to using the STFT with a ratio that varies with BPM, which detects the smaller note changes at faster tempos with better success. Here is a picture using the spectrogram

and using the STFT

While there are still some inaccuracies, this is much better than before. I am currently fine-tuning the rest detection, since right now it tends to over-detect rests, and I am also looking into CNNs for classifying slurred notes.
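A minimal sketch of the flag-then-check idea described above; the frame sizes, the BPM scaling, and the pitch-change ratio here are stand-ins rather than the tuned values:

```python
import numpy as np
import librosa

def split_slurred_segment(y, sr, start_t, end_t, bpm, base_ratio=0.06):
    """Within one flagged segment, look for pitch jumps that suggest a slur.

    Returns times (in seconds) where the dominant STFT frequency changes by
    more than a BPM-scaled ratio; these become candidate new-note onsets.
    """
    hop = 512
    seg = y[int(start_t * sr):int(end_t * sr)]
    S = np.abs(librosa.stft(seg, n_fft=2048, hop_length=hop))
    freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
    dominant = freqs[np.argmax(S, axis=0)]           # loudest frequency in each frame

    # Illustrative BPM scaling: tighten the ratio at faster tempos so quick
    # note changes inside a slur are not missed.
    ratio = base_ratio * min(1.0, 120.0 / bpm)

    changes = []
    for i in range(1, len(dominant)):
        prev, cur = dominant[i - 1], dominant[i]
        if prev > 0 and abs(cur - prev) / prev > ratio:
            changes.append(start_t + i * hop / sr)   # a new note likely starts here
    return changes
```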

While implementing this project, I learned more about signal processing and how some things that are easy to identify manually/visually are much harder to code up. I also read up on how much research goes into segmenting music and how different types of instruments can introduce more slurs, since some tend to be played more legato. For this project, I needed new tools for identifying new notes, like the STFT and STE. To learn them, I read research papers from other projects and universities on how they approach the problem and tried to combine aspects of them into a better-working algorithm.

Grace’s Status Report for 4/12/2025

I worked on the issues raised during the interim demo: not being able to accurately detect slurs, and not picking up the rests in songs.

First, I experimented with switching the code to short-time energy (STE). This made the envelope much cleaner: there are fewer bumps and clearer near-zero values, essentially eliminating some of the noise that remained with RMS, which should make amplifying the signal much easier. However, I am still having difficulty seeing the differences between slurred notes, so I am doing additional research into onset detection to detect slurred notes specifically, rather than only looking for the separation between notes during segmentation.

(I forgot to change the label for the line, but this is a graph of STE – taken from audio recorded by a student in the School of Music)

For rests, I modified my rhythm detection algorithm to instead look for near-zero values after a segment's peak (meaning the note has finished playing) and count the remaining length after the note as a rest. It sometimes mistakes brief moments of silence between notes for rests, though, so I need to run some experiments to make it less sensitive.
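Roughly, the modification looks like this; it is only a sketch, and the near-zero threshold and frame-to-time conversion are assumptions:

```python
import numpy as np

def trailing_rest_length(ste, seg_start, seg_end, hop_length, sr, near_zero=1e-3):
    """Estimate how much of a segment is a rest.

    After the segment's peak, count the frames whose short-time energy has
    dropped to (near) zero and convert that tail into seconds of rest.
    """
    seg = ste[seg_start:seg_end]
    peak = int(np.argmax(seg))
    quiet = np.where(seg[peak:] < near_zero)[0]
    if len(quiet) == 0:
        return 0.0
    rest_frames = len(seg) - (peak + quiet[0])   # frames from first quiet frame to segment end
    return rest_frames * hop_length / sr         # seconds of rest
```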

Team Status Report for 3/29/25

This week we are working on integrating the basic functions of the code. First, Shivi and Grace will combine the rhythm and pitch code into a main.py, which will then serve as the backend for Deeya's web app code. Then we'll integrate it with an API to generate the sheet music UI.

While integrating, we realized we need to modify the audio segmentation to better account for periods of rest. We will look into using concave changes in the audio to compute the start of a new note. We also need to experiment with the MIDI encoding: the rhythm detection currently outputs the type of note (quarter, half, etc.), but the MIDI encoding takes the start and end times of samples. We will test whether we can pass actual start and end times and have the MIDI API round them to the correct note values, or whether that is something we will have to do manually in the future.
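As a concrete example of the start/end-time style of MIDI encoding we still need to experiment with, here is a minimal sketch using pretty_midi; our actual MIDI library and any note rounding may differ, so pretty_midi is just an assumption for illustration:

```python
import pretty_midi

def notes_to_midi(notes, out_path="transcription.mid"):
    """Write (midi_pitch, start_sec, end_sec) tuples straight to a MIDI file.

    pretty_midi takes raw start/end times in seconds, so any rounding to
    quarter/eighth notes would have to happen before this step.
    """
    pm = pretty_midi.PrettyMIDI()
    flute = pretty_midi.Instrument(program=73)  # General MIDI program 74 = Flute (0-indexed 73)
    for pitch, start, end in notes:
        flute.notes.append(pretty_midi.Note(velocity=90, pitch=pitch, start=start, end=end))
    pm.instruments.append(flute)
    pm.write(out_path)

# e.g. notes_to_midi([(72, 0.0, 0.5), (74, 0.5, 1.0)])  # C5 then D5, half a second each
```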

This week, we will work on having a basic demo ready for the interim demos and then on adding the ability to change the sheet music within the API (manually adding and deleting notes, etc.).

Grace’s Status Report for 3/29/25

This week I worked on finishing the rhythm detection. I built it on top of the audio segmentation code I created last week: I loop through the segments and, in each segment, use the BPM (which will come from the web app once it is integrated) to calculate the length of the note, then use an if statement to classify it as a sixteenth, eighth, quarter, etc. This seems to be working with the Twinkle Twinkle Little Star audio. It may be a little buggy in how I calculate rests, since I currently just count the remaining portion of the segment, so I will need a better algorithm for this and further testing. I will also look into using regions of interest/energy to segment audio with less steep increases in amplitude (slurred notes) for more precise segmentation. Currently on schedule – working on the interim demo presentation and on integrating Shivi's, Deeya's, and my parts. A rough sketch of the classification step is below.
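A minimal sketch of that if-statement classification; the tolerance boundaries and the assumption that the BPM counts quarter-note beats are mine, and the real thresholds are still being tuned:

```python
def classify_note(duration_sec, bpm):
    """Map a segment's duration to the closest standard note length.

    Assumes the BPM counts quarter notes, so one beat = 60 / bpm seconds.
    """
    beat = 60.0 / bpm
    ratio = duration_sec / beat  # duration measured in quarter-note beats
    if ratio < 0.375:
        return "sixteenth"
    elif ratio < 0.75:
        return "eighth"
    elif ratio < 1.5:
        return "quarter"
    elif ratio < 3.0:
        return "half"
    else:
        return "whole"

# e.g. classify_note(0.5, 120) -> "quarter" (0.5 s at 120 BPM is exactly one beat)
```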

Grace's Status Report for 3/22/25

This week was dedicated to testing the audio segmentation code I created last week. I realized that the peak difference threshold (how steep a change must be to count as a new note) and the max time difference (how long it must take before a note counts as new) both need to change with the BPM and audio quality. After testing with a random YouTube recording of Ten Little Monkeys, we realized this audio probably isn't usable, at least for early-stage testing, since the eighth notes were not distinct enough for the code (or, honestly, for me manually) to identify.

In this image, using the Ten Little Monkeys recording, I have manually marked where the notes change with the green lines. You can see that the code (the red lines) is not correctly identifying these notes. However, the change is not that significant (less than a 0.3 RMS difference) and the signal doesn't approach zero the way other recordings do, like Twinkle Twinkle Little Star. To try to fix this, I experimented with different amplifications. First I simply scaled the signal, but this scaled the "near zero" values as well, so the code still wouldn't pick up the note even though the absolute difference was now larger. Then I tried scaling it exponentially, to keep the near-zero values near zero while increasing the larger values, but since the recording is fairly quiet overall, this pushed everything toward zero and removed any significant difference. This led us to work with cleaner-cut audio first and come back to this recording once the program is more advanced.
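To show why neither amplification helped on this recording, here is roughly what the two scalings do; this is only a sketch, and the gain and exponent are placeholders since the exact values I tried are not recorded here:

```python
def linear_scale(rms, gain=3.0):
    """Scale everything by the same factor: the 'near zero' floor grows too,
    so the relative gap between silence and notes is unchanged."""
    return gain * rms

def exponential_scale(rms, exponent=2.0):
    """Raise values to a power: values below 1 shrink toward zero, which keeps
    the floor low but also flattens an overall-quiet recording."""
    return rms ** exponent
```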

My next steps are to find more distinct/staccato recordings of faster notes, see how the audio segmentation handles them, and polish off my rhythm detection code.

Grace's Status Report for 3/15/25

This week, I got audio segmentation up and working. After our previous conversation with Professor Sullivan, I first converted the audio signal into RMS values. 
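A minimal sketch of that conversion, assuming librosa's framewise RMS; the file name and the frame and hop sizes are illustrative:

```python
import librosa

y, sr = librosa.load("twinkle.wav")  # hypothetical recording

# Framewise RMS envelope of the signal; each value summarizes ~93 ms of audio
# at the default 22.05 kHz sample rate with these frame/hop sizes.
rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)[0]
rms_times = librosa.frames_to_time(range(len(rms)), sr=sr, hop_length=512)
```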

My first approach was to detect sharp increases in the RMS. However, this caused some spikes to be identified multiple times, and increasing the required time since the last identified point often caused me to miss the beginnings of some notes.

(image where the dots signify the code identifying the start of a note; as you can see, it was detecting far too many)

I then realized that the RMS often gets near zero right before a note. So my next approach was to identify when the RMS is near zero; but during a stretch of silence (like a rest), this incorrectly split the silence into many different segments, which wasted a lot of time. I then combined the two approaches: I look for when the RMS is near zero and then find the nearest peak. If that peak's RMS minus the starting (near-zero) RMS exceeds a threshold (currently 0.08, still being experimented with), I mark it as a real note start. While this was the most accurate approach so far, I still hit a bug where, even during silence, the code would find the nearest peak a couple of seconds away and again mark the silence as multiple note beginnings. I fixed this by checking how far away the peak was and adding a maximum distance threshold.

(the dotted blue line is where the code identified the nearest peak and the red line is where the code marks the near-zero RMS values)
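A minimal sketch of this near-zero-then-peak rule; the 0.08 jump threshold comes from above, while the near-zero level and the maximum peak distance are placeholders still being experimented with:

```python
import numpy as np

def detect_note_starts(rms, near_zero=0.02, min_jump=0.08, max_peak_frames=20):
    """Mark a note start where the RMS dips near zero and then rises sharply.

    A dip only counts if some peak within `max_peak_frames` frames is at least
    `min_jump` above it; the distance cap keeps long silences (rests) from
    being split into many false note starts.
    """
    starts = []
    for i in range(len(rms) - 1):
        if rms[i] > near_zero:
            continue                               # not a near-zero dip
        if starts and i - starts[-1] <= max_peak_frames:
            continue                               # this dip region already produced a start
        window = rms[i + 1:i + 1 + max_peak_frames]
        if len(window) and np.max(window) - rms[i] >= min_jump:
            starts.append(i)                       # frame index of the dip preceding the note
    return starts
```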

Currently this works on a sample of Twinkle Twinkle Little Star. When testing it with a recording of Ten Little Monkeys, it works if we lower the RMS threshold, which suggests we will need to standardize the signal somehow in the future. We also noticed that with quicker notes, the RMS values don't get as close to zero as they do for quarter or half notes, so we may need to raise the threshold for what counts as near zero.

(the red lines are where the code identified the beginnings of notes and the blue dotted lines are where I manually identified them)

Team Status Report for 3/8/25

This past week, we mostly focused on finishing up our design review, making sure the language was specific and concise. We also focused on adding clear graphics, like pseudocode and diagrams, to help convey the information. In addition, we met with Professor Almaraza, confirmed the use-case requirements, and got their opinion on the current workflow. Through this, we also got three flutists to sign up to test our project, and we will now work on getting them to also sign up for the conjoined mini course. On the project side, after some research and discussing the concept with Professor Sullivan, we have a clearer understanding of how to implement audio segmentation and aim to really finish this portion up next week.

Overall, we are currently on track, though we may run into some issues with the audio segmentation, as it will be the most difficult aspect of our project.

Part A (Grace): Our product will meet global factors for users who are not as technologically savvy by keeping the workflow as easy to understand as possible. The project already significantly reduces the strain of composing your own music by eliminating the need to know the exact note lengths, pitches, etc., while composing, and by cutting down the time it takes to transcribe. Beyond that, the user only has to upload an mp4 recording of themselves playing to receive transcribed music, since we handle all the backend work. As such, even the technologically unsavvy should be able to use the application. Furthermore, we aim to make the UI user-friendly and easy to read.

In addition, we aim to make this usable outside academic environments by filtering out background noise, so users can use the application even in noisy settings. As mentioned in our design review, we will test the application in multiple different settings to cover the range of environments in which this website would be used globally.

Part B (Shivi): Our project can make a cultural impact by allowing people to pass down music that lacks standardized notation. For instance, traditional/folk tunes (such as those for the Indian bansuri or the Native American flute) are often played by ear and likely to be lost over time, but our project can help transcribe such performances, allowing them to be preserved across generations. This would also increase access to music from different cultures, promoting cross-cultural collaboration.

Part C (Deeya): Write on Cue addresses environmental factors by encouraging a more sustainable system of digital transcription over printed notation, reducing paper usage. Digital transcription also allows musicians to learn, compose, and practice remotely, reducing the need for physical travel to lessons, rehearsals, or recording sessions. By reducing transportation energy and paper consumption, it helps make our product more environmentally friendly. Additionally, instead of relying on large, energy-intensive AI models, we plan to use smaller, more efficient models trained specifically for flute music, which will help reduce computation time and power consumption. We will look into techniques like quantization to help speed up inference.