Grace’s Status Report for 4/12/2025

This week, I worked on the issues mentioned during the interim demo: not being able to accurately detect slurs and not picking up the rests in songs.

First, I experimented with modifying the code to use Short-Time Energy (STE) instead of RMS. This made the signal much cleaner: there were fewer bumps and clearer near-zero values, essentially eliminating some of the noise that remained with RMS, which should make amplifying the signal a lot easier. However, I am still having some difficulty seeing the differences in slurred notes, so I am doing additional research into onset detection to detect slurred notes specifically rather than looking for the separation of notes during segmentation.

(I forgot to change the label for the line, but this is a graph of STE, taken using audio from a student in the School of Music)
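For reference, here is a minimal sketch of the frame-wise STE computation I am describing, assuming the audio is already loaded as a NumPy array of samples (the frame and hop sizes are placeholder values, not our final parameters):

import numpy as np

def short_time_energy(signal, frame_size=512, hop_size=256):
    # Frame-wise Short-Time Energy: sum of squared samples in each frame
    energies = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size]
        energies.append(np.sum(frame ** 2))
    return np.array(energies)

Compared to RMS, the squaring without the square root exaggerates loud frames and pushes quiet frames closer to zero, which matches the clearer near-zero values I am seeing in the graph.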

For rests, I modified my rhythm detection algorithm to instead look for near-zero values after the signal reaches its peak (meaning the note is done playing) and to count the additional length after the note as a rest. It sometimes treats the slight moments of silence between notes as rests, though, so I need to run some experiments to make it less sensitive.
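A rough sketch of that rest-handling idea, assuming per-frame energy values and segment boundaries are already available (the silence threshold here is illustrative, not our tuned value):

import numpy as np

def trailing_rest_frames(energy, seg_start, seg_end, silence_thresh=0.01):
    # energy: per-frame STE values; seg_start/seg_end: frame indices of one segment
    # silence_thresh: energy below this counts as silence (illustrative value)
    segment = energy[seg_start:seg_end]
    if len(segment) == 0:
        return 0
    peak = int(np.argmax(segment))
    # Frames after the peak that fall back near zero count toward a rest
    return int(np.count_nonzero(segment[peak:] < silence_thresh))

The returned frame count can then be converted into beats to decide whether the silence is long enough to be written as a rest, which is where the sensitivity tuning will happen.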

Grace’s Status Report for 3/29/25

This week I worked on finishing the rhythm detection. I approached this by using the audio segmentation code that I created last week: I loop through the segments and, within each segment, use the BPM (which will come from the web app once it is integrated) to calculate the length of the note, then use an if statement to classify it as a sixteenth, eighth, quarter, etc. note. This seems to be working with the Twinkle Twinkle Little Star audio. It may be a little buggy in how I am calculating rests, since I am just counting the remaining portion of the segment, so I will need to figure out a better algorithm for this and test it further. I will also be looking into using regions of interest/energy to do audio segmentation for audio with less steep increases in amplitude (slurred notes) for more precise segmentation. Currently on schedule – working on the interim demo presentation and integrating Shivi's, Deeya's, and my parts.
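As a hedged illustration of the classification step: with the BPM we know how long one beat lasts, so a segment's duration can be converted into beats and matched to the nearest standard note value. My code uses a chain of if statements; the nearest-value lookup below is just an equivalent sketch, and the names are mine.

def classify_note(duration_sec, bpm):
    # Convert the note's duration into quarter-note beats at the given BPM
    beats = duration_sec * bpm / 60.0
    # Candidate note values, measured in quarter-note beats
    note_values = {
        "sixteenth": 0.25,
        "eighth": 0.5,
        "quarter": 1.0,
        "half": 2.0,
        "whole": 4.0,
    }
    # Pick whichever standard value is closest to the measured length
    return min(note_values, key=lambda name: abs(note_values[name] - beats))

For example, a 0.5-second segment at 120 BPM is 1.0 beat and comes out as a quarter note.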

Grace's Status Report for 3/22/25

This week was dedicated to testing the audio segmentation code that I created last week. From this, I realized that the peak difference threshold (how steep a change must be to be considered a new note) and the max time difference (how long it must take for a note to be considered a new note) need to change with the BPM and audio quality. After testing with a random audio sample of Ten Little Monkeys from YouTube, we realized that this audio was probably not that usable, at least for early-stage testing, as the eighth notes were not distinct enough for the code, or quite honestly for me manually, to identify.

In this image, using the audio recording of Ten Little Monkeys, I have manually identified where the notes change with the green lines. You can see that the code (red lines) is not correctly identifying these notes. However, the change is not that significant either (less than a 0.3 RMS difference) and the signal doesn't approach zero as much as other audio does, like Twinkle Twinkle Little Star. To try to fix this, I experimented with different audio amplifications. First I just scaled the signal linearly, but this scaled the "near zero" values as well, meaning the code still wouldn't pick up the note even though the peak difference was now significant. Then I tried scaling it exponentially, to keep the near-zero values near zero while letting the other values grow, but since the recording is fairly quiet overall, this meant everything stayed near zero and there was still no significant difference. This led us to first experiment with clearer-cut audio before coming back to this recording when the program is more advanced.
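For context, the two amplification experiments roughly looked like the sketch below. The exact form of the exponential scaling I tried isn't captured perfectly here; I'm showing it as a power law, and the scale factor and exponent are just example values.

import numpy as np

def scale_linear(rms, factor=3.0):
    # Linear scaling: the near-zero dips get scaled by the same factor,
    # so they no longer pass a fixed near-zero check
    return rms * factor

def scale_power(rms, exponent=2.0):
    # Power-law scaling: values below 1 shrink toward zero, which keeps the
    # dips near zero but also flattens an already-quiet recording
    return np.power(rms, exponent)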

My next steps are to find more distinct/staccato recordings of faster notes, see how the audio segmentation handles them, and polish off my rhythm detection code.

Grace's Status Report for 3/15/25

This week, I got audio segmentation up and working. After our previous conversation with Professor Sullivan, I first converted the audio signal into RMS values. 

My first approach was to check for a sharp increase in the RMS. However, this caused me to incorrectly identify some spikes multiple times, and increasing the required amount of time since the last identified point often caused me to miss the beginnings of some notes.

(image where the dots signify where the code identified the start of a note; as you can see, it identified far too many)

I then realized that the RMS often gets near zero right before a note. So my next approach was to have the code identify when the RMS is near zero, but then during a moment of silence (like a rest) it would incorrectly segment the silence into many different segments, which would waste a lot of time. I then tried a combination of the two: I look for when the RMS is near zero and then find the nearest peak. If the difference between that peak's RMS and the starting (near-zero) RMS is greater than a specific threshold, currently 0.08 but still being experimented with, I identify it as the start of a note. While this was the most accurate approach so far, I still ran into a bug where, even in moments of silence, the code would find the nearest peak a couple of seconds away and identify the silence as multiple note beginnings again. I fixed this by checking how far away the peak was and setting a maximum time threshold.

(the dotted blue lines are where the code identified the nearest peak and the red lines are where the code marks the near-zero RMS values)
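A simplified sketch of this combined approach, assuming a per-window RMS array at a fixed hop rate. The 0.08 peak difference is the value currently being experimented with; the near-zero cutoff and the maximum lookahead are illustrative stand-ins for the thresholds I am still tuning.

import numpy as np

def find_note_onsets(rms, near_zero=0.02, peak_diff=0.08, max_gap=20):
    # rms: per-window RMS values
    # near_zero: what counts as "near zero" (illustrative)
    # peak_diff: minimum rise from the near-zero point to the peak (currently 0.08)
    # max_gap: maximum number of windows the peak may be away (illustrative)
    onsets = []
    i = 0
    while i < len(rms):
        if rms[i] < near_zero:
            lookahead = rms[i + 1 : i + 1 + max_gap]
            if len(lookahead) > 0:
                peak = int(np.argmax(lookahead))
                # Only mark an onset if a nearby peak rises enough above the dip
                if lookahead[peak] - rms[i] > peak_diff:
                    onsets.append(i)
                    i += peak + 1  # skip past this peak before searching again
                    continue
        i += 1
    return onsets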

Currently this works for a sample of Twinkle Twinkle Little Star. When testing it with a recording of Ten Little Monkeys, it works if we lower the RMS threshold, which indicates that we will need to standardize our signal somehow in the future. We also noticed that with quicker notes, the RMS values don't get as close to zero as they do for quarter or half notes, so we might need to increase the threshold for what is considered near zero.

(the red lines are where the code has identified the beginning of notes and the blue dotted lines are where I manually identified the beginning of notes)

Grace’s Status Report for 3/8/25

Last week, I primarily worked on the design review document, refining the finer details. Additionally, the conversation we had during the design presentation was useful, as we decided that the noise filtering features might be excessive, especially with the microphone being so close to the source of the signal. As our microphone has just recently come in, we are excited to experiment with this process and test it in the environments flutists commonly compose in, like their personal rooms (where there might be slight background noise) and studios (virtually no noise). Hopefully, with the calibration step, we can eliminate excessive noise filtering and decrease the amount of time it takes to get the sheet music to the user. Furthermore, after meeting with Professor Sullivan last week, we have a better idea of how to implement audio segmentation, deciding to focus on the RMS rather than just the peaks in amplitude for note beginnings; we plan on implementing a sliding RMS window of around 10 ms and looking for peaks there. After creating these segmentations, I plan to implement my rhythm detection module on top of them, since we cannot just use the length of a segment as there might be a rest within it. Overall, we are currently on track, but this week we expect to run into more issues, as audio segmentation will most likely be the hardest aspect of our project.
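A minimal sketch of the sliding RMS window we plan to use, assuming a mono sample array. The 10 ms window is the size we discussed; the 5 ms hop is my assumption, not a decided parameter.

import numpy as np

def sliding_rms(signal, sample_rate, window_ms=10, hop_ms=5):
    # ~10 ms RMS window over the raw samples; peaks in this curve are
    # candidate note beginnings for the segmentation step
    window = int(sample_rate * window_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    values = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        values.append(np.sqrt(np.mean(frame ** 2)))
    return np.array(values)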

Finally, we are excited to see how our collaboration with the music department will turn out, as many seem interested. We will be reaching out to the flutists this week to get them registered for the mini as well.

Grace's Status Report for 2/22/25

This week, I mostly focused on working on our design review presentation and our design proposal. Since I was the one presenting this week, I focused on making sure our slides were ready and rehearsed, ensuring that I could present all the information from my teammates' slides, and limiting words to make our presentation less word-heavy. In class, we listened to the different presentations and gave feedback. This was helpful since Professor Sullivan mentioned that some of our filtering for noise suppression might be excessive, which would slow down our processing time and lengthen our latency, so we will continue experimenting with just the bandpass and see whether the other filtering methods are necessary. I further experimented with different thresholds and filters to better isolate the rhythm of single notes.

For this week, I will be working on my sections of the design review paper as well as doing further research on audio segmentation, because how that works will determine how I implement rhythm detection. I will be meeting with Shivi to work on this aspect. Our project is currently on schedule, but other tasks might have to be pushed back with the introduction of audio segmentation. I hope to get this (audio segmentation + rhythm detection) working by the week after spring break at the latest.

Grace’s Status Report for 2/15/25

This week, I worked on creating the rhythm detection algorithm. We first practiced by simply writing into a MIDI file, using the mido library in Python, and then uploading the output into MuseScore so we could see what the sheet music generation looked like.
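As a rough example of that mido workflow (the note number, velocity, and tick values here are placeholders, not what we actually recorded):

from mido import Message, MidiFile, MidiTrack

# Build a one-track MIDI file with a single note, then open the output in MuseScore
mid = MidiFile()
track = MidiTrack()
mid.tracks.append(track)

# Placeholder values: D5 (MIDI note 74), default velocity, one beat of 480 ticks
track.append(Message('note_on', note=74, velocity=64, time=0))
track.append(Message('note_off', note=74, velocity=64, time=480))

mid.save('output.mid')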

We are trying to get the bare-bones aspect of the project working, so we did a few different recordings: the metronome alone, someone playing a D on the flute with the metronome in the background and no other sound, and someone playing a D on the flute with some background noise (people talking). This lets us test detecting a note with a clean recording, and also experiment with noise suppression, which Shivi is working on.

(what the isolated signals for the metronome and the flute note look like) 

After analyzing what these frequencies look like after applying a Fourier transform, I isolated the signal with a filter that removes all frequencies other than the pitch of the note played, and calculated the duration of the notes using the inputted BPM. However, audio recordings tend to have a lot of variation in sound quality, creating many peaks within the wave. This originally made my code think multiple different notes were being played, since I was trying to calculate duration from the peaks. After analyzing the signal further, I migrated to using 20% of the max amplitude as a threshold for calculating the duration of a note. I then transcribed this into a MIDI file and uploaded it to MuseScore to look at the results. Though the rhythm is still not accurate, I am hopeful that this will be working soon, and I plan on using a sliding window filter in future testing to reduce the number of peaks and the noise.

(what is currently being transcribed to MuseScore; it should just be a single note, so I will need to reassess my threshold value)
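A hedged sketch of the two steps described above. The bandpass half-width and filter order are placeholder choices, not values we have settled on; the 20% cutoff is the threshold mentioned in the write-up.

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_around_pitch(signal, sample_rate, pitch_hz, half_width_hz=30, order=4):
    # Keep only frequencies near the played pitch (half-width is a placeholder)
    nyquist = sample_rate / 2
    b, a = butter(order, [(pitch_hz - half_width_hz) / nyquist,
                          (pitch_hz + half_width_hz) / nyquist], btype='band')
    return filtfilt(b, a, signal)

def note_duration_beats(filtered, sample_rate, bpm, threshold_ratio=0.2):
    # Use 20% of the max amplitude as the cutoff for when the note is sounding
    envelope = np.abs(filtered)
    above = np.nonzero(envelope > threshold_ratio * np.max(envelope))[0]
    if len(above) == 0:
        return 0.0
    duration_sec = (above[-1] - above[0]) / sample_rate
    return duration_sec * bpm / 60.0  # convert seconds to beats at the given BPM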

My current progress is on schedule. This next week, I hope to get the rhythm detection accurately working for at least a whole note, half note, and quarter note of the same pitch. Hopefully, I will be able to detect switches between notes soon as well.

Grace's Status Report for 02/08/25

This week, we primarily focused on finishing our proposal slides. We made sure to get them done in advance so our TA, Ankit, would be able to give us feedback on our schedule and our parts. In class, we listened to the different presentations and gave feedback. It was helpful to see all the presentations in our section, since most of them also related to signal processing, and it gave us inspiration on how we might approach handling our signals from the flute. 

During the presentation, Ankit gave us some insightful feedback on whether we should continue using hardware for the signal processing, which would entail a microcontroller and possibly an op amp, or move entirely to software. It is true that using solely software would most likely help with the real-time aspect of the project, but I'm not entirely sure how this would translate for the rhythm detection, which I am tasked with. Originally, our plan was to include a blinking light on a physical device, which would allow people to maintain a steady BPM (essentially providing a metronome). Moving to software, we could implement a similar feature on the web application, but I'm not sure how accurately we would be able to detect rhythm, since our original plan was to time it based on how long it takes to send data to the web app via serial. I will definitely need to do some more research on how to record and detect rhythm.

Our plan from here on out is to divide up our scheduled tasks, as we are currently on schedule, with Shivi and me primarily working on the signal processing side of the project and Deeya researching and beginning to work on the web app. Our first next step will be reaching out to Ankit, or another TA specializing in signal processing, to ask which would work best for us in terms of hardware vs. software.

Looking forward, I hope to have a solid game plan and to have implemented the bare bones of the rhythm detection for the signal processing, and to work with Shivi on integrating this information into our MIDI file within the next two weeks. Furthermore, we hope to meet with some flutists from CFA to get their feedback on how they would like the website to look and how they would like the machine to function.