Michelle’s Status Report for 4/12

This week, I continued testing and refining the rhythm analysis algorithm. I tested a second version of the algorithm that weights the standard deviation of the onset strengths more heavily when deciding whether to count a peak as a note. This version is much more accurate across various tempos, as shown in the figure below, which presents the results of testing a self-made single-clef piano composition. The first version produced more false positives at very slow tempos and more false negatives at very fast tempos, with the missed notes typically being 32nd notes or some 16th notes. The second version, tested on the same piece, performs much better, missing only a few 32nd notes at very fast tempos.
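
As a rough illustration of the V2 idea, here is a minimal sketch of the peak-picking step, assuming the onset envelope and frame times come from librosa's onset_strength and times_like; the mean-plus-weighted-standard-deviation threshold and the std_weight value are assumptions for illustration, not the exact weighting I use:

    import numpy as np

    def pick_notes_v2(onset_env, times, std_weight=1.0, min_gap=0.1):
        # The threshold scales with the spread of the onset strengths, so a
        # peak must stand out more in pieces with widely varying onsets.
        threshold = onset_env.mean() + std_weight * onset_env.std()
        notes = []
        for t, strength in zip(times, onset_env):
            if strength >= threshold and (not notes or t - notes[-1] >= min_gap):
                notes.append(t)  # keep the peak as a note onset
        return np.array(notes)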

The verification methodology involves creating compositions in MuseScore and generating audio files to run the processing algorithm on. This way, I have an objective ground truth for the tempo and rhythm and can easily manipulate variables such as the instrument, dynamics, and time signature to see how they affect accuracy. I also test the algorithm on real songs, which often have more noise and more blended-sounding notes. Using a Python program, I can run the analysis on an uploaded song, play the song back while showing an animation that blinks on the extracted timestamps, and record any missed or added notes. To verify that my subsystem meets the design requirements, the algorithm must capture at least 95% of the notes, without adding any extra notes, for single-instrument songs between 50 and 160 BPM.
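
A sketch of how that 95%/no-extra-notes check can be scored against the MuseScore ground truth; the 50 ms matching tolerance here is an assumption for illustration:

    def score_detection(detected, ground_truth, tol=0.05):
        # Greedily match each detected timestamp to an unused ground-truth
        # timestamp within the tolerance; leftovers are misses or extras.
        unmatched = list(ground_truth)
        extra = 0
        for t in detected:
            hit = next((g for g in unmatched if abs(g - t) <= tol), None)
            if hit is None:
                extra += 1
            else:
                unmatched.remove(hit)
        recall = 1 - len(unmatched) / len(ground_truth)
        return recall, extra  # pass if recall >= 0.95 and extra == 0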

Comparing results of V1 and V2 on a piano composition created in MuseScore

I also tested an algorithm that uses a local adaptive threshold instead of a global threshold. This version uses a sliding window to compare onset strengths more locally, which allows the algorithm to adapt over the course of a piece, especially when the dynamics change. The tradeoff is that it can be more susceptible to noise.
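
A sketch of the local variant, assuming a symmetric window around each frame; the window length (about one second at librosa's default hop) and the weighting are placeholders:

    import numpy as np

    def pick_notes_local(onset_env, times, window=43, std_weight=1.0, min_gap=0.1):
        # Compare each frame to the statistics of its neighborhood instead of
        # the whole piece, so quiet passages can still produce notes.
        half = window // 2
        notes = []
        for i, (t, strength) in enumerate(zip(times, onset_env)):
            local = onset_env[max(0, i - half):i + half + 1]
            if strength >= local.mean() + std_weight * local.std():
                if not notes or t - notes[-1] >= min_gap:
                    notes.append(t)
        return np.array(notes)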

I am on track with the project schedule. I think the current version is sufficient for the MVP of this subsystem, so further work will just be more extensive testing and stretch goals for more complex music. I have begun creating more compositions with even more complex rhythms, including time signature changes, which I plan to test V2 on next week. I will also test the algorithm on pieces with drastic dynamic changes. I plan to experiment more with the minimum note length as well: since V2 produces fewer false positives, I may be able to decrease it from the current 100ms to accommodate more complex pieces. Additionally, I want to test a version that uses median absolute deviation instead of standard deviation to see if it outperforms V2, since the median absolute deviation is less sensitive to extreme peaks.

Michelle’s Status Report for 3/29

This week, I continued fine-tuning the audio processing algorithm. I continued testing with piano and guitar and also started testing voice and bowed instruments. These are harder to extract the rhythm from since the articulation can be much more legato. If we used pitch information, it might be possible to distinguish note onsets within slurs, for example, but this is most likely out of scope for our project.

Also, there was a flaw in calculating the minimum note length from the estimated tempo: for a song most people would consider 60 BPM, librosa would sometimes estimate 120 BPM. The two are technically equivalent, but the calculated minimum note length becomes much smaller (a 32nd note is 125 ms at 60 BPM but only 62.5 ms at 120 BPM), which results in a lot of “double notes”, i.e. note detections directly after one another that actually come from a single sustained note. For the game experience, I believe it is better to have more false negatives than false positives, so I think a fixed minimum note length will be a better generalization. A threshold of 0.1 seconds seems to work well.

Additionally, in preparation for integrating the music processing with the game, I added some more information to the JSON output that bridges the two parts. Based on the number of notes at a given timestamp, lane numbers are randomly chosen for the tiles to fall from.

Example JSON output
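
As a hypothetical sketch (the field names and a four-lane layout here are placeholders, not our final schema), the lane assignment can be attached to each timestamp roughly like this:

    import json
    import random

    def write_note_map(timestamps, note_counts, out_path, num_lanes=4):
        # For each timestamp, choose as many distinct lanes as there are notes.
        events = [
            {"timestamp": t, "lanes": random.sample(range(num_lanes), k=count)}
            for t, count in zip(timestamps, note_counts)
        ]
        with open(out_path, "w") as f:
            json.dump({"notes": events}, f, indent=2)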

My progress is on schedule. Next week, I plan to finalize my work on processing the rhythm of single-instrument tracks and meet with my teammates to integrate all of our subsystems together.

Michelle’s Status Report for 3/22

This week I continued testing my algorithm on monophonic instrumental and vocal songs with fixed or varying tempo. I ran into some upper limits with SFML in terms of how many sounds it can keep track of at a time. For longer audio files, when running the test, the background music and the generated clicks on note onsets play perfectly for about thirty seconds before the sound starts to glitch, then goes silent and produces this error:

It seems there is an upper bound on the number of SFML sounds that can be active at a time, and after running valgrind it looks like there are some memory leaks as well. I am still debugging this issue, trying to clear click sounds as soon as they are done playing and implementing suggestions from forums. However, this is only a problem in testing, where I am playing what is probably hundreds of metronome clicks in succession; it will not be a problem in the actual game, since we will only be playing the song and maybe a few sound effects. If the issue persists, it might be worthwhile to switch to a visual test, which would be closer to the gameplay experience anyway.

Next week I plan to try to get the test working again, try out a visual test method, and work with my team members on integration of all parts. Additionally, after having a discussion with my team members, we think it may be best to leave more advanced analysis of multi-instrumental songs as a stretch goal and focus on the accuracy of monophonic songs for now.

Michelle’s Status Report for 3/15

I started out this week by exploring how to leverage Librosa to analyze the beat of songs that have time-varying tempo. Below are the results of processing an iPhone recording of high school students at a chamber music camp performing Dvorak's Piano Quintet No. 2, Movement III, using a tempo standard deviation of 4 BPM:

When running a test that simultaneously plays the piece and a click on each estimated beat, the beats sound mostly accurate but not perfect. I then moved on to adding note onset detection in order to determine the rhythm of the piece. My current algorithm selects timestamps where the onset strength is above the 85th percentile, then removes any timestamps that are within a 32nd note of each other, calculated based on the overall tempo. This works very well for monophonic songs, even ones with some variation in tempo. For multi-instrumental tracks, it tends to detect the rhythm of the drums if present, since these have the clearest onsets, along with some of the rhythm of the other instruments or voices.
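
A condensed sketch of that algorithm, assuming the overall tempo comes from librosa's beat tracker and treating a 32nd note as one eighth of a beat:

    import librosa
    import numpy as np

    def extract_rhythm(path, percentile=85):
        y, sr = librosa.load(path)
        onset_env = librosa.onset.onset_strength(y=y, sr=sr)
        times = librosa.times_like(onset_env, sr=sr)

        tempo, _ = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
        min_gap = 60.0 / float(tempo) / 8.0   # duration of a 32nd note in seconds

        threshold = np.percentile(onset_env, percentile)
        notes = []
        for t, strength in zip(times, onset_env):
            if strength >= threshold and (not notes or t - notes[-1] >= min_gap):
                notes.append(t)   # keep peaks above the 85th percentile, spaced apart
        return notes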

I also worked on setting up my development environment for the new game engine Yuhe built. Next week I plan to continue integrating the audio analysis with the game, adjust the rhythm algorithm to calculate the 32nd-note threshold dynamically from the time-varying tempo, and experiment with different values for the tempo standard deviation (the std_bpm parameter in the sketch below). I would also like to look into possible ways to improve rhythm detection in multi-instrumental songs.
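
For reference, the time-varying tempo estimation follows the standard librosa pattern, roughly like this sketch (the filename is a placeholder, and librosa.beat.tempo is exposed as librosa.feature.tempo in newer releases):

    import librosa

    y, sr = librosa.load("dvorak_quintet_mvt3.m4a")   # recording filename assumed
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)

    # aggregate=None returns one tempo estimate per frame instead of a single
    # global value; std_bpm sets the spread of the tempo prior (4 BPM above).
    dtempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr,
                                aggregate=None, std_bpm=4.0)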

Michelle’s Status Report for 3/8

This week, I worked on creating an algorithm for determining the number of notes to be generated based on the onset strength of the beat. Onset strength at time t is determined by max(0, S[f, t] - ref[f, t - lag]), aggregated across frequency bins f, where ref is S after local max filtering along the frequency axis and S is the log-power Mel spectrogram.
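
In code, this is librosa's onset strength envelope; a minimal sketch (filename assumed):

    import librosa

    y, sr = librosa.load("song.wav")   # placeholder filename

    # Frame-wise onset strength: mean over frequency bins of
    # max(0, S[f, t] - ref[f, t - lag]) on the log-power Mel spectrogram.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    frame_times = librosa.times_like(onset_env, sr=sr)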

Since a higher onset strength implies a more intense beat, it can be better represented in the game by a chord. Likewise, a weaker onset strength would generate a rest or a single note. Generally we want more single notes than anything else, with three-note chords being rarer than two-note chords. The percentile cutoffs can easily be adjusted later during user testing to find the best balance.
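
A sketch of this mapping with placeholder percentile cutoffs (the actual cutoffs are what we will tune during user testing):

    import numpy as np

    def notes_per_beat(onset_strengths, rest_pct=20, chord2_pct=80, chord3_pct=95):
        # Map each beat's onset strength to 0 (rest), 1, 2, or 3 simultaneous notes.
        rest_cut = np.percentile(onset_strengths, rest_pct)
        two_cut = np.percentile(onset_strengths, chord2_pct)
        three_cut = np.percentile(onset_strengths, chord3_pct)

        counts = []
        for s in onset_strengths:
            if s < rest_cut:
                counts.append(0)    # rest
            elif s < two_cut:
                counts.append(1)    # single note (most common)
            elif s < three_cut:
                counts.append(2)    # two-note chord
            else:
                counts.append(3)    # three-note chord (rarest)
        return counts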

My progress is on schedule. Next week, I plan to refactor my explorations with Librosa into modular functions to be easily integrated with the game. I will also be transitioning from working on audio analysis to working on the UI of the game.

Michelle’s Status Report for 2/22

This week, I continued working on validation of fixed-tempo audio analysis. The verification method I created last week, playing metronome clicks at the beat timestamps while playing the song, was not ideal: with multi-threading there were timing issues, and with the threading removed there was human error from manually starting the song at what I hoped was the same time.

This week, I created an alternate version that uses matplotlib to animate a blinking circle at the timestamps while playing the song on a separate thread. The visual alignment is also closer to the actual gameplay. I used 30 FPS since that is the planned frame rate of the game. Here is a short video of a test as an example: https://youtu.be/54ToPpPSpGs
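
A rough sketch of the test, assuming sounddevice for audio playback (the playback library, blink length, and wall-clock timing detail are assumptions for illustration):

    import threading
    import time
    import librosa
    import sounddevice as sd                      # assumed playback library
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    FPS = 30          # planned game frame rate
    BLINK_LEN = 0.1   # seconds the circle stays lit after each timestamp

    def visual_test(path, timestamps):
        y, sr = librosa.load(path)

        fig, ax = plt.subplots()
        circle = plt.Circle((0.5, 0.5), 0.2, color="red", visible=False)
        ax.add_patch(circle)
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)

        # Start playback on its own thread, then drive the blink off elapsed time.
        start = time.perf_counter()
        threading.Thread(target=lambda: (sd.play(y, sr), sd.wait()), daemon=True).start()

        def update(_frame):
            t = time.perf_counter() - start       # elapsed song time
            circle.set_visible(any(0 <= t - ts <= BLINK_LEN for ts in timestamps))
            return (circle,)

        ani = FuncAnimation(fig, update, interval=1000 / FPS, blit=True)
        plt.show()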

When testing against the self-composed audio library, where we know the ground truth tempo and beat timestamps, faster songs of 120 BPM or greater had a beat alignment error of about 21 ms, which is just outside our tolerance of 20 ms. When I tested fast songs with the visual animation verification method, the error was not really perceptible to me. Thus, I think fixing this marginal error is not a high priority, and it might be justified to relax the beat alignment error tolerance slightly, at least for the MVP. Further user testing after integration will be needed to confirm this.

My progress is on track for our schedule. Next week I plan to wrap up fixed-tempo beat analysis and move onto basic intensity analysis which will be used to determine how many notes should be generated per beat. This is a higher priority than varying-tempo beat analysis. Testing with a wide variety of songs will be needed to finetune our algorithm for calculating the number of notes for each level for the most satisfying gaming experience.

Michelle’s Status Report for 2/15

This week I worked on building a verification process for determining beat alignment error. I created some songs as test cases so that, with some math, I know the ground truth tempo and beat timestamps. Then, I compared these with what the beat tracking observed. I focused on fixed-tempo songs. For most tempos, the beat alignment error fell within the acceptance criterion of 20 ms. For tempos of about 120 BPM or greater, the beat alignment error was 21+ milliseconds, so further work is needed to fine-tune that.
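
A sketch of the error computation for a fixed-tempo test song, assuming the error is measured as the mean distance from each estimated beat to the nearest ground-truth beat:

    import numpy as np

    def beat_alignment_error(estimated_beats, tempo_bpm, duration):
        # Ground-truth beats of a fixed-tempo song are evenly spaced.
        period = 60.0 / tempo_bpm
        truth = np.arange(0.0, duration, period)
        # Mean absolute offset (seconds) to the nearest true beat;
        # compare against the 0.020 s acceptance criterion.
        errors = [np.min(np.abs(truth - b)) for b in estimated_beats]
        return float(np.mean(errors))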

I also created another method of testing to use on real songs where the exact tempo and beat alignments are unknown. I built a Python program that extracts beat timestamps from an audio file and then plays back the audio while playing a click sound at the observed beat timestamps. I initially used threading to play the song and the metronome clicks at the same time, but this introduced some lag glitches in the song. I removed the threading and just played the generated metronome while separately starting the song myself, attempting to minimize human error in starting it at exactly the right time. With this method, the timestamps sounded highly accurate and stayed on beat throughout.
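
For reference, librosa can also render the click track itself; a sketch that mixes the clicks into a single output file instead of playing them on a separate thread (filenames assumed):

    import librosa
    import soundfile as sf

    y, sr = librosa.load("song.wav")                       # placeholder filename
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Synthesize a click at each detected beat and overlay it on the original audio.
    clicks = librosa.clicks(times=beat_times, sr=sr, length=len(y))
    sf.write("song_with_clicks.wav", y + clicks, sr)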

The audio analysis portion of the project is on schedule. Next week, I want to see if I can find some way to reduce the beat alignment error for songs that are above 120 BPM.

Michelle’s Status Report for 2/8

This week I focused on getting familiar with the Librosa library and exploring the functions that will be relevant to our project. I was able to demonstrate tempo analysis and beat extraction working for songs with fixed tempo. I also started looking into ways to determine intensity, as that would be useful information for the game, whether it affects the number of notes or drives fun animations. Onset strength may be a useful metric for intensity. Here are some notes from my exploration so far: https://docs.google.com/document/d/1x1v-kjtmkiGLtpr_eNxQ_WqPQmr7fqHTX-byE4wmWC8/edit?usp=sharing

I also completed a basic tutorial for Unity game engine, as I have never used it before. I created a working Flappy Bird game to get familiar with Unity and C#. This will be useful for when I start working on the game UI later on in the project timeline.

My progress is on schedule. Next week, I plan to further explore Onset Strength as a metric and figure out how to extract that from songs. I also want to look into how to analyze the tempo and beats of songs with variable tempo, for example most classical music.