Team Status Report for 4/12/25

Last week, we had a successful interim demo in which our transcription pipeline worked on a recording of Twinkle Twinkle Little Star. We also met with a flutist from the School of Music (SoM) to get her feedback on our pipeline and to obtain some sample recordings. She found the interface intuitive and easy to use, though we did run into some bugs with audio file formats and found that our note segmentation struggled a little with slurred notes.

This week, we focused on the following items:

  1. Fixing the denoising step
  2. Setting up WebSockets for real-time BPM adjustment
  3. Completing any remaining frontend-backend integration for the web app (e.g., earlier we had some bugs with audio file formats and with recording audio directly via the web app)
  4. Switching from RMS to a Short-Time Energy (STE) approach for note segmentation, which helped to better account for rests/slurs in the music (see the sketch after this list)
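
Below is a minimal sketch of the STE-based segmentation idea, assuming a mono floating-point signal; the frame size, hop size, and relative threshold are illustrative values rather than our final parameters.

```python
import numpy as np

def short_time_energy(signal, frame_size=2048, hop_size=512):
    """Compute short-time energy over sliding frames of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_size, hop_size):
        frame = signal[start:start + frame_size]
        energies.append(np.sum(frame ** 2) / frame_size)
    return np.array(energies)

def detect_onsets(signal, sr, frame_size=2048, hop_size=512, rel_threshold=0.1):
    """Mark a note onset wherever the STE rises above a threshold
    (relative to the maximum energy) after having been below it,
    which lets rests between notes reset the detector."""
    ste = short_time_energy(signal, frame_size, hop_size)
    threshold = rel_threshold * ste.max()
    onsets = []
    above = False
    for i, energy in enumerate(ste):
        if energy > threshold and not above:
            onsets.append(i * hop_size / sr)  # frame index -> seconds
            above = True
        elif energy < threshold:
            above = False
    return onsets  # onset times in seconds
```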

Later this weekend, we are meeting with another flutist from the SoM to obtain more audio and to see whether our note segmentation performs better this time. Our new audio interface and XLR cable also arrived this week, so we will hopefully be able to collect better audio samples as well. In the upcoming week, we will focus on:

  1. Polishing our STE-based note segmentation
  2. Fixing the issues with making our sheet music editable via the Flat API
  3. Collecting user metrics such as transcription history
  4. Deploying our web app
  5. Preparing our final presentation/demo
  6. Thorough testing of our pipeline

Below is our plan for verification, which we already started last week.

Team Status Report for 3/29/25

This week, we are working on integrating the basic pieces of our code. First, Shivi and Grace will be combining the rhythm and pitch code into a main.py, which will then serve as the backend for Deeya's web app. Then we'll integrate it with an API to generate the sheet music UI.

While integrating, we realized that we need to modify our audio segmentation to better account for periods of rest. We will be looking into using concave changes in the audio signal to compute the start of a new note. Additionally, we will need to experiment with the MIDI encoding: our rhythm detection currently outputs the note type (quarter, half, etc.), but the MIDI encoding takes in the start and end times of each note. We will also test whether we can pass in the actual start and end times and have the MIDI API round them to the correct note values, or whether that is something we will need to do manually.
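
To make the start/end-time question concrete, here is a minimal sketch of writing explicit note timings to a MIDI file using the pretty_midi library; the library choice, the note list, and the instrument program are illustrative assumptions rather than our final implementation.

```python
import pretty_midi

# Hypothetical output of the pitch/rhythm stage: (midi_pitch, start_sec, end_sec)
notes = [(60, 0.0, 0.5), (62, 0.5, 1.0), (64, 1.0, 2.0)]

pm = pretty_midi.PrettyMIDI()
flute = pretty_midi.Instrument(program=73)  # General MIDI flute (0-indexed program number)

for pitch, start, end in notes:
    flute.notes.append(pretty_midi.Note(velocity=100, pitch=pitch, start=start, end=end))

pm.instruments.append(flute)
pm.write('transcription.mid')
```

In this representation the note type never appears explicitly; it would only emerge when the notation software quantizes the start and end times.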

This week, we will be working on a basic working demo for the interim demo, and then on the ability to change the sheet music within the API (manually adding and deleting notes, etc.).

Team Status Report for 3/22/25

This week, we came up with important tasks for each of us to complete to make sure we have a working prototype for the interim demo. For the demo, we are hoping that the user will be able to upload a recording and specify the key and the BPM. The web app should then trigger the transcription pipeline, after which the user can view the score. We are aiming to have all of this working for a rhythmically very simple piece.

These are the assigned tasks we each worked on this week:

Grace: Amplify audio for note segmentation, review note segmentation after amplification, rhythm detection

Deeya: Get the metronome working, trigger the backend from the web app

Shivi: Learn how to write to a MIDI file, look into using the MuseScore API and allowing the user to edit the generated music score and key signature

After the demo, we hope to let the user edit the score, refine the code to better transcribe more rhythmically complex audio, do lots of testing on a variety of audio, and potentially add tempo detection.

Also, all of our parts (microphone, XLR cable, and audio interface) have arrived, and this week we will try to get some recordings with our mic from the SoM flutists.

This week we completed the ethics assignment, which made us think a bit more about plagiarism and user responsibilities when using our web app. We concluded that we might need to include a disclaimer telling users to be careful and pay attention to where they record their piece so that someone else can't do the same. Also, after reflecting on our conversation with Professor Chang, we have decided not to pursue our stretch goal anymore. Instead, we will focus on making the web app interface as easy as possible to use and on letting the user customize and edit the sheet music our web app generates. Overall, we are on track with our schedule.

Team Status Report for 3/15/25

This week, we made significant progress on the web app setup, audio segmentation, and pitch detection components of our project. We also received our microphone, and Professor Sullivan lent us an audio interface that we can use to record some audio.

Below is an image of what our web app currently looks like. Here, a user can upload flute audio and a recording of their background noise. They can also adjust the tempo of the metronome (for the MVP, we are not performing tempo detection, so the user needs to set the tempo/metronome themselves).

Additionally, we now have a basic implementation of audio segmentation (using RMS) working. Below is a graph showing a flute signal of Twinkle Twinkle Little Star, where the red lines mark the start of a new note as detected by our algorithm and the blue dotted lines mark the actual note onsets. Our algorithm's detected onsets were within 0.1 ms of the actual onsets.

We achieved similar results with Ten Little Monkeys at regular and 2x speed, though we still need to add a way to dynamically adjust the RMS threshold based on the signal’s max amplitude, rather than using trial and error.
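
A minimal sketch of one way to set that threshold dynamically, assuming we scale it to the recording's peak frame RMS; the frame size, hop size, and fraction are illustrative, not tuned values.

```python
import numpy as np

def adaptive_rms_threshold(signal, frame_size=2048, hop_size=512, fraction=0.2):
    """Pick the onset threshold as a fraction of the peak frame RMS,
    so it scales with the recording's overall level instead of being hand-tuned."""
    frames = range(0, len(signal) - frame_size, hop_size)
    rms = np.array([np.sqrt(np.mean(signal[i:i + frame_size] ** 2)) for i in frames])
    return fraction * rms.max()
```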

We also started performing pitch detection. To do so, we are using comb filtering and Fourier transforms to analyze the frequencies present in the played note. We then use the fundamental frequency to determine the corresponding note. We were able to successfully determine the MIDI notes for Twinkle Twinkle and plan to continue testing this out on more audio samples. 
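
As a simplified illustration of this step, the sketch below estimates a segment's fundamental from a single FFT peak and maps it to a MIDI note number; our actual pipeline also uses comb filtering, which is not shown, and the 200 Hz floor is an illustrative assumption.

```python
import numpy as np

def detect_midi_note(segment, sr):
    """Estimate the fundamental frequency of a note segment via an FFT peak
    and map it to the nearest MIDI note number."""
    windowed = segment * np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    # Ignore very low frequencies (below the flute's range) before peak-picking
    valid = freqs > 200.0
    f0 = freqs[valid][np.argmax(spectrum[valid])]
    midi = int(round(69 + 12 * np.log2(f0 / 440.0)))  # A4 = 440 Hz = MIDI 69
    return midi, f0
```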

We are currently on schedule with our progress. For the upcoming week, we plan to integrate all of our existing code and to test/refine the audio segmentation and pitch detection so that they are more robust to various tempos, rhythms, and frequencies. We are also soliciting the SoM flutists' availability so that we can start some initial testing the week of March 24th. Additionally, after speaking with Professor Chang last week during lab, we have decided to build in some time to add a feature that lets users edit the generated music score (i.e., move measures around, adjust notes, add notation such as trills and crescendos, and more).

Team Status Report for 3/8/25

This past week, we mostly focused on finishing our design review by ensuring that the language was specific and concise. Additionally, we focused on adding clear graphics, like pseudo-graphics and diagrams, to help convey the information. We also met with Professor Almarza, confirmed the use case requirements, and got his opinion on the current workflow. From this, we got three flutists to sign up to test our project, and we will now work on getting them to also sign up for the conjoined mini course. In terms of the project itself, after some research and a discussion with Professor Sullivan, we have a clearer understanding of how to implement audio segmentation, and we aim to finish this portion up in the next week.

Overall, we are currently on track, though we may run into some issues with the audio segmentation, as this will be the most difficult aspect of our project.

Part A (Grace): Our product will meet global factors for those who are not as technologically savvy by being as easy to understand as possible. This project already significantly decreases the strain of composing your own music by eliminating the need for individuals to know the exact lengths of notes, pitches, etc. while composing, and by decreasing the amount of time it takes to transcribe. In addition, the individual only has to upload an mp4 recording of themselves playing to get transcribed music, as we will be handling all the backend aspects. As such, even the technologically unsavvy should be able to use this application. Furthermore, we aim to make the UI user-friendly and easy to read.

In addition, we aim to make this usable in environments beyond an academic one by filtering out outside noise, allowing users to use the application even in noisy settings. As mentioned in our design reviews, we will test this application in multiple different settings to hopefully encompass the different environments in which this website would be used globally.

Part B (Shivi): Our project can make a cultural impact by allowing people to pass down music that lacks standardized notation. For instance, traditional and folk tunes (such as those played on the Indian bansuri or the Native American flute) are often played by ear and likely to be lost over time, but our project can help transcribe such performances, allowing them to be preserved across generations. This would also help increase access to music for people from different cultures, promoting cross-cultural collaboration.

Part C (Deeya): Write on Cue addresses environmental factors by encouraging a more sustainable system of digital transcription over printed notation, reducing paper usage. Digital transcription also allows musicians to learn, compose, and practice remotely, reducing the need for physical travel to lessons, rehearsals, or recording sessions. By reducing transportation energy and paper consumption, it helps make our product more environmentally friendly. Also, instead of relying on large, energy-intensive AI models, we are going to use smaller, more efficient models trained specifically for flute music, which will help reduce computation time and power consumption. We will look into techniques like quantization to help speed up inference.

Team Status Report for 2/22/25

This past week, we presented our Design Review slides in class and individually spent time working on our respective parts of the project. Deeya has nearly completed the basic functionality of the website, having finished the UI, user profiles and authentication, and navigation between pages. Grace and Shivi are planning to meet up to combine their preprocessing code and figure out the best way to handle audio segmentation. They want to make sure their approach is efficient and works well with the overall pipeline that they are each creating on their own.

This week, we will focus on completing our design report, with each of us working on assigned sections independently before integrating everything into a cohesive report. We plan to finish the report a few days before the deadline so that we can send it to Ankit and get his feedback. This Wednesday, we will also meet with Professor Almarza from the School of Music and his flutists to explain our project and to come up with a plan for how we would like to integrate their expertise and knowledge into our project.

Overall, we each feel that we are on track with our respective parts of the project, and we are excited to meet with the flutists this week. We haven't changed much in our overall design plan, and there aren't any new risks we are considering besides the ones we laid out in our summary report last week.

Team Status Report for 2/15/25

This week, we focused on the high-level design for our design presentation. After discussing with Ankit and Professor Sullivan, we placed an order for some hardware to begin working with: a Behringer CB 100 Gooseneck Condenser Instrument Microphone and an XLR to USB-C adapter. This will allow us to improve our current experiments, as we will be able to obtain clearer audio recordings. Based on this past week's discussions, we also decided to move our entire implementation into software. Additionally, we determined that it would be best to provide users with a metronome (whose sound is outside the frequency range of the flute so that it can be filtered out later) set to a default of 60 BPM, which the user will be able to adjust in real time using a slider on our web app. Previously, we had recorded single notes from the B-flat major scale to experiment with harmonics; this week, we also met up to record some noisy signals to experiment with noise reduction, and we worked on encoding information for single notes into a MIDI file and uploading it to MuseScore to see whether we could translate it into sheet music (see individual team member reports). After a lot of discussion, we also concluded that real-time transcription is not relevant for our use case, since a user only needs to see the transcribed output once they are done playing.

Our new pipeline will work as follows:

  1. User logs into their account.
  2. User calibration: Ensure the user is playing at some minimum threshold before they upload a recording.
  3. User is prompted to record and upload an audio file of their background noise.
  4. User is prompted to record flute audio. (Consideration: add a play/pause control in case the user needs to pause mid-recording?) To do so, they turn on the metronome on the web app. The metronome is set to 60 BPM by default, but they can adjust it in real time using a slider.
  5. The audio is saved in the website’s database, and the pitch/rhythm detection pipeline is triggered in the backend. 
    1. Noise suppression via a Butterworth filter and adaptive noise filtering (see the sketch after this list).
    2. Audio segmentation: spectral flux (how much the spectrum changes over time) and short-time energy (STE, which detects sudden amplitude increases) to determine note onsets.
    3. For each segment (we can parallelize this with threading so multiple segments can be processed at once):
      1. Use note length to determine its type (eighth, quarter, half, whole, etc.)
      2. Use FFT to determine frequency/pitch and classify which note it is
  6. Encode the info from all the segments into a MIDI file
  7. The MIDI file gets uploaded to the web database, and the MuseScore API converts it into sheet music.
  8. The newly generated file is stored alongside the user's previous transcriptions, which they can view.
  9. IF time remains: add an editing feature where the user can adjust transcribed notes and add additional notation like crescendos, etc.
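
Below is a minimal sketch of the Butterworth portion of step 5.1, assuming a bandpass around the flute's approximate fundamental range; the cutoff frequencies and filter order are illustrative guesses, and the adaptive noise filtering stage is not shown.

```python
from scipy.signal import butter, filtfilt

def suppress_noise(signal, sr, low_hz=250.0, high_hz=2500.0, order=4):
    """Bandpass the recording around the flute's approximate range
    with a Butterworth filter."""
    b, a = butter(order, [low_hz, high_hz], btype='bandpass', fs=sr)
    # filtfilt applies the filter forward and backward, giving zero-phase
    # filtering so note onsets are not shifted in time
    return filtfilt(b, a, signal)
```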

The biggest risk/challenge as of now is verifying whether the methods we are planning to use for noise suppression and pitch/rhythm detection will work. For instance, in this week's noise suppression experiments, we tried a variety of filters (see Shivi's status report) but found that the flute audio would often get suppressed as well, or that the background noise would not be suppressed enough. We would like to run more experiments, and our contingency is a calibration step that gives us a noise sample that we can then subtract from the flute audio signal. Similarly, note onset detection will probably be quite challenging as well, because it may be difficult to determine the exact moment a note ends. This is why we are deciding to segment the audio as our initial processing step and then "round" the duration of each segment to the nearest eighth of a beat based on the BPM.
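
A minimal sketch of that rounding step, assuming segment durations are measured in seconds and a quarter note gets the beat at the user-set BPM; the function name and example values are ours.

```python
def quantize_duration(duration_sec, bpm):
    """Round a segment's duration to the nearest eighth of a beat at the given BPM."""
    beat_sec = 60.0 / bpm       # length of one beat in seconds
    quantum = beat_sec / 8.0    # an eighth of a beat
    n_quanta = max(1, round(duration_sec / quantum))
    return n_quanta * quantum   # quantized duration in seconds

# Example: at 60 BPM, a 0.46 s segment rounds to 0.5 s, i.e. half a beat
# (an eighth note if a quarter note gets the beat).
```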

Despite these challenges, we are on track with our schedule; over the next week, we plan to produce an even more detailed design while simultaneously setting up our web app, experimenting more with signal calibration/noise suppression, and starting on audio segmentation.

Week-specific status report questions:

Part A (Shivi): Our flute transcription system enhances public health by supporting creative expression through music learning. With an attachable microphone and a software-only pipeline, it is affordable and safe to use. Our system also promotes welfare by lowering barriers to music production, as it can be made accessible online for musicians, students, and educators to use.

Part B (Grace): Write on Cue aims to make music transcription more accessible to diverse musical communities, including amateur musicians, educators, composers, and students from various cultural and social backgrounds. This benefits people who may not have the technical skills or resources to manually transcribe music and allows individuals to better engage with music across a variety of cultural contexts. For example, in communities where formal music education is less accessible, our project can provide a more equitable way for musicians to preserve and share traditional flute music, irrespective of whether they are classically trained. Additionally, on the social side, this allows musicians from different backgrounds to contribute their musical expressions and makes it easier to preserve musical heritage.

Part C (Deeya): Traditional methods of music transcription are time-consuming and require specialized knowledge, creating a barrier for learners who want to review their performances or for educators who need to provide detailed feedback. By streamlining the transcription process, our project reduces the dependency on costly manual transcription services, which lowers the overall cost of producing sheet music. Also, we are designing our project on a web app, which maximizes accessibility and encourages a cheaper and more widespread music education.

Team Status Report for 2/8/25

As we had proposal presentations this week, we worked hard on finishing our slides, ensuring that they were done far enough in advance that Ankit, our TA, could give us feedback on our schedule. Here, Ankit mentioned the possibility of implementing our hardware components (like an Arduino microcontroller) solely in software instead, as it would run a lot faster. We are currently considering this option: since we would ideally like to make this system real-time, faster processing would be best. However, this could change how we approach tasks like rhythm detection. We are planning to reach out to Ankit again to talk this over further.

Last week, we also met with Professor Dueck and other musicians to discuss what our project looks like and how the music department could contribute to it, such as by allowing us to work in her studio to test the flutes in a relatively noiseless environment, which would be best for a bare-bones working project. Additionally, she connected us with Professor Almarza, who will be helping us find some flutists to help test our project.

After this, we experimented with some straight-tone flute signals and looked at how their pitch appears in MATLAB. This gives us more insight toward getting a bare-bones project up and working.

Currently, our most significant risk is that switching the project could have unforeseen consequences and force us to backtrack to the hardware idea, which is a little more fleshed out thanks to past project references. These risks can be managed by discussing them further with our TA and staff, like Professor Sullivan. As such, this might pose a change to the existing design, specifically the system spec, to help with speed. Overall, we feel that we are on track and are excited to see where our project takes us, as well as to work collaboratively with CMU musicians and get their feedback throughout the process.