Team Status Report for 2/15/25

This week, we focused on the high-level design for our design presentation. After discussing with Ankit and Professor Sullivan, we placed an order for hardware to begin working with: a Behringer CB 100 Gooseneck Condenser Instrument Microphone and an XLR to USB-C adapter. This will let us improve our current experiments by capturing clearer audio recordings. Based on this past week’s discussions, we also decided to move our entire implementation into software. Additionally, we determined that it would be best to provide users with a metronome set to a default of 60 BPM, adjustable in real time using a slider on our web app; the metronome tone will sit outside the frequency range of the flute so that it can be filtered out later (see the sketch below). Previously, we had recorded single notes from the B-flat major scale to experiment with harmonics. This week, we met up to record some noisy signals to experiment with noise reduction, and we worked on encoding information for single notes into a MIDI file and uploading it to MuseScore to see whether we could translate it into sheet music (see individual team member reports). After much discussion, we also concluded that real-time transcription is not relevant for our use case, since a user only needs to see the transcribed output once they are done playing.
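
As a rough illustration of the metronome idea, the sketch below generates a click track whose tone sits below the flute’s lowest note (~B3/C4, around 247–262 Hz), so a simple high-pass filter could remove it from the mixed recording later. The 100 Hz click frequency, 50 ms click length, and decay constant are placeholder values for illustration, not finalized parameters:

```python
import numpy as np

def metronome_track(bpm=60, duration_s=10.0, sr=44100, click_hz=100.0):
    """Generate a click track whose tone sits below the flute's lowest
    note (~247-262 Hz), so a high-pass filter can remove it later.
    click_hz and the click envelope are placeholder choices."""
    samples = np.zeros(int(duration_s * sr))
    click_len = int(0.05 * sr)                # 50 ms click
    n = np.arange(click_len)
    # Short decaying sine burst at click_hz
    click = np.sin(2 * np.pi * click_hz * n / sr) * np.exp(-n / (0.01 * sr))
    step = int(60.0 / bpm * sr)               # samples per beat
    for start in range(0, len(samples) - click_len, step):
        samples[start:start + click_len] += click
    return samples
```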

Our new pipeline will work as follows:

  1. User logs into their account.
  2. User calibration: ensure the user is playing at some minimum volume threshold before they upload a recording.
  3. User is prompted to record and upload an audio file of their background noise.
  4. User is prompted to record flute audio. (Consideration: add a play/pause control in case the user needs to pause mid-recording.) To record, the user turns on the metronome on the web app; it is set to 60 BPM by default but can be adjusted in real time using a slider.
  5. The audio is saved in the website’s database, and the pitch/rhythm detection pipeline is triggered in the backend. 
    1. Noise suppression via a Butterworth filter and adaptive noise filtering (see the filtering sketch after this list)
    2. Audio segmentation: use spectral flux (how much the spectrum changes over time) and short-time energy (STE, which detects sudden amplitude increases) to determine note onsets (see the onset-detection sketch after this list)
    3. For each segment (we can parallelize this with threading so multiple segments can be processed at once):
      1. Use the note length to determine its type (eighth, quarter, half, whole, etc.)
      2. Use an FFT to determine the frequency/pitch and classify which note it is (see the pitch-detection sketch after this list)
  6. Encode the information from all segments into a MIDI file (see the MIDI-encoding sketch after this list)
  7. The MIDI file is uploaded to the web database, and the MuseScore API converts the MIDI into sheet music
  8. The newly generated file is stored alongside that user’s previous transcriptions, which they can view
  9. If time remains: add an editing feature where the user can adjust transcribed notes and add additional notation such as crescendos
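
For the noise-suppression step, one way the Butterworth stage could look is a band-pass over the flute’s rough range, using SciPy. The band edges and filter order below are illustrative starting points, not tuned values:

```python
from scipy.signal import butter, sosfiltfilt

def bandpass_flute(audio, sr, low_hz=230.0, high_hz=2500.0, order=6):
    """Attenuate energy outside the flute's rough range (~B3 to D7).
    Band edges and order are illustrative; we would tune them against
    our recorded noise samples."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    # Zero-phase filtering avoids shifting note onsets in time
    return sosfiltfilt(sos, audio)
```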
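
For the segmentation step, a minimal sketch of onset detection that combines spectral flux with short-time energy; the frame/hop sizes and the mean-plus-1.5-standard-deviations threshold are placeholders we would tune experimentally:

```python
import numpy as np

def onset_candidates(audio, sr, frame=2048, hop=512):
    """Flag frames where spectral flux and short-time energy both jump,
    as a rough note-onset detector. Thresholds are placeholders."""
    n_frames = 1 + (len(audio) - frame) // hop
    window = np.hanning(frame)
    mags, energies = [], []
    for k in range(n_frames):
        chunk = audio[k * hop:k * hop + frame] * window
        mags.append(np.abs(np.fft.rfft(chunk)))
        energies.append(np.sum(chunk ** 2))
    mags = np.array(mags)
    ste = np.array(energies)
    # Spectral flux: sum of positive magnitude increases between frames
    flux = np.sum(np.maximum(mags[1:] - mags[:-1], 0.0), axis=1)
    flux = np.concatenate([[0.0], flux])
    is_onset = (flux > flux.mean() + 1.5 * flux.std()) & (ste > ste.mean())
    return np.flatnonzero(is_onset) * hop / sr  # onset times in seconds
```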
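
For the per-segment pitch step, a sketch of FFT-based classification that snaps the strongest spectral peak to the nearest equal-tempered note (A4 = 440 Hz). It assumes the fundamental dominates the spectrum; if the flute’s overtones are stronger, we would need a harmonic-aware method instead:

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def classify_note(segment, sr):
    """Estimate pitch from the strongest FFT peak and snap it to the
    nearest equal-tempered note (A4 = 440 Hz)."""
    windowed = segment * np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), 1.0 / sr)
    f0 = freqs[np.argmax(spectrum)]
    if f0 <= 0:
        return None  # silent or DC-dominated segment
    midi = int(round(69 + 12 * np.log2(f0 / 440.0)))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)  # MIDI 60 -> "C4"
    return midi, name, f0
```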
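
For the MIDI-encoding step, a sketch of how the segment data could be written out. We have not settled on a library; mido is one option, assumed here, and the note-list format (MIDI pitch plus duration in beats) is a hypothetical interface between the pitch/rhythm stages and the encoder:

```python
from mido import Message, MetaMessage, MidiFile, MidiTrack, bpm2tempo

def segments_to_midi(notes, bpm=60, path="transcription.mid"):
    """notes: list of (midi_pitch, duration_in_beats) tuples from the
    pitch/rhythm stages. mido is one library choice, not finalized."""
    mid = MidiFile()              # default 480 ticks per beat
    track = MidiTrack()
    mid.tracks.append(track)
    track.append(MetaMessage("set_tempo", tempo=bpm2tempo(bpm)))
    for pitch, beats in notes:
        ticks = int(round(beats * mid.ticks_per_beat))
        track.append(Message("note_on", note=pitch, velocity=64, time=0))
        track.append(Message("note_off", note=pitch, velocity=64, time=ticks))
    mid.save(path)
```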

The biggest risk/challenge right now is verifying whether the methods we plan to use for noise suppression and pitch/rhythm detection will actually work. For instance, in this week’s noise-suppression experiments we tried a variety of filters (see Shivi’s status report) but often found that the flute audio was suppressed along with the noise, or that the background noise was not suppressed enough. We would like to run more experiments; our contingency is a calibration step that gives us a noise sample we can then subtract from the flute audio signal. Similarly, note-onset detection will likely be quite challenging, because it may be difficult to determine the exact moment a note ends. This is why we are deciding to segment the audio as our initial processing step and then “round” the duration of each segment to the nearest eighth of a beat based on the BPM, as sketched below.
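
A minimal sketch of the eighth-of-a-beat rounding described above; the grid parameter and the duration-in-seconds interface are our choices for illustration. For example, at 60 BPM a 0.46 s segment maps to 0.46 beats and rounds to 0.5 beats:

```python
def quantize_beats(duration_s, bpm, grid=8):
    """Round a segment's duration to the nearest 1/grid of a beat;
    grid=8 gives the eighth-of-a-beat rounding described above."""
    beats = duration_s * bpm / 60.0
    return round(beats * grid) / grid

# e.g. quantize_beats(0.46, bpm=60) -> 0.5 beats
```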

Despite these challenges, we are on track with our schedule. Over the next week, we plan to produce an even more detailed design while simultaneously setting up our web app, experimenting further with signal calibration/noise suppression, and starting on audio segmentation.

Week-specific status report questions:

Part A (Shivi): Our flute transcription system enhances public health by supporting creative expression through music learning. With an attachable microphone and a software-only pipeline, it is affordable and safe to use. Our system also promotes welfare by lowering barriers to music production, as it can be made accessible online for musicians, students, and educators to use.

Part B (Grace): Write on Cue aims to make music transcription more accessible to diverse musical communities, including amateur musicians, educators, composers, and students from various cultural and social backgrounds. This benefits people who may not have the technical skills or resources to manually transcribe music and allows individuals to better engage with music across a variety of cultural contexts. For example, in communities where formal music education is less accessible, our project can provide a more equitable way for musicians to preserve and share traditional flute music, irrespective of whether they are classically trained. Socially, it also allows musicians from different backgrounds to contribute their musical expressions and makes it easier to preserve musical heritage.

Part C (Deeya): Traditional methods of music transcription are time-consuming and require specialized knowledge, creating a barrier for learners who want to review their performances and for educators who need to provide detailed feedback. By streamlining the transcription process, our project reduces dependency on costly manual transcription services, which lowers the overall cost of producing sheet music. Additionally, we are building our project as a web app, which maximizes accessibility and supports cheaper, more widespread music education.
