Aakash’s Status Report for 10/5/2024

This week consisted of completing the design presentation and doing some initial prototyping of the timing comparison algorithm. I have built a basic timing algorithm prototype in Python using the expected data coming from the sheet music and the audio pipeline. From the sheet music I am getting data in the form of 'd1/4' for a note, where the d stands for the music note D and the 1/4 stands for a quarter note. From the audio data I will be receiving a tuple in the form (onset time, pitch), which will look like (0, 'd'), where 0 is the onset time and 'd' is the music note D.
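To make these two formats concrete, here is a minimal sketch of how the sheet music tokens could be parsed. The single-letter-pitch-then-fraction split is my assumption about how the strings will be tokenized; the function name is a placeholder:

```python
from fractions import Fraction

def parse_sheet_token(token: str) -> tuple[str, Fraction]:
    """Split a sheet-music token like 'd1/4' into (pitch, duration).

    Assumes the pitch is a single letter followed by a duration
    fraction, e.g. 'd1/4' -> ('d', Fraction(1, 4)).
    """
    pitch, duration = token[0], token[1:]
    return pitch, Fraction(duration)

print(parse_sheet_token("d1/4"))  # ('d', Fraction(1, 4))
```

Using `Fraction` avoids floating-point error when durations like 1/4 and 1/2 are later accumulated into beat positions.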

In terms of implementation, I have been thinking about what kind of data transformations I want to apply in order to properly analyze this data. I have considered transforming the sheet music into an array indexed by beat, but the issue is that this doesn't distinguish between a new note and a sustained note. Because we are focusing on note onsets for this timing data, I am considering using this array and focusing only on note onsets for this initial prototype. For example, if I receive the data ['d1/4', 'd1/4', 'd1/2', 'd1/2'], then, because one beat is a quarter note, I would create a per-beat data structure [d, d, d, s, d, s], where s stands for sustained. I still need to consider other edge cases, such as notes shorter than a beat, but that can be solved by using an increment that is some fraction of a beat and scaling accordingly.
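The per-beat expansion described above can be prototyped in a few lines. This is a sketch assuming a quarter note equals one beat and every duration is a whole number of beats; the sub-beat scaling mentioned above is left out:

```python
from fractions import Fraction

def to_beat_grid(notes: list[str]) -> list[str]:
    """Expand tokens like 'd1/4' into a per-beat grid.

    A quarter note (1/4) is one beat; longer notes emit the pitch on
    their onset beat and 's' for each sustained beat after it.
    """
    grid = []
    for token in notes:
        pitch, duration = token[0], Fraction(token[1:])
        beats = int(duration / Fraction(1, 4))  # beats this note lasts
        grid.append(pitch)
        grid.extend("s" * (beats - 1))
    return grid

print(to_beat_grid(["d1/4", "d1/4", "d1/2", "d1/2"]))
# ['d', 'd', 'd', 's', 'd', 's']
```

Handling sub-beat notes would just mean replacing `Fraction(1, 4)` with a smaller grid increment (e.g. 1/8) and scaling every note's beat count accordingly.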

I have also started a GitHub repo to contain our project code and pushed some prototype code to it.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Progress is on schedule so far.

What deliverables do you hope to complete in the next week?

For the next week I want to focus on getting real data from the music we have recorded, measuring the current timing latency with Python, and determining whether I should switch to a C++ implementation. I also want to finish a basic prototype before spring break based on the data we recorded. I have three exams next week, so I might not be able to accomplish all of this, but I think it can reasonably be done.

Team Status Report for 10/5/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

The main risks to the project come from the complexity of each subsystem.

For the music scanning element, due to the diversity and complexity of sheet music, it is possible that the accuracy of this system (which is already a challenge to piece together using the existing open-source libraries) will not be sufficient to meet the overall needs of our system. The contingency is to use MIDI inputs, which have well-defined libraries in multiple languages, instead of a PDF.

For the algorithm, the risk is dealing with similar edge cases. Music is inherently subjective, but focusing on beginner and intermediate musicians means the algorithm can be stricter, targeting the rigid timing appropriate for them rather than for advanced musicians who already possess the skill to play with tempo (and so produce musical variation that is difficult to process). Once a basic iteration is developed, it will simply continue to be optimized as the semester continues.

Lastly, the audio processing itself appears to be challenged by real-time input. Since real-time DSP is its own challenge, the starting place to mitigate risk is to use pre-recorded audio files as the initial input. Should real-time processing fail, musicians could still upload their recordings into the system and receive feedback. It would be an extra step on the part of the user, but the system would still function.

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

No changes since last week.

Provide an updated schedule if changes have occurred.

No schedule changes.

Our current Gantt Chart: https://docs.google.com/spreadsheets/d/1w5bFU-YbyqIHIdWTXLG7f4z9ygEWPRjl9v7tBey53n8/edit?usp=sharing

Mathias’ Status Report for 10/5/2024

This week I mostly worked on polishing the sheet music conversion. I had an issue last week where, for many sheet music files, the final output after splitting and passing through Mozart would be empty. The first step I took in debugging was realizing that my resizing algorithm had a mistake. To split the sheet music by line, I use the character that appears at the start of every measure; to handle the fact that an image may not perfectly match the size of the template I’m using, I resize the image across a range of scales from 0.1x to 10x. The previous implementation only scaled from 0.1x to 1x due to a bug. Fixing this bug allowed images of various sizes to work, but it did not fix the empty Mozart output. To address that, I tried splitting the sheet music image on the squiggly character at the beginning of each measure and passing the result into Mozart, but that didn’t work either. The last thing I tried was resizing and sharpening the final image to a size similar to the one that worked, but this approach also failed. Because of this, I spent some extra time this week lightly evaluating solutions outside of Mozart.


Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Progress is on schedule so far.

What deliverables do you hope to complete in the next week?

Most of the time next week will be spent on the design report. Outside of that, I’ll spend more time trying to get Mozart to work and, if that fails, look into a different solution.

Mathias’ Status Report for 9/28/2024

This week, in addition to assisting with gathering data from the School of Music, I spent most of my time working on converting the sheet music to an independent file format. Previously I mentioned an issue where I needed to split the image for the library to work. I tested that this week and confirmed that the sheet music needs to be split at least by measure for the library to work correctly. I then looked into methods to split a piece of sheet music by measure and decided on splitting it based on the special character that appears at the beginning of the sheet music. I wrote a Python script that uses OpenCV to check whether the special character appears in a sheet music image, highlight the character, and then split based on the character’s position. Initial tests passing the result of the split into the Mozart library showed somewhat successful results, but the edge cases still need to be handled.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

On schedule

What deliverables do you hope to complete in the next week?

I hope to work mostly on the design presentation and design report next week. I also hope to iron out some of the edge cases with the conversion.

I also want to look into whether we need to convert to MIDI. The conversion to Mozart’s format already gives us the notes played, which should be the key information we need, so I would like to see whether we can bypass the MIDI conversion entirely.

Ben’s Status Report for 9/28/2024

This week I continued conversations with Professor Roger Dannenberg regarding the approach to the digital signal processing (DSP) required for the project. We reached the conclusion that an FPGA may unnecessarily complicate the design, as much of the existing infrastructure for DSP lives in software.

Additionally, I led a recording session in collaboration with CMU School of Music students to collect our first batch of audio data for the project. From this I generated both answers to some of our qualitative timing metrics and more cases we need to consider or test for.

Visual Versus Audible Onset

With audio signals, there is both a visual/physical onset and an audible onset. The physical onset is the beginning of an articulation: where the instrument first begins to produce a sound. The audible onset is the point at which the note can first be heard by the human ear during playback. From the initial measurements, the delay from the visual onset to the audible onset for the vocalist was within the 50-100 ms range. This is important to consider since music is based solely on the audible perception of its waves.
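As a crude illustration of estimating the audible onset from a recorded waveform, the sketch below looks for the first frame whose RMS level rises above a threshold relative to the track's peak. The -40 dB threshold and frame size are assumptions, not measured values:

```python
import numpy as np

def audible_onset_index(signal, sample_rate, threshold_db=-40.0, frame=256):
    """Return the time (s) of the first frame whose RMS level exceeds
    `threshold_db` relative to the track's peak - a stand-in for
    'when the note becomes audible'. Returns None if nothing exceeds it."""
    peak = np.max(np.abs(signal)) or 1.0  # avoid dividing by zero on silence
    for start in range(0, len(signal) - frame, frame):
        rms = np.sqrt(np.mean(signal[start:start + frame] ** 2))
        level_db = 20 * np.log10(rms / peak + 1e-12)
        if level_db > threshold_db:
            return start / sample_rate
    return None
```

Comparing this estimate against the video-derived physical onset would give a per-note measurement of the 50-100 ms gap described above.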

There are a few approaches under consideration for pitch. The first, Dynamic Time Warping (DTW), attempts to synchronize two unsynchronized audio streams, which matches our use case. This could be used in producing feedback, as it gives an idea of which portions needed to be dynamically changed to resynchronize the two musicians. Fast Fourier Transforms may also be needed to isolate pitches and separate chords. Both are best done in software as part of DSP libraries. Research is ongoing.
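To illustrate the DTW idea, here is a toy dynamic-programming version over 1-D sequences. Real use would operate on audio feature frames (e.g. chroma vectors), likely via an existing DSP library rather than this sketch:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW over two 1-D
    sequences, using absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

The key property for our feedback use case is that a sequence played slower but otherwise identically still warps to distance zero, while genuine pitch or timing mismatches accumulate cost along the warping path.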

Something else to notice from our data is that the piano articulations are much more uniform in shape (triangular) relative to the more oblong and noisy vocal track. This confirms our suspicion about which instrument (voice) will be more of a challenge to properly filter and process. An additional consideration is that consecutive notes do not necessarily have a clear separation in the vocal track the way they do in the piano track. This will require further research, but it motivates an algorithm that listens for the onsets of note clusters rather than their ends, since that distinction is much more difficult to quantify.

Overall, it would seem that we have a bit more of a buffer with latency, since the onset gap is larger than expected. However, I don’t expect this to be an issue in software, as there are only two input streams.

Team Status Report for 9/28/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

The most significant risks that could jeopardize the success of this project are edge cases in the music that can’t be properly analyzed for the timing requirement, and being unable to register all of the notes being recorded by the musicians. This can be mitigated by changing the scope of our project: using simpler sheet music and having a more generous error tolerance to account for those edge cases.

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

A major change we made to the existing project design was pivoting from the FPGA to a computer for the audio signal processing. This was done at the guidance of Prof. Dannenberg, as he stated that using an FPGA would be unnecessarily complex and a computer would still fulfill the necessary latency requirements. The audio processing is still a substantial task, and the switch does not take away complexity from the project.

Provide an updated schedule if changes have occurred.

Due to the pivot, any FPGA-related tasks have been changed. The audio processing task list and schedule have been updated as follows:

In the team report: Part A was written by Aakash, Part B was written by Ben, and Part C was written by Mathias.

Part A: Our solution does not raise considerations of public health, safety, or welfare because we are working on a tool that helps musicians with performances. While getting better at performing can help someone’s psychological well-being, there are no real physical safety or public health considerations to be made here because we are only giving feedback, not acting on the physical world.

Part B: Our solution involves music, which is an innately cultural and social element. Our background and approach are from a Western perspective; the bulk of our testing audio is gathered from musicians with Western classical training. Though our design focuses mainly on synchronicity, generally a universal trait of all music, there may be some bias toward a specific feedback outcome informed by the style of music we are used to. That said, the design, at a high level, is intended to aid in the creation of social experiences.

Part C: Our solution may help people in economic terms because the application is provided for free. Music coaches are traditionally pricey, and although our application isn’t a complete substitute for a coach, it can give people who wouldn’t normally be able to afford one access to musical feedback they wouldn’t otherwise have.

Aakash’s Status Report for 9/28/2024

This week required a lot of brainstorming and thinking about how we were going to implement our project. I spent a lot of time working on the design presentation after we had an initial meeting with the School of Music musicians and collected some ground truth timing data.

I am ideating on how to implement the timing algorithm because, after discussing with Prof. Dannenberg, it appears that real-time feedback may not be feasible and I should spend more time focusing on post-performance feedback. This means most of my effort will go to the post-performance case. I have attached my current plan for the timing algorithm below and will continue researching how to deal with edge cases and complex scenarios with my teammates.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Progress is on schedule so far.

What deliverables do you hope to complete in the next week?

For next week the deliverables are going to be completing the design presentation and beginning to build out a basic timing algorithm prototype.

Ben’s Status Report for 9/21/24

This week, I spent much of my time researching the gaps in my knowledge regarding the planned hardware and audio processing we require for our project. I watched multiple hours of high-level discussion of similar real-time audio processing projects. I learned about audio codecs and their utilization of ADCs. Finally, I contacted field experts after generating a list of technical questions from my learning. The hope is that with some conversation, I can be pointed in the right direction in terms of how to go about the audio processing and gain a better idea of what is feasible, given our time frame. Looking forward to next week, I would like to finish the high-level design of the hardware data path and have a specific FPGA selected for the project to maintain the schedule we have planned.

Team Status Report for 9/21/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

The most significant risk that could jeopardize the success of the project is the timing algorithm not working correctly. This risk is being managed by collecting a lot of music audio to ensure we can have a working algorithm for the demo. A contingency plan is to limit the scope of the project to music that is easier to break down, if we find supporting all music to be too complex.

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

There are no changes to the design of the system at this time.

Provide an updated schedule if changes have occurred.

No schedule changes at this time.

Mathias’ Status Report for 9/21/2024

This week I assisted in preparing for the project presentation. For the project I was assigned to work on the sheet music scanning as well as the web app. I did some research on methods to convert sheet music other than the previously suggested Mozart library, and started setting up a skeleton for the web application. I tested the Mozart library with random sheet music images from the internet to check whether it would work with arbitrary sheet music, but the library produced empty outputs for these files. I looked into other methods to convert sheet music, but most of them were either not well maintained or, like music21, don’t do what I need. Luckily, in the GitHub issues for Mozart there is an explanation saying that it’s necessary to split the image into multiple sections, so I’ll be looking into that later. For the web app, I got a sample file upload endpoint implemented.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?
On Schedule

What deliverables do you hope to complete in the next week?

I hope to finalize my research into sheet music scanning and have working stub endpoints for the web application.