Team Status Report for 10/7

Although we presented a final design at the design review, we still have concerns about the quality of our system. We can't know the performance of dynamic time warping (DTW) or the model until they are built, but there is a risk that even at modest tempos such as 120 bpm, the processing delay of DTW exceeds 500 ms. Importantly, both the ML model and DTW have to stay below 500 ms, because the ML model, which handles eye tracking, and DTW, which handles audio alignment, run in parallel. One mitigation would be to reduce the maximum tempo of the pieces that can be aligned, which loosens the maximum-delay budget for both DTW and the ML model. This would, in turn, limit the scope to beginner music instead of beginner- and intermediate-level music.

Currently, no changes have been made to the design. However, we are meeting with Dr. Dannenberg, who is an expert in the field and has done projects similar to the one we are looking to create. We hope this conversation gives us insight into problems we can't foresee, and we will adjust the design appropriately.

Several principles of engineering guide this project: user-centered design, robustness, and power management. User-centered design is a main focus because we aim to serve a wide range of musicians. Robustness is important because we want to handle the full variation in user inputs. Power management keeps the system operable for the duration of a rehearsal without inconveniencing the user.

Sanjana’s Status Report for 10/7

I began coding a user interface in Python with basic functionality – uploading a sheet music file/MIDI file and picking an instrument. This will later be integrated with the backend. My team and I spent considerable time on the design review report as well. We also ordered parts and read several of Dr. Dannenberg's past papers. One major technical takeaway from this was on the subject of spectral analysis and chroma vectors. One paper discussed how chroma vectors reduce detailed spectra to 12-element vectors, each representing the energy associated with one of the 12 pitch classes (the 7 natural notes plus the 5 half steps between them). Comparing these chroma vectors provides a robust distance metric for score alignment, which is one of our biggest challenges. I also looked into how segmentation, along with state graphs and penalty functions, could be used to handle complex real-time audio data.
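To make the idea concrete, here is a minimal sketch of how a magnitude spectrum could be collapsed into a 12-element chroma vector. The function name and the 20 Hz floor are our own choices for illustration, not anything prescribed by the paper:

```python
import numpy as np

def chroma_vector(magnitudes, sample_rate, n_fft):
    """Collapse an FFT magnitude spectrum into a 12-element chroma vector.

    Each element accumulates the energy of all frequency bins whose
    nearest equal-tempered pitch falls into that pitch class
    (C=0, C#=1, ..., B=11).
    """
    chroma = np.zeros(12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    for f, mag in zip(freqs, magnitudes):
        if f < 20:  # skip the DC bin and sub-audible content
            continue
        # MIDI note number of the nearest equal-tempered pitch
        midi = round(69 + 12 * np.log2(f / 440.0))
        chroma[int(midi) % 12] += mag ** 2  # accumulate energy
    # normalize so vectors from different frames are comparable
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma
```

For a pure 440 Hz tone, nearly all the energy lands in pitch class A (index 9), which is the behavior a chroma-based distance metric relies on.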

With the exception of eye tracking, I am on schedule. The eye tracker was ordered this week, so I expect to begin collecting data from it in the next 2 weeks. As individual components are completed, they will be continuously integrated with my frontend.

My deliverable for next week is a frontend that can display some data from either the microphone or the camera.

Rohan’s Status Report for 10/7

This week started with helping Caleb prepare for the Design Review Presentation. This involved finishing up the slides and creating a thorough, informative block diagram. After Caleb presented, we started working on the Design Report. I wrote outlines and subsections for the Architecture and Principles of Operation, Use-Case Requirements, and Design Requirements sections.

On the technical side, I looked at more research papers regarding Dynamic Time Warping, specifically papers written by Dr. Roger Dannenberg. One that I looked at very closely is his paper on polyphonic audio alignment, Polyphonic Audio Matching for Score Following and Intelligent Audio Editors (https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5fd3ed7924505b35f14dbc1bad99155ae97e8655). Here, Dr. Dannenberg explored using chroma vector analysis to distinguish pitches when several instruments play at once. This is something I want to explore more because it could make our audio alignment program more robust. My team and I will be meeting with Dr. Dannenberg on Monday to ask more about his chroma vector implementation for audio alignment.

Additionally, I worked on the frequency filtering we will use both to smooth the signal and reduce noise, and to isolate the target instrument. I looked at MATLAB's documentation on various digital filters that can be used for frequency filter design: https://www.mathworks.com/help/signal/ug/practical-introduction-to-digital-filter-design.html
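Since the rest of our prototyping is in Python, a SciPy equivalent of the smoothing step might look like the sketch below. The cutoff frequency and filter order here are placeholders for illustration, not decided design parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass(signal, sample_rate, cutoff_hz, order=4):
    """Zero-phase Butterworth low-pass filter.

    Attenuates content above cutoff_hz to smooth the captured audio.
    sosfiltfilt runs the filter forward and backward, so the output has
    no phase delay -- which matters when the result feeds alignment.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, signal)
```

As a sanity check, filtering a mix of a 200 Hz and an 8000 Hz sine at a 1000 Hz cutoff should leave the 200 Hz component essentially untouched while removing the 8000 Hz component.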

So far, my progress is on schedule, and next week I plan to look more into DTW and frequency filtering.

Caleb’s Status Report for 10/7

This week I spent time rehearsing for the design review presentation and making sure my delivery was clear, concise, and coherent. Afterward, I received feedback on points I missed as well as on what helped keep the audience's attention.

On the technical side, I spent the week working on implementing Dynamic Time Warping (DTW). This is still a rough sketch, because the function needs to be adjusted based on whether the input is a MIDI file, a list of points, or a NumPy array. The function can also be improved by adding chroma vectors, which means the input becomes a vector of length 12 where each dimension corresponds to a pitch class (i.e., C, C#, D, etc.) and holds how much of that pitch is present. This is important when the player plays more than one note at a time (an interval or a chord). It may be possible to improve this further by moving to a 24-element vector, letting the representation distinguish pitches across two octaves. However, this may add latency and compromise a use-case requirement.
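The rough sketch described above follows the classic dynamic-programming formulation. A minimal version, assuming the inputs are already feature sequences (e.g., per-frame chroma vectors) and using plain Euclidean distance between frames, could look like this:

```python
import numpy as np

def dtw_cost(ref, live):
    """Classic dynamic-programming DTW between two feature sequences.

    ref and live are arrays of shape (n, d) and (m, d) -- for example,
    sequences of 12-element chroma vectors. Returns the accumulated
    alignment cost; backtracking through the table D would yield the
    warping path (score position vs. live-audio position).
    """
    n, m = len(ref), len(live)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - live[j - 1])  # local distance
            D[i, j] = cost + min(D[i - 1, j],      # live lags the score
                                 D[i, j - 1],      # live runs ahead
                                 D[i - 1, j - 1])  # frames match one-to-one
    return float(D[n, m])
```

Identical sequences align with zero cost, and repeating a frame in the live input (the player holding a note slightly longer) still aligns at zero cost, which is exactly the tempo flexibility we need.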

This upcoming week, we look forward to talking to Roger Dannenberg, a co-creator of Audacity. His expertise in computer music and music analysis will be really helpful for our project. Not to mention, Dr. Dannenberg has built a system similar to ours that used only audio, so we aim to pick his brain on how he was able to accomplish this. We also want to hear what he thinks about whether eye tracking can make page flips feel more comfortable. For personal goals, I aim to test frequency filtering and measure how much it affects our ability to align against MIDI files whose high frequencies are not filtered.

We are currently still on track and will continue to work hard to stay on track.