Team’s Status Report for 12/9

This week saw a lot of progress across our subsystems, particularly on integration. Additionally, we worked on making our eye-tracking heuristic and audio alignment model more robust.

Upon seeing that our cursor measurements were not as accurate as we would have liked, we continued iterating on our audio alignment algorithm. In particular, we tested and revised our backend algorithm in an attempt to achieve lower latency. While testing, we realized that we had measured our latency incorrectly the first time. Now, our revised audio alignment system produces the expected output for 50-frame audio recordings within 20 ms, which is much faster than we thought possible!

These are the tests we’ve conducted so far. We may add tests as we add the last few features.

  1. Audio alignment with cursor accuracy < 1 bar
  2. Audio robustness for missed notes
  3. Audio robustness for wrong notes
  4. Audio robustness for time skipping
  5. Quiet environment tests, audio backend @60, 90, 120, 150, 180, 210, 240 BPM
  6. Noisy + metronome environment tests, audio backend @60, 90, 120, 150, 180, 210, 240 BPM
  7. SNR
  8. Eye tracking accuracy
  9. Eye tracking latency
  10. Eye tracking overrides
  11. Head tracking overrides
  12. Page flipping accuracy (audio, eye, audio + eye)

The tests revealed a lot of information, which we used to continue adapting our scripts. In particular, we needed audio alignment to be faster and more robust. Unfortunately, dynamic time warping took too long, which inspired us to write our own MidiAlign algorithm. This function uses a two-pointer approach to align the recorded snippet to the reference MIDI file. We weight the confidence of each candidate match using its distance from where the user is currently playing as well as the number of missed notes in the sequence. Therefore, even if the user plays a wrong note, the function will not align to a drastically different section of the piece.
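
The confidence weighting can be pictured with a minimal Python sketch like the one below. It assumes detected notes and the reference are lists of MIDI pitch numbers; the helper names and weights are illustrative, not our exact MidiAlign code.

```python
# Illustrative only: two-pointer matching of detected notes against the
# reference, penalizing missed notes and large jumps from the last position.

def match_cost(detected, reference, start):
    """Count mismatches when matching `detected` against `reference` from `start`."""
    i, j, missed = 0, start, 0
    while i < len(detected) and j < len(reference):
        if detected[i] == reference[j]:
            i += 1               # note matched: advance the detected pointer
        else:
            missed += 1          # tolerate a wrong or skipped note
        j += 1
    return missed + (len(detected) - i)   # unmatched detected notes count as misses

def midi_align(detected, reference, last_pos, distance_weight=0.1):
    """Return the reference index with the lowest combined cost."""
    best_pos, best_cost = last_pos, float("inf")
    for start in range(len(reference)):
        cost = match_cost(detected, reference, start)
        cost += distance_weight * abs(start - last_pos)   # prefer nearby sections
        if cost < best_cost:
            best_cost, best_pos = cost, start
    return best_pos
```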

Another change we made was a step toward fixing our harmonic problem: we are now dampening the instrument to reduce interference from the 1.5x frequency of resonant violin notes.

Overall, one of our biggest findings is that our system improves upon current solutions because of its ability to align music in a non-linear fashion; users can go back to a previous section and our algorithm will still align them accurately! The system is novel and achieves its goals well. We are very proud of our project this semester and are excited to present it at the demos in the upcoming week!

Sanjana’s Status Report for 12/9

This week, I made a lot of progress. I continued working on the audio alignment algorithm with Rohan and Caleb, and I conducted several rounds of tests. We continued to test the audio alignment subsystem and the eye-tracking system separately.

Thanks to Professor Jim Bain, we also discovered the issue that was hampering our audio reception. We kept running into the harmonic problem, where the mic picks up audio at 1.5x the frequency of the actual note. We tested our system on Prof. Bain's vocals for the first time, discovered that the harmonic problem was unique to the resonant violin, and decided to dampen the sound with a mute. This solution, however, is imperfect. In the future, I would like to buy a practice mute so I can dampen all the strings evenly instead of only the two strings my current mute covers.

We also made some changes and attempts in the audio alignment code to get better accuracy with our cursor. We considered implementing rest tracking (rests are musical structures where no frequency is generated; in other words, silence). We ultimately decided against this, so the cursor only moves while users are playing sections consisting mostly of notes.

Finally, we took latency measurements for our system again and got an updated value even lower than our presented value of 158 ms. The librosa.load calls were being made in the wrong place and were therefore included in our latency calculations. Now that we have removed these extra calls from the timed path, the system aligns audio segments of 50 frames in under 20 ms.
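
A rough sketch of the corrected measurement is below: the reference loading and chroma computation stay outside the timed region so librosa.load is no longer counted as alignment latency. The file name, the live snippet, and the align_segment stand-in are placeholders, not our actual code.

```python
import time

import librosa

# Load and preprocess the reference once, outside the timed region.
reference, sr = librosa.load("reference.wav", sr=22050)
reference_chroma = librosa.feature.chroma_stft(y=reference, sr=sr)
live_chunk = reference[: sr // 2]          # stand-in for a ~50-frame live snippet

def align_segment(chunk, ref_chroma, sr):  # placeholder for our alignment call
    return 0

start = time.perf_counter()
align_segment(live_chunk, reference_chroma, sr)
print(f"alignment latency: {(time.perf_counter() - start) * 1000:.1f} ms")
```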

I also worked hard on the Final Report (stay tuned!). That's coming along nicely; I'm just waiting on the results section so we can include the most up-to-date results, since we are continuing to optimize the algorithms through the end of this week. Overall, our project is almost complete, and I am so excited to present it at the demos we have this week!

Rohan’s Status Report for 12/9

This week my team focused again on integrating the three systems: Front-End, Eye-Tracking, and Audio Alignment. Mainly, we worked on making our audio alignment update the cursor in a more robust manner. We did not really work separately this week; we all worked on improving audio alignment and eye tracking together during our work sessions.

For audio alignment, we worked on reducing our alignment computational latency and the overall audio alignment latency. There was a problem where computing the chroma matrix for the first audio sample had too high a latency, around 1.5 seconds, while we wanted our overall audio latency to be under 500 milliseconds. The computations for the remaining audio samples, however, averaged around 20 milliseconds. To solve this problem, we call the costly function once during setup, before audio alignment starts, to ensure we never experience this latency during the user's performance.
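
A minimal sketch of this warm-up idea is below, assuming librosa's chroma computation is the costly first call; the dummy-buffer approach and names are illustrative rather than our exact setup code.

```python
import numpy as np
import librosa

SR = 22050

def warm_up_chroma(sr=SR):
    """Run one throwaway chroma computation during setup so the costly
    first call (~1.5 s in our measurements) never happens mid-performance."""
    dummy = np.zeros(sr, dtype=np.float32)        # one second of silence
    librosa.feature.chroma_stft(y=dummy, sr=sr)   # result is discarded

warm_up_chroma()   # called once before audio alignment starts
```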

After that, Caleb, Sanjana, and I worked on making the cursor update more smoothly and quickly based solely on the audio alignment. To do this, we made sure our audio alignment picks the best-matching subset of detected notes when comparing them to all notes in the MIDI file. This required us to rewrite some of our existing audio alignment algorithm. I also worked with Sanjana and Caleb on improving the eye-tracking heuristic model. We essentially made the interpolation more robust by sending the duration for which the user has been looking at the line of music they are currently playing, to help the audio alignment better decide where the user is located (see the sketch below).
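
One way to picture how gaze dwell time could bias the alignment decision is the hypothetical sketch below; the function, its inputs, and the weights are assumptions for illustration, not our actual heuristic.

```python
def combine_gaze_and_audio(audio_candidates, candidate_lines,
                           gaze_line, gaze_duration_s, gaze_weight=0.5):
    """Pick an alignment candidate, boosting candidates on the line the user
    has been looking at; a longer dwell time gives a larger boost.

    audio_candidates: {reference_position: audio-only confidence}
    candidate_lines:  {reference_position: line-of-music index}
    """
    best_pos, best_score = None, float("-inf")
    for pos, audio_conf in audio_candidates.items():
        score = audio_conf
        if candidate_lines[pos] == gaze_line:
            score += gaze_weight * min(gaze_duration_s, 2.0)   # cap the boost
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos
```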

So far, my progress is on schedule. Before the next two demos, we have quite a bit of work to finish to make the system more aesthetically pleasing and robust.

 

Caleb’s Status Report for 12/9

This week I worked on improving the robustness of audio alignment. Unfortunately, dynamic time warping the whole reference audio against the live audio recorded by the player took too long. Therefore, a significant amount of time was spent implementing a function we call MidiAlign. This function takes in the chroma vectors of the live audio and scans them for long durations of harmonic frequencies. This list of harmonic notes is then referenced against the reference MIDI file to find all instances where the sequence of notes occurs. To choose an instance in the reference MIDI to align to, the confidence of each possibility is weighted using its distance from where the user is currently playing as well as the number of missed notes in the sequence. Therefore, even if the user plays a wrong note, the function will not align to a drastically different section of the piece.
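
The note-extraction step can be sketched roughly as below: collapsing the chroma matrix into the pitch classes that dominate for long stretches of frames. The threshold and function name are illustrative, not the exact MidiAlign implementation.

```python
import numpy as np

def chroma_to_note_sequence(chroma, min_frames=4):
    """Collapse a 12 x T chroma matrix into the pitch classes that dominate for
    at least `min_frames` consecutive frames, i.e. the sustained notes."""
    per_frame = np.argmax(chroma, axis=0)      # strongest pitch class per frame
    notes, run_pitch, run_len = [], per_frame[0], 1
    for pitch in per_frame[1:]:
        if pitch == run_pitch:
            run_len += 1
        else:
            if run_len >= min_frames:
                notes.append(int(run_pitch))
            run_pitch, run_len = pitch, 1
    if run_len >= min_frames:
        notes.append(int(run_pitch))
    return notes
```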

Another point of difficulty was dealing with latency from various points in the system. For example, librosa is a Python library that processes the audio into audio frames and also computes chroma vectors. On its first call, however, it runs caching in the background, which causes the delay to rise from 20 ms to 900 ms. This caused our first audio alignment to lag the system and led to undefined behavior. It was fixed simply by making the first librosa call occur during setup. Another source of latency was the constant call to update the webpage. The call to update the webpage's variables was originally made every 20 ms, but this led to the system lagging. We raised the interval to 50 ms to give the backend more time to process while still keeping the frontend cursor moving smoothly (a rough sketch of this update loop follows below).
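
A rough sketch of the fixed-interval update loop is below. The shared state, the get_cursor_position callback, and the threading wiring are assumptions for illustration; only the 50 ms interval comes from our actual change.

```python
import threading
import time

UPDATE_INTERVAL_S = 0.05        # 50 ms; 20 ms left the backend no room to keep up

cursor_state = {"measure": 0, "page": 0}   # state the webpage reads

def push_updates(get_cursor_position, stop_event):
    """Refresh the cursor state on a fixed interval instead of as fast as possible."""
    while not stop_event.is_set():
        measure, page = get_cursor_position()      # hypothetical backend query
        cursor_state["measure"], cursor_state["page"] = measure, page
        time.sleep(UPDATE_INTERVAL_S)

# Example wiring (get_cursor_position would come from the alignment backend):
# stop = threading.Event()
# threading.Thread(target=push_updates, args=(get_cursor_position, stop)).start()
```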

This upcoming week is the final demos. We hope to create a demo where the whole system works, along with several other modes that demonstrate the individual subsystems. Unfortunately, because eye tracking and audio alignment are weighted together to determine a single page turn, it is hard to notice the individual contribution from each subsystem. We hope to have a mode that makes the eye tracking's behavior obvious and a mode where only audio alignment is used to turn the page. This will help the audience better understand how the system as a whole works.

Overall, we are mostly on track and will continue to work to create an enjoyable demo for the exhibition.

Team Status Report for 10/21

Over the course of two weeks, we worked on our Design Report, met with Dr. Roger Dannenberg, and began initial setup of the Google Board. After discussing Dr. Dannenberg's past work in audio alignment, we realized that audio alignment can be far more robust than eye tracking. Dr. Dannenberg worked on a project in the 1980s where lines of sheet music would display on a computer screen to match what the user was currently playing. There was no page turning, but it is quite similar to what we are trying to achieve. In that project, Dr. Dannenberg used only audio alignment to figure out where the user was currently located and to display the correct corresponding lines of music. Dr. Dannenberg showed us an old video demonstration of his system, and it was incredibly accurate. This showed us that adding eye tracking may or may not significantly improve our system's performance, which potentially changes the role of eye tracking. For example, we planned to use head tracking for override cases, such as the user turning their head to the right to indicate that the page should flip to the right. However, it is too early to foresee how impactful the eye-tracking component will be. We still plan to proceed with implementing eye tracking for our system, if only to measure how much it improves system performance.

Currently, no changes have been made to the design. However, our conversation with Dr. Dannenberg gave us confidence in using audio alignment as our main form of tracking the user's performance. Additionally, we finally got our hands on a Google Coral Dev Board and have successfully flashed it.

Sanjana’s Status Report for 10/21

This week, I worked with my team to flash the Google Board. We followed the directions on this site; however, we are behind schedule in terms of getting the board operating with some basic functionality. We weren't able to get the board connected to Wi-Fi, so we're looking into a couple of options: continue researching Wi-Fi connectivity and how feasible it is, or connect via the Google Board's Ethernet port. My main concern about an Ethernet port is that it could violate the use-case requirement of portability and a simple hardware setup.

The majority of my hours worked this week went into preparing the Design Report on time. The Design Report involved a lot of additional research and helped us better understand the scope and technical requirements of our project.

Overall, progress is slightly behind schedule. I am planning on catching up this week by working on the frontend experience flow and figuring out how the pages will be navigated locally. These deliverables should give me flexibility in integrating the audio and visual components into the display over the next couple of weeks. Furthermore, I will be working more with the Google Board to help set up an internet connection and a GitHub repo with all our code.

In order to implement my portion of this project, I'm looking into learning some new tools. The new tools I need to study are Python display libraries, with PyQt5 and Tkinter being the top two. I intend to follow some tutorials to better understand the advantages and disadvantages of each approach and then finish the implementation. One roadblock I'm facing is file organization. There are several subsystems in SoundSync: frontend, backend, audio alignment, and eye-tracker processing. We have some Google Board starter code and will be running into integration issues and compatibility problems between different Python libraries in the near future; it'll be my job to debug those.

Rohan’s Status Report for 10/21

The past two weeks involved writing the Design Report, MIDI audio alignment, and setting up the Google Coral Dev Board. For the Design Report, I worked on writing the introduction, some of the Use-Case Requirements, some of the Design Requirements, some of the Trade-Off sections, the Gantt Chart, and the Summary. After finishing the Design Report as a team, we started work on the Google Coral Dev Board. I aided in the effort to flash the board and to test some rudimentary programs on it. On the side, I looked into MIDI audio alignment programs to work on the audio alignment portion of our system.

For the MIDI audio alignment, I looked into possible tools I could use to learn how to properly implement Dynamic Time Warping with a MIDI file. I looked at this website for guidance: https://www.musanim.com/wavalign/. It discusses using FFTW, a C subroutine library for computing the discrete Fourier transform in one or more dimensions, of arbitrary input size, and of both real and complex data. I also looked at PyAudio tools and dtw-python, a Python library for Dynamic Time Warping. For setting up the Google Coral Board, I mainly looked at the Google Dev Board getting-started website and documentation: https://coral.ai/docs/dev-board/get-started/.
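
As a first experiment, chroma-based DTW can also be sketched with librosa's built-in DTW (one alternative to dtw-python). The file names below are placeholders, and the reference audio would ultimately come from the MIDI file, so this is only a sketch of the approach.

```python
import librosa

# Placeholder file names; the reference could be audio synthesized from the MIDI.
live, sr = librosa.load("live_recording.wav", sr=22050)
ref, _ = librosa.load("reference_from_midi.wav", sr=22050)

live_chroma = librosa.feature.chroma_stft(y=live, sr=sr)
ref_chroma = librosa.feature.chroma_stft(y=ref, sr=sr)

# D is the accumulated cost matrix; wp is the optimal warping path of
# (reference_frame, live_frame) index pairs.
D, wp = librosa.sequence.dtw(X=ref_chroma, Y=live_chroma, metric="cosine")
```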

So far, my progress is on schedule. Next week, I plan to finish setting up the Google Coral Dev Board and test some of its onboarding features.

Caleb’s Status Report for 10/21

This week I spent time researching how to implement the audio functions we need in Python. More importantly, these Python functions must be able to run entirely on the Google Coral board. Setting up the board came with some complications, which will be discussed later in this post. Implementing these functions for the Google Board means the board must have all the necessary data available when each function is called, which means ensuring the segment of live audio is already segmented and stored on the board. This turns out to be trickier than anticipated, simply because the board needs an easy way to pull information from all the different ports connected to it.

Setting up the Google Board was the major unforeseen complication, as the instructions listed on the Google website did not seem to work very well. One example is connecting to Wi-Fi. Unfortunately, CMU's Wi-Fi login requires a username and password, while the board expects a network that works by simply connecting and entering the correct password. This eliminates any CMU network that requires a login. CMU does have password-free networks; however, these have additional security in place that seemed to flag our Google Board as performing unsafe actions and cut off its internet connection. This made pip installing or checking for updates impossible on CMU Wi-Fi. Furthermore, hotspots from mobile devices were not detected by the board at all.

One of the tools I'm looking to learn is programming for the Tobii Eye Tracker 5. This camera has capabilities spanning from head tracking to precise eye tracking. Understanding how to not only extract this information but also relay it back to the Google Board is an important and challenging task. I am also looking to learn how to create uniform sheet music in Musescore. Musescore is a music-writing application that gives the user the ability to customize the spacing and notes in a score. However, I still need to learn how to use all of its knobs to create sheet music that is both readable and uniform.

This upcoming week, I look forward to using sound samples collected through Dr. Dueck's class to try out various audio filters. I am most interested in seeing whether, after performing a harmonic-percussive separation, breathing ends up separated from other percussive sounds and can be detected that way. I am also looking into setting up the Tobii camera so that it is compatible with the Google Board and the two can communicate.
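
For the separation experiment itself, librosa's harmonic-percussive source separation gives a simple starting point; the file name below is a placeholder for one of the collected samples, and this is only a sketch of the first step.

```python
import librosa

# Placeholder file name for one of the samples from Dr. Dueck's class.
y, sr = librosa.load("breathing_sample.wav", sr=22050)

# Split into harmonic (pitched) and percussive (transient) components; the hope
# is that breath noise ends up mostly in the percussive part, where it is
# easier to detect.
y_harmonic, y_percussive = librosa.effects.hpss(y)
```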

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 9/23

This week, we researched more parts to meet our system's power requirements. We looked into USB-C male-to-female jumper cables and compared battery packs with different power budgets.

As we continued researching, several risks emerged. Our original design planned to account for tempos up to 180 BPM. Feedback from instructors, however, indicated that it may be overly ambitious to attempt to build a completely robust audio filtering system at 180 BPM. 

Our system design and priorities are also evolving. Since eye tracking will serve as the foundation of our page turning, adding another foundational technology like audio alignment may be deferred to a post-MVP addition. We are continuing to look into the audio alignment portion of our system and plan to decide before next week whether this is a design change we want to follow through with.

This upcoming Monday, we are meeting with Dr. Dueck who has taken a keen interest in our project. We intend to discuss our proposed design, gain a better perspective regarding our use case, and understand how we will be collaborating with her.

In terms of welfare, we purposely designed this system to be fully operable with just the user's eyes. This allows people who cannot operate a foot pedal, such as those who are paralyzed below the waist, to avoid having to flip pages during a performance. Although the foot-pedal system is cheap and simple, that simplicity ultimately excludes a percentage of musicians, which we view as unfair. Therefore, the goal of our system is to include the musicians whom existing solutions have left out.

Sanjana’s Status Report for 9/23

This week, I researched platforms for hosting our UI and looked into Tobii Eye Tracker 5 integration and setup.

We intended to use the Tobii Eye Tracker 5 camera for eye tracking on a digital page; however, I discovered some issues that may arise with integration. This eye tracker model isn't compatible with macOS, which is my development OS. I looked into four other solutions:

  1. The Tobii Pro SDK can give us raw data and is Mac compatible, but it is incompatible with the Tobii Eye Tracker 5, and none of the compatible devices are within budget.
  2. The iPad TrueDepth camera. I found an open-source project that investigated eye tracking using the TrueDepth camera on an iPhone X. This will be challenging because I don't have experience with Swift programming, and we are limited to tablets that already have a TrueDepth camera. This eye tracking may not be robust enough for our intended use; however, this solution remains accessible to musicians who can't operate a foot pedal during performances.
  3. We can pivot to face tracking rather than eye tracking. Now, several open source APIs become available, and I can build out a real-time iOS app in Swift. The potential problem with this approach is that users will have to use their head to turn the page, which could result in a loss of focus. However, this solution maintains accessibility.
  4. The eye tracking community recommends Talon Voice’s software support for MacOS development. Here, there is good support for eye tracking, and head tracking is used to further refine miscalculations.

Regarding the UI, there are a couple of options: a native iPad app, or building and deploying a web application on a tablet. Mobile/native apps can provide benefits such as being faster and more efficient, which could have a huge impact on our real-time system. Ultimately, this decision rests on which platform we collect eye data from.

Progress is on track as of now. For next week, I hope to finalize the tech stack and parts list I’ll be using and begin implementation of the UI.