Team’s Status Report for 12/9

This week saw a lot of progress across our subsystems, particularly on integrating them. Additionally, we worked on making our eye-tracking heuristic and audio alignment model more robust.

Upon seeing that our cursor measurements were not as accurate as we would have liked, we continued iterating on our audio alignment algorithm. In particular, we tested and changed our backend algorithm in an attempt to achieve lower latency. When testing, we realized that we had measured our latency incorrectly the first time. Now, our revised system for audio alignment on audio recordings of 50 frames compiles and gives us the expected output within 20ms, which is much faster than we thought possible!

These are the tests we’ve conducted so far. We may add tests as we add the last few features.

  1. Audio alignment with cursor < 1 bar
  2. Audio robustness for missed notes
  3. Audio robustness for wrong notes
  4. Audio robustness for time skipping
  5. Quiet environment tests, audio backend @60, 90, 120, 150, 180, 210, 240 BPM
  6. Noisy + metronome environment tests, audio backend @60, 90, 120, 150, 180, 210, 240 BPM
  7. SNR
  8. Eye tracking accuracy
  9. Eye tracking latency
  10. Eye tracking overrides
  11. Head tracking overrides
  12. Page flipping accuracy (audio, eye, audio + eye)

The tests revealed a lot of information, which we used to continue adapting our scripts. In particular, we needed audio alignment to be faster and more robust. Unfortunately, dynamic time warping took too long, inspiring us to write our own MidiAlign algorithm. This function uses a two-pointer approach to align the recorded snippet to the reference MIDI file. We weigh the confidence of each candidate matched sequence, using the distance from where the user is currently playing as well as the number of missed notes in the sequence, to determine where the user is. Therefore, even if the user plays a wrong note, the function will not align to a drastically different section of the piece.
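Below is a minimal sketch of the two-pointer matching idea behind MidiAlign, with hypothetical names and simplified scoring (the real function works on chroma-derived note sequences and uses our full confidence weighting):

    # Hypothetical sketch of MidiAlign's two-pointer matching; names are illustrative.
    # `detected` and `reference` are lists of MIDI pitch numbers.
    def match_candidates(detected, reference, max_misses=2):
        # Return (start_index, misses) for every plausible placement of the
        # detected note sequence inside the reference pitch list, tolerating
        # a few missed or wrong notes.
        candidates = []
        for start in range(len(reference)):
            ref_ptr, det_ptr, misses = start, 0, 0
            while det_ptr < len(detected) and ref_ptr < len(reference):
                if detected[det_ptr] == reference[ref_ptr]:
                    det_ptr += 1        # note matched: advance both pointers
                    ref_ptr += 1
                else:
                    misses += 1         # tolerate a missed or wrong note
                    ref_ptr += 1
                    if misses > max_misses:
                        break
            if det_ptr == len(detected):  # every detected note was placed
                candidates.append((start, misses))
        return candidates

    def best_match(candidates, current_pos):
        # Prefer fewer misses, then proximity to where the user already is,
        # so one wrong note cannot jump the cursor to a distant section.
        return min(candidates, key=lambda c: (c[1], abs(c[0] - current_pos)))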

Another change we made was a step towards fixing our harmonic problem. We are now using sound dampening to reduce the interference from the 1.5x-frequency harmonic of resonant violin notes.

Overall, one of our biggest findings is that our system improves upon current solutions because of its ability to align music in a non-linear fashion; users can go back to a previous section and our algorithm will still align them accurately! The system is novel and achieves its goals well. We are very proud of our project this semester, and are excited to present it at the demos in the upcoming week!

Sanjana’s Status Report for 12/9

This week, I made a lot of progress. I continued working on the audio alignment algorithm with Rohan and Caleb and conducted several rounds of tests. We continued to test the audio alignment and eye tracking subsystems separately.

Thanks to Professor Jim Bain, we also discovered the issue that was hampering our audio reception. We continued to run into the harmonic problem, where the mic picks up audio at 1.5x the frequency of the actual note. We tested our system on Prof. Bain’s vocals for the first time, discovered that the harmonic problem was unique to the resonant violin, and decided to dampen the sound with a mute. This solution, however, is imperfect. In the future, I would like to buy a practice mute so I can more evenly dampen all the strings instead of prioritizing only two strings with my current mute.

We also made some changes and attempts in the audio alignment code to get better accuracy with our cursor. We considered implementing rest tracking (rests are musical structures without any frequency being generated; in other words, silence). We ultimately decided against this, so users only get a moving cursor while they’re playing sections that are mostly notes.
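For reference, the rest tracking we considered would have amounted to flagging low-energy frames; a rough sketch (the threshold is a placeholder, and we did not ship this):

    import librosa

    # Illustrative only: rest tracking would flag frames whose energy falls
    # below a silence threshold; the 0.01 value here is a placeholder.
    def detect_rests(y, threshold=0.01):
        rms = librosa.feature.rms(y=y)[0]   # per-frame RMS energy
        return rms < threshold              # True where the player is resting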

Finally, we took measurements for latency in our system again and got an updated value even lower than our presented value of 158ms. The librosa.load calls were being made in the wrong place and were therefore getting included in our latency calculations. Now that we have removed these extra calls from the timed path, the system aligns audio segments of 50 frames in under 20ms.
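The corrected measurement just keeps the one-time librosa.load of the reference outside the timed region; a hedged sketch with placeholder names:

    import time
    import librosa

    # One-time setup: loading the reference must not count toward alignment latency.
    reference_audio, sr = librosa.load("reference.wav", sr=None)  # placeholder path

    def timed_alignment(live_frames):
        start = time.perf_counter()
        position = align(live_frames, reference_audio)  # align() stands in for our alignment step
        latency_ms = (time.perf_counter() - start) * 1000
        return position, latency_ms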

I also worked hard on the Final Report (stay tuned!). That’s coming along nicely; I’m just waiting on the results section so we can add the most up-to-date results, since we are continuing to optimize the algorithms up until the end of this week. Overall, our project is almost complete and I am so excited to present it at the demos we have this week!

Rohan’s Status Report for 12/9

This week my team again focused on integrating the three systems: Front-End, Eye-Tracking, and Audio Alignment. Mainly, we worked on making our audio alignment update the cursor in a more robust manner. We did not really work separately this week; we all worked on improving audio alignment and eye tracking together during our work sessions.

For audio alignment, we worked on reducing our alignment computation latency and our overall audio alignment latency. There was a problem where computing the chroma matrix for the first audio sample had too high a latency, around 1.5 seconds, while we wanted our overall audio latency to be under 500 milliseconds. The computations for the remaining audio samples, however, averaged around 20 milliseconds. To solve this problem, we make the first costly call before audio alignment starts, ensuring we never experience this latency during the user’s performance.
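In code, the warm-up is just one throwaway chroma computation during setup; a minimal sketch (sample rate and buffer length are placeholders):

    import numpy as np
    import librosa

    SR = 16000  # placeholder sample rate

    def warm_up_chroma():
        # One throwaway chroma computation on silence so librosa's first-call
        # overhead happens before the performance begins.
        dummy = np.zeros(SR, dtype=np.float32)  # one second of silence
        librosa.feature.chroma_stft(y=dummy, sr=SR)

    warm_up_chroma()  # called once during setup, before recording starts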

After that, Caleb, Sanjana, and I worked on making the cursor update more smoothly and quickly based solely on the audio alignment. To do this, we made sure our audio alignment picks the best-matching subset of detected notes when comparing them to all notes in the MIDI file. This required us to rewrite some of our existing audio alignment algorithm. I also worked on improving the eye-tracking heuristic model with Sanjana and Caleb. We essentially made the interpolation more robust by sending the duration the user has spent looking at the line of music they are currently playing, which helps the audio alignment better decide where the user is located.
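One way to picture that fusion (purely illustrative; the real weighting lives inside our alignment scoring) is boosting alignment candidates that sit on the line the user has been looking at:

    # Illustrative sketch of biasing alignment candidates toward the gazed-at line.
    # `candidates` maps candidate cursor positions to audio-only confidence scores,
    # and `line_of` maps a position to its line index; all names are hypothetical.
    def apply_gaze_prior(candidates, line_of, gazed_line, dwell_seconds, boost=0.1):
        biased = {}
        for pos, score in candidates.items():
            if line_of(pos) == gazed_line:
                score += boost * min(dwell_seconds, 2.0)  # cap the gaze bonus
            biased[pos] = score
        return biased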

So far, my progress is on schedule. Before the next two demos, we still have quite a bit of work left to make the system more aesthetically pleasing and robust.

 

Caleb’s Status Report for 12/9

This week I worked on improving the robustness of audio alignment. Unfortunately, dynamic time warping the whole reference audio against the live audio recorded by the player took too long. Therefore, a significant amount of time was spent on implementing a function we call MidiAlign. This function takes in the chroma vectors of the live audio and scans them for long durations of harmonic frequencies. This list of harmonic notes is then referenced against the reference MIDI file to find all instances where the sequence of notes occurs. To choose an instance in the reference MIDI to align to, the confidence of each possibility is weighted using the distance from where the user is currently playing as well as the number of missed notes in the sequence. Therefore, even if the user plays a wrong note, the function will not align to a drastically different section of the piece.
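Roughly, the note-extraction step amounts to taking the dominant pitch class per chroma frame and keeping only runs long enough to be real notes; a sketch with placeholder thresholds:

    import numpy as np

    def chroma_to_notes(chroma, min_frames=5):
        # Collapse a 12 x T chroma matrix into a sequence of sustained pitch
        # classes; runs shorter than `min_frames` are treated as noise.
        # The threshold here is illustrative, not our tuned value.
        dominant = np.argmax(chroma, axis=0)   # strongest pitch class per frame
        notes, run_start = [], 0
        for t in range(1, len(dominant) + 1):
            if t == len(dominant) or dominant[t] != dominant[run_start]:
                if t - run_start >= min_frames:
                    notes.append(int(dominant[run_start]))
                run_start = t
        return notes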

Another point of difficulty was dealing with latency from various points in the system. For example, librosa is a Python library that processes the audio into audio frames and also computes chroma vectors. On its first call, however, it runs caching in the background that causes the delay to rise from 20ms to 900ms. This caused our first audio alignment to lag the system and lead to undefined behavior. It was simply fixed by making the first librosa call during setup. Another point of latency was the constant call to update the webpage. This call to update the variables for the webpage was originally made every 20ms, which led to the system lagging. We upped this value to 50ms to give the backend more time to process while still keeping the frontend cursor moving smoothly.
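For context, the polling path is just a lightweight JSON view that the frontend’s timer hits every 50ms; a sketch with hypothetical names (our actual view returns a few more fields):

    # views.py (sketch; names are hypothetical)
    from django.http import JsonResponse

    # Latest cursor state, written by the alignment backend and read by this view.
    cursor = {"measure": 0, "page": 1}

    # The frontend fires an AJAX GET against this view every 50ms; the view only
    # serializes whatever state the backend last computed.
    def cursor_state(request):
        return JsonResponse(cursor)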

This upcoming week is the final demos. Therefore, we hope to create a demo where the whole system works, along with several other modes that demonstrate the individual subsystems. Because eye-tracking and audio alignment are weighted together to determine a single page turn, it is hard to notice the individual contribution from each subsystem. We hope to have a mode that makes the eye tracking obvious and a mode where just audio alignment is used to turn the page. This will help the audience better understand how the system as a whole works.

Overall, we are mostly on track and will continue to work to create an enjoyable demo for the exhibition.

Team Status Report for 12/2

This week saw the most progress across our subsystems. On the frontend, we now have complete functionality that is just waiting for a few last integration touches from the backend. Right now, we are able to update the cursor and flip pages in real time with the audio subsystem and the eye tracking subsystem separately sending AJAX messages from the backend.

Some challenges we overcame this week involved the loss of Caleb’s laptop functionality. While debugging, we overwrote the user path name and restarted the computer, which left it unable to load his drives. To recover from this issue, we moved our system to Rohan’s laptop. This involved several hours of installing dependencies, rebuilding environments, and fixing bugs. Finally, we got the real-time system, including audio and eye tracking, working on Rohan’s computer.

Substantial improvements were made to the eye tracking subsystem this week. The eye tracking subsystem is able to accurately calculate when a user is nearing the end of a page and aptly flips the page as a function of the tempo. We also worked on adding a predictive element using the time it takes to play through a line. This uses linear extrapolation to predict where the user is located in the music more accurately than the page-flip logic alone.
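The prediction itself is simple linear extrapolation; a hedged sketch (the real heuristic adds guards and resets):

    import time

    def predict_position(line_start_time, last_line_duration, line_index):
        # Estimate how far through the current line the user is, assuming they
        # traverse it at the same speed as the previous line (illustrative).
        elapsed = time.time() - line_start_time
        fraction = min(elapsed / last_line_duration, 1.0)  # clamp to the end of the line
        return line_index + fraction  # e.g. 3.5 means halfway through line 3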

The audio subsystem has become more accurate, with more advanced microphone filtering and clear chroma vector outputs from real test audio. In addition, we ran into several problems with uploading and using MIDI files on Rohan’s computer; however, we fixed those and are able to receive sound outputs in MIDI form and decipher their information.
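For the reference MIDI we mainly need pitches and timings; with a library such as pretty_midi (named here as an assumption, since our parsing code may use different tooling), that reads roughly as:

    import pretty_midi  # assumed library choice for this sketch

    def reference_notes(midi_path):
        # Return (pitch, start_time, end_time) tuples for the reference part,
        # assuming a single-instrument reference MIDI file.
        midi = pretty_midi.PrettyMIDI(midi_path)
        notes = [(n.pitch, n.start, n.end) for n in midi.instruments[0].notes]
        return sorted(notes, key=lambda n: n[1])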

All in all, it’s been a grind this week, but the finish line is in sight. We will continue working hard to reach our goals and deliver an amazing system.

Sanjana’s Status Report for 12/2

This week witnessed significant strides in the progress of each subsystem. The frontend has achieved full functionality, seamlessly interfacing with the backend to receive updates and dynamically present a real-time cursor using Asynchronous JavaScript and XML (AJAX). Getting AJAX working marked a pivotal breakthrough, because alternatives like WebSockets proved incompatible with our codebase. Had AJAX not proven successful, I would have had to refactor the entire codebase to use React and state hooks for the real-time component.

We encountered a huge challenge this week when the PvRecorder library had a native sampling rate of 16000 Hz and we wanted to sample the audio at 48100 Hz. This resulted in over 8 hours of changing and debugging the PvRecorder source files. Without any luck, we resorted to other methods of fixing the bugs we saw and, in the process, happened to wipe Caleb’s laptop. With this setback, we had to dedicate over a day to recovering code and files, reconstructing environments, and installing dependencies. Despite this challenge, we have worked many, many hours and gained back the lost progress. The audio subsystem now has less noisy chroma vectors without the fifth-harmonic interference issues we dealt with last week. This was achieved by adjusting the microphone settings some more, not by changing the sampling rate. To continue developing the subsystems without Caleb’s laptop, we have installed several programs so the audio subsystem and the frontend can run completely from Rohan’s computer, since Sanjana’s Mac does not support the Tobii Eye Tracker. Although operational challenges come with the use of two distinct laptops, we concluded that maintaining two fully functional systems is the optimal approach following this week’s unfortunate incident.
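Since we ended up keeping PvRecorder’s native rate, the capture path stays simple; a minimal sketch (frame length and normalization are placeholders):

    import numpy as np
    from pvrecorder import PvRecorder

    # Minimal capture sketch at PvRecorder's native 16 kHz; frame_length is a placeholder.
    recorder = PvRecorder(device_index=-1, frame_length=512)
    recorder.start()
    try:
        frame = recorder.read()                                # list of int16 samples
        samples = np.array(frame, dtype=np.float32) / 32768.0  # normalize before chroma analysis
    finally:
        recorder.stop()
        recorder.delete()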

I also assisted Rohan on the eye-tracking subsystem. We made several updates to the algorithm that allow for more functionality and perhaps could be a standalone subsystem with a little more work and testing. In addition to the override buttons from the previous week, we implemented line tracking to determine which line a user is looking at. From there, we have linearly extrapolated the predicted cursor speed using the last recorded speed of eye movement across a line. This information is being tested to see if the cursor can also be updated with just the eye tracking subsystem.

In summary, this week witnessed an exhaustive commitment of hours, driving the project towards its conclusion with the successful integration of diverse subsystems.

Rohan’s Status Report for 12/2

This week my team and I worked on the integration of our subsystems: Eye-Tracking, Audio Alignment, and Front-End. We also performed some quantitative tests for our system, while also working on the Final Presentation.

I personally worked on making the Eye-Tracking Heuristic more robust. For example, I worked on adding eye-gaze extrapolation to track how long the user’s eyes look at each line, which helps with audio alignment. I also integrated all eye-tracking functions with the Front-End. In other words, when the user looks at the override buttons or at the last couple of measures, the page turns very consistently. Additionally, I worked with Caleb and Sanjana on making the audio alignment work better and more robustly by parsing through and filtering the chroma vectors of the audio stream. This took a lot of effort because of how complex the signal processing is and the multitude of problems and noise to deal with. Lastly, I helped Sanjana prepare for her presentation.

So far, my progress is on schedule, but there is a lot of work left for the remainder of the semester. Next week, I plan to complete the final steps of integration along with more testing and validation for the last part of Capstone.

 

Caleb’s Status Report for 12/2

This week I integrated the audio alignment into the backend and tried numerous different methods to improve audio quality. The initial problem was that instead of detecting the actual note being played, we were detecting the note that is a 5th above, at 1.5x the frequency. We attempted to fix this by changing the PvRecorder API that records the audio to increase the sampling rate and achieve better audio quality. However, doing this required rebuilding the API and creating a new DLL file. In trying to do so, we accidentally changed some key computer settings on my laptop and wiped the drive it was running off of. Therefore, we shifted the entire project to Rohan’s laptop and set up all the necessary environments over again. After making this shift, instead of trying to change the PvRecorder API, we tested different locations for the microphone and found a spot right near the bridge of the violin that greatly improved the audio quality. Although this location still gives the 5th occasionally, it does so far less frequently than the other locations we tested. For instance, we tested right on the player’s shirt collar; however, any movement by the player caused the microphone to shift and pick up a static scraping sound as it brushed against the fabric.

Another problem came in the implementation of the backend. It turns out that Python multiprocessing cannot be used within Django. This was not known until the entire backend had been parallelized. This parallelism was very important for hitting our timing use-case requirements, and it may now only be achievable through a subprocess call instead of directly in the Django backend. There is also another API, Celery, that may make this doable, but we are still unfamiliar with it.
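A subprocess-based fallback would look roughly like the following (hedged sketch: the worker script name is hypothetical, and Celery remains the option we have not yet evaluated):

    import subprocess
    import sys

    # Hypothetical sketch: run the heavy alignment work in a separate process so it
    # lives outside Django's request handling. "align_worker.py" is a placeholder
    # for a worker script we would still need to write.
    def start_alignment_worker(midi_path):
        return subprocess.Popen([sys.executable, "align_worker.py", midi_path])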

This upcoming week, I will be looking at the robustness of the MidiAlign algorithm as well as how to implement it efficiently in the backend. To do this, we will continue to test with Sanjana playing, checking whether we can place the cursor in roughly the correct position and whether other audio alignment algorithms, such as DTW, can improve the MidiAlign algorithm.

We are currently still on track and will continue to work hard to stay on track.