Team Status Report for 12/9

This week saw a lot of progress across our subsystems. In particular, we made significant progress integrating our subsystems and worked on making our eye-tracking heuristic and audio alignment model more robust.

Upon seeing that our cursor measurements were not as accurate as we would have liked, we continued iterating on our audio alignment algorithm. In particular, we tested and revised our backend algorithm in an attempt to achieve lower latency. While testing, we realized that we had measured our latency incorrectly the first time. Our revised audio alignment system now runs on recordings of 50 frames and gives us the expected output within 20 ms, which is much faster than we thought possible!

These are the tests we’ve conducted so far. We may add tests as we add the last few features.

  1. Audio alignment with cursor < 1 bar
  2. Audio robustness for missed notes
  3. Audio robustness for wrong notes
  4. Audio robustness for time skipping
  5. Quiet environment tests, audio backend @60, 90, 120, 150, 180, 210, 240 BPM
  6. Noisy + metronome environment tests, audio backend @60, 90, 120, 150, 180, 210, 240 BPM
  7. SNR
  8. Eye tracking accuracy
  9. Eye tracking latency
  10. Eye tracking overrides
  11. Head tracking overrides
  12. Page flipping accuracy (audio, eye, audio + eye)

The tests revealed a lot of information that we used to keep adapting our scripts. In particular, we needed audio alignment to be faster and more robust. Unfortunately, dynamic time warping took too long, which inspired us to write our own MidiAlign algorithm. This function uses a two-pointer approach to align the recorded snippet to the reference MIDI file. We weight the confidence of each candidate matched sequence by its distance from where the user is currently playing and by the number of missed notes in the sequence. As a result, even if the user plays a wrong note, the function will not align to a drastically different section of the piece.
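To make the idea concrete, here is a minimal sketch of two-pointer matching with confidence weighting. It is not our exact implementation: the function name, the penalty weights, and the representation of notes as plain MIDI pitch numbers are all simplified placeholders.

    # Minimal sketch of the MidiAlign idea (not the exact implementation).
    # `detected` is the sequence of MIDI pitch numbers heard in the live snippet;
    # `reference` is the full sequence of pitches from the reference MIDI file.
    def midi_align(detected, reference, last_pos, miss_penalty=1.0, dist_penalty=0.05):
        """Return the reference index that best explains the detected snippet."""
        best_pos, best_score = last_pos, float("inf")

        for start in range(len(reference)):
            # Two pointers: i walks the detected snippet, j walks the reference.
            i, j, misses = 0, start, 0
            while i < len(detected) and j < len(reference):
                if detected[i] == reference[j]:
                    i += 1          # matched note: advance the snippet pointer
                else:
                    misses += 1     # wrong or missed note: skip it in the reference
                j += 1
            misses += len(detected) - i  # detected notes that never matched

            # Confidence: penalize missed notes and distance from the last position.
            score = miss_penalty * misses + dist_penalty * abs(start - last_pos)
            if score < best_score:
                best_score, best_pos = score, j  # j = where the player would be now

        return best_pos

    # Because distance from the last known position is penalized, nearby candidates
    # are preferred and the cursor does not jump to a distant section of the piece.
    print(midi_align([60, 62, 64, 65], [55, 57, 60, 62, 64, 65, 67], last_pos=2))  # -> 6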

Another change we made was a step toward fixing our harmonic problem. We are now using sound dampening to reduce the interference from the 1.5x frequency of resonant violin notes.

Overall, one of our biggest findings is that our system improves upon current solutions because of its ability to align music in a non-linear fashion; users can go back to a previous section and our algorithm will still align them accurately! The system is novel and achieves its goals well. We are very proud of our project this semester and are excited to present it at the demos in the upcoming week!

Sanjana’s Status Report for 12/9

This week, I made a lot of progress. I continued working on the audio alignment algorithm with Rohan and Caleb and conducted several rounds of tests. We continued to test the audio alignment and eye tracking subsystems separately.

Thanks to Professor Jim Bain, we also discovered the issue that was hampering our audio reception. We continued to run into the harmonic problem where the mic picks up audio at 1.5x the frequency of the actual note. We tested our system on Prof. Bain’s vocals for the first time, discovered that the harmonic problem was unique to the resonant violin, and decided to dampen the sound with a mute. This solution, however, is imperfect. In the future, I would like to buy a practice mute so I can dampen all the strings evenly instead of only the two strings my current mute prioritizes.

We also made some changes and attempts in the audio alignment code to get better accuracy with our cursor. We considered implementing rest tracking (rests are musical structures where no frequency is generated; in other words, silence). We ultimately decided against it, so users only get a moving cursor while playing sections that consist mostly of notes.

Finally, we took latency measurements for our system again and got an updated value even lower than our presented value of 158 ms. The librosa.load calls were being made in the wrong place and were therefore being included in our latency calculations. Now that we have removed these extra calls, the system aligns audio segments of 50 frames in under 20 ms.

I also worked hard on the Final Report (stay tuned!). That is coming along nicely; I am just waiting on the results section so we can add the most up-to-date results there, as we are continuing to optimize the algorithms up until the end of this week. Overall, our project is almost complete and I am so excited to present it at the demos we have this week!

Rohan’s Status Report for 12/9

This week my team focused again on integrating the three systems: Front-End, Eye-Tracking, and Audio Alignment. Mainly, we worked on making our audio alignment update the cursor in a more robust manner. We did not really work separately this week; we all worked on improving audio alignment and eye tracking together during our work sessions.

For audio alignment, we worked on reducing both the alignment computation latency and the overall audio alignment latency. There was a problem where computing the chroma matrix for the first audio sample had too high a latency, around 1.5 seconds, while we wanted our overall audio latency to be under 500 milliseconds. The computations for the remaining audio samples, however, averaged around 20 milliseconds. To solve this problem, we call the costly function once during setup, before audio alignment starts, to ensure we never experience this latency during the user’s performance.
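A minimal sketch of that warm-up, assuming librosa’s chroma_stft is the costly call; the sample rate and the length of the throwaway signal are placeholders rather than our real settings.

    import numpy as np
    import librosa

    # Run one throwaway chroma computation during setup so librosa's expensive
    # first-call work happens before the performance, not on the first real block.
    SAMPLE_RATE = 16000  # placeholder; whatever rate the recorder produces

    def warm_up_chroma(sr=SAMPLE_RATE, seconds=1.0):
        dummy = np.random.default_rng(0).standard_normal(int(sr * seconds))
        librosa.feature.chroma_stft(y=dummy.astype(np.float32), sr=sr)  # result discarded

    warm_up_chroma()  # called once at startup, before recording begins;
                      # later per-block chroma calls then stay in the ~20 ms range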

After that, Caleb, Sanjana, and I worked on making the cursor update more smoothly and quickly based solely on the audio alignment. To do this, we made sure our audio alignment picks the best-matching subset of detected notes when comparing them against all notes in the MIDI file, which required rewriting part of our existing alignment algorithm. I also worked on improving the eye-tracking heuristic model with Sanjana and Caleb. We made the interpolation more robust by sending the duration the user has spent looking at the line of music they are currently playing, which helps the audio alignment better decide where the user is located.

So far, my progress is on schedule. Before the next two demos we still have quite a bit of work to finish to make the system more aesthetically pleasing and robust.

 

Caleb’s Status Report for 12/9

This week I worked on improving the robustness of audio alignment. Unfortunately, dynamic time warping the whole reference audio against the live audio recorded by the player took too long. Therefore, a significant amount of time was spent implementing a function we call MidiAlign. This function takes in the chroma vectors of the live audio and scans them for long durations of harmonic frequencies. This list of harmonic notes is then referenced against the reference MIDI file to find all instances where the sequence of notes occurs. To choose an instance in the reference MIDI to align to, the confidence of each possibility is weighted using the distance from where the user is currently playing as well as the number of missed notes in the sequence. Therefore, even if the user plays a wrong note, the function will not align to a drastically different section of the piece.
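As a rough illustration of the first step (not the exact code), the chroma matrix can be collapsed into a note list by taking the dominant pitch class in each frame and keeping only runs long enough to count as held notes; the thresholds here are invented for the example.

    import numpy as np

    def chroma_to_notes(chroma, min_frames=5, energy_floor=0.1):
        """chroma: 12 x N matrix of chroma vectors -> list of sustained pitch classes."""
        dominant = chroma.argmax(axis=0)            # strongest pitch class per frame
        strong = chroma.max(axis=0) > energy_floor  # ignore near-silent frames
        notes, run_pc, run_len = [], None, 0

        for pc, ok in zip(dominant, strong):
            if ok and pc == run_pc:
                run_len += 1                         # continue the current run
            else:
                if run_pc is not None and run_len >= min_frames:
                    notes.append(int(run_pc))        # long enough run counts as a note
                run_pc, run_len = (pc, 1) if ok else (None, 0)
        if run_pc is not None and run_len >= min_frames:
            notes.append(int(run_pc))                # flush the final run

        return notes  # pitch classes 0-11, later matched against the reference MIDI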

Another point of difficulty was dealing with latency from various points in the system. For example, librosa is the Python library we use to process the audio into frames and compute chroma vectors. On its first call, however, it runs caching in the background, which causes the delay to rise from 20 ms to 900 ms. This caused our first audio alignment to lag the system and led to undefined behavior. The fix was simply to make the first librosa call occur during setup. Another source of latency was the constant call to update the webpage. The call that updates the webpage’s variables was originally made every 20 ms, but this made the system lag. We raised this interval to 50 ms to give the backend more time to process while still keeping the frontend cursor moving smoothly.

The final demos are this upcoming week, so we hope to create a demo where the whole system works, along with several other modes that demonstrate the individual subsystems. Unfortunately, because eye-tracking and audio alignment are weighted together to determine a single page turn, it is hard to notice the individual contribution from each subsystem. We hope to have a mode that makes how eye tracking works obvious and a mode where just audio alignment is used to turn the page. This will help the audience better understand how the system as a whole works.

Overall, we are mostly on track and will continue to work to create an enjoyable demo for the exhibition.

Team Status Report for 12/2

This week saw the most progress yet across our subsystems. On the frontend, we now have complete functionality that is just waiting for a few last integration touches from the backend. Right now, we are able to update the cursor and flip pages in real time, with the audio subsystem and the eye tracking subsystem each separately sending AJAX messages from the backend.

One challenge we overcame this week was the loss of Caleb’s laptop. While debugging, we overwrote the user path name and restarted the computer, which left it unable to load his drives. To recover from this issue, we moved our system to Rohan’s laptop, which involved several hours of installing dependencies, rebuilding environments, and fixing bugs. Finally, we got the real-time system, including audio and eye tracking, working on Rohan’s computer.

Substantial improvements were made to the eye tracking subsystem this week. It can now accurately calculate when a user is nearing the end of a page and flips the page appropriately as a function of the tempo. We also worked on adding a predictive element based on the time it takes to play through a line; this uses linear extrapolation to predict where the user is in the music more accurately than the page flip logic alone.

The audio subsystem has become more accurate, with more advanced microphone filtering and clear chroma vector outputs from a real test audio. We also ran into several problems with uploading and using MIDI files on Rohan’s computer; however, we fixed those and are able to receive sound outputs in MIDI form and decipher their information.

All in all, it has been a grind this week, but the finish line is in sight. We will continue working hard to reach our goals and deliver an amazing system.

Sanjana’s Status Report for 12/2

This week witnessed significant strides in the progress of each subsystem. The frontend has achieved full functionality, seamlessly interfacing with the backend to receive updates and dynamically present a real-time cursor using Asynchronous JavaScript and XML (AJAX). Using AJAX marked a pivotal breakthrough, because alternatives like web sockets proved incompatible with our codebase. Had AJAX not proven successful, I would have had to refactor the entire codebase to use React and state hooks for the real-time component.
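On the backend side, the polling pattern can be as simple as a Django view that returns the current cursor state as JSON for the AJAX call to consume. The view name and state fields below are illustrative, not our actual code.

    from django.http import JsonResponse

    # Shared state that the audio/eye-tracking backend updates as the user plays.
    CURSOR_STATE = {"page": 1, "line": 0, "x": 0.0}

    def cursor_position(request):
        # The frontend polls this endpoint and moves the cursor / flips the page.
        return JsonResponse(CURSOR_STATE)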

We encountered a huge challenge this week when the PvRecorder library had a native sampling rate of 16000 Hz and we wanted to sample the audio at 48100 Hz. This led to over 8 hours of changing and debugging PvRecorder’s source files. Without any luck, we resorted to other methods of fixing the bugs we saw and ended up wiping Caleb’s laptop. With this setback, we had to dedicate over a day to recovering code and files, reconstructing environments, and installing dependencies. Despite this challenge, we have worked many, many hours and gained back the lost progress. The audio subsystem now has less noisy chroma vectors without the fifth-harmonic interference issues we dealt with last week; this was achieved by adjusting the microphone settings further, not the sampling rate. To continue developing the subsystems without Caleb’s laptop, we installed several programs for running the audio subsystem and the frontend entirely from Rohan’s computer, as Sanjana’s Mac does not support the Tobii Eye Tracker. Although operational challenges come with using two distinct laptops, we concluded that maintaining two fully functional systems is the optimal approach following this week’s unfortunate incident.

I also assisted Rohan on the eye-tracking subsystem. We made several updates to the algorithm that add functionality and could, with a little more work and testing, perhaps make it a standalone subsystem. In addition to the override buttons from the previous week, we implemented line tracking to determine which line a user is looking at. From there, we linearly extrapolate the predicted cursor speed using the last recorded speed of eye movement across a line. We are testing this to see whether the cursor can also be updated with just the eye tracking subsystem.
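A simplified sketch of that extrapolation, assuming the eyes sweep the current line at roughly the speed observed on the previous line; the function and variable names are made up for the example.

    import time

    def extrapolate_line_progress(line_start_time, last_line_duration, now=None):
        """Predict how far along the current line the player is, as a 0-1 fraction."""
        now = time.time() if now is None else now
        if last_line_duration <= 0:
            return 0.0                                   # no speed estimate yet
        progress = (now - line_start_time) / last_line_duration
        return min(max(progress, 0.0), 1.0)              # clamp to the current line

    # e.g. 3 s into a line whose predecessor took 6 s -> predicted halfway point (0.5)
    print(extrapolate_line_progress(line_start_time=100.0, last_line_duration=6.0, now=103.0))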

In summary, this week witnessed an exhaustive commitment of hours, driving the project towards its conclusion with the successful integration of diverse subsystems.

Rohan’s Status Report for 12/2

This week my team and I worked on the integration of our subsystems: Eye-Tracking, Audio Alignment, and Front-End. We also performed some quantitative tests on our system and worked on the Final Presentation.

I personally worked on making the Eye-Tracking Heuristic more robust. For example, I worked on adding eye-gaze extrapolation to track how long the user’s eyes stay on each line to help with audio alignment. I also integrated all eye-tracking functions with the Front-End. In other words, when the user looks at the override buttons or the last couple of measures, the page turns very consistently. Additionally, I worked with Caleb and Sanjana on making the audio alignment better and more robust by parsing through and filtering the chroma vectors of an audio stream. This took a lot of effort because of how complex the signal processing is and the multitude of problems and noise to deal with. Lastly, I helped Sanjana prepare for her presentation.

So far, my progress is on schedule, but there is a lot of work left for the remainder of the semester. Next week, I plan to complete the final steps of integration and do more testing and validation for the last part of Capstone.

 

Caleb’s Status Report for 12/2

This week I integrated the audio alignment into the backend and tried numerous different methods to improve audio quality. The initial problem was that instead of detecting the actual note being played, we were detecting the note a 5th above, at 1.5x the frequency. We attempted to fix this by changing the PvRecorder API that records the audio to increase the sampling rate and achieve better audio quality. However, doing this required rebuilding the API and creating a new DLL file. In trying to do so, we accidentally changed some key computer settings on my laptop and wiped the drive it was running off of. Therefore, we shifted the entire project to Rohan’s laptop and set up all the necessary environments all over again. After making this shift, instead of trying to change the PvRecorder API, we tested different locations for the microphone and found a spot right near the bridge of the violin that greatly improved the audio quality. Although this location still gives the 5th occasionally, it does so far less frequently than the other locations we tested. For instance, we tested right on the player’s shirt collar; however, any movement by the player caused the microphone to shift and pick up a static scraping sound as it brushed against the fabric.

Another problem came up in the implementation of the backend. It turns out that Python multiprocessing cannot be used inside Django, which we did not discover until the entire backend had been parallelized. This parallelism was very important for hitting the timing use-case requirements, and it may now only be doable through a subprocess call instead of directly in the Django backend. There is also a library called Celery that may make this doable, but we are still unfamiliar with it.
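As a sketch of the subprocess route (the worker script name and arguments are hypothetical), the Django backend could launch the alignment worker as a separate OS process:

    import subprocess
    import sys

    def start_alignment_worker(midi_path):
        # Run the audio-alignment loop outside the Django process, since Python
        # multiprocessing did not work for us inside the Django backend.
        return subprocess.Popen(
            [sys.executable, "audio_alignment_worker.py", midi_path],
            stdout=subprocess.PIPE,
        )

    worker = start_alignment_worker("reference.mid")  # hypothetical file names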

This upcoming week, I will be looking at the robustness of the MidiAlign algorithm as well as how to implement it efficiently in the backend. To do this, we will continue to test with Sanjana playing and see whether we can place the cursor in roughly the correct position and whether other audio alignment algorithms such as DTW can improve MidiAlign.

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 11/18

This week, our team made significant progress on every subsystem. For eye tracking, we have a heuristic model that can predict when someone is looking at the last couple of measures on a page. Next steps involve head tracking heuristics for nonlinear or repetitive musical structures and edge cases that audio alignment alone would fail at.

For audio alignment, we now have the algorithm working with a few bugs. One serious challenge we faced was figuring out a reasonable threshold to detect long linear sections of matches between the MIDI and live recording. We are still working on this, but dynamic time warping is now showing great results that we can use after applying thresholding and segmentation more intentionally.

For the webpage, we have several new features: a user can turn pages manually, there is an animation for forward and backward page turns, and we can also change the page being displayed from the backend. Perhaps the most significant update to the frontend is that we can display a moving cursor to indicate position in the score. This feature is not yet being updated from the backend, however, and is the next big challenge. The goal is to implement web sockets to send data in real time with low latency and get the cursor to update.

Overall, we made a lot of progress leading up to Thanksgiving break, and we will continue working hard to reach MVP. We are currently on schedule, but definitely have significant challenges to overcome in the weeks ahead.

Over the past semester, our team has grown tremendously. We began with subsystems that we each wanted to work on – but over time, we began to function much more like a team with a goal. We help each other debug sections of code and we all work together to catch people up to speed who may have been out taking an exam. We care a lot about collaboration rather than competition, and ensure we always help each other out to reach MVP. Our scheduling philosophy is to use Agile Development and Sprint Planning in order to make progress every day. We meet 5-6 times a week and have a brief discussion at the beginning about blockers, goals, and areas we may need help in. After that, we get started with work!

Weekly team dinners are also extremely important to manage a good working relationship. Our team works in sprints, and we often have large blocks of scheduled time out of which we must complete tasks and remain productive. One unique thing we do is eat at Kiin Lao Thai Restaurant to boost morale. We’re able to get away from the code and discuss higher level concepts in integration and think about our system in a relaxed environment. When we get back to lab, we often figure out trickier bugs and resume productivity while keeping our minds nourished!

Sanjana’s Status Report for 11/18

This week, I fixed a nasty bug on the music display page. After this, progress resumed and I was able to display the pages of the PDF file as images and send which page to display from the backend. There are manual buttons to flip the page and a Django function that can take in user clicks to update the display with a small page turning animation. Page Flipping Video Demo.

Furthermore, I conducted research about Django channels and web sockets. Web sockets seem like a viable option; however, after attempting to install them, I ran into several errors. I am retrying the installation with my existing Django backend, but an alternative could be to set up the environment with web sockets and channels already enabled and ASGI instead of WSGI. Getting a live stream of data to the Django frontend remains the next big challenge.

Rohan and I have the Frontend system running on our laptops, but for integration with the audio – which is currently on Caleb’s laptop – we helped get him set up with the environment needed to run the Frontend scripts. Over the course of 3 days, this took 2 + 2.5 + 2 = 6.5 hours, and I documented the process and steps in the README. Integration was a lot more difficult than expected, so we will be working overtime in order to integrate things earlier than planned.

This week, I also wanted to accomplish my goal of displaying a moving cursor and updating the frontend based on it. I calculated coordinates for each measure and line to make the cursor as accurate as possible. These values were hardcoded because our sheet music is standardized. Here’s a quick demo of the website: Moving Cursor Video Demo. For now, the cursor moves at a constant pace on one line. This is a feature we wanted for debugging, so it’s really great that I was able to display it when we had doubts about its feasibility earlier.
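For reference, the hardcoded layout idea can be sketched roughly like this; the pixel values are placeholders, not the real coordinates.

    # Each line of the standardized sheet music gets fixed pixel coordinates,
    # and the cursor sweeps a line at a constant pace. Numbers are placeholders.
    LINE_COORDS = [
        {"y": 120, "x_start": 80, "x_end": 720},  # line 1
        {"y": 260, "x_start": 80, "x_end": 720},  # line 2
    ]

    def cursor_xy(line_index, progress):
        """progress in [0, 1] along the given line -> pixel coordinates."""
        line = LINE_COORDS[line_index]
        x = line["x_start"] + progress * (line["x_end"] - line["x_start"])
        return x, line["y"]

    print(cursor_xy(0, 0.5))  # midpoint of the first line -> (400.0, 120)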

So far, my work is on track.