Team Status Report for 12/2

This week saw our most progress yet across our subsystems. On the frontend, we now have complete functionality that is just waiting for a few last integration touches from the backend. Right now, we are able to update the cursor and flip pages in real time, with the audio subsystem and the eye tracking subsystem each delivering their updates to the frontend through AJAX messages from the backend.

Some challenges we overcame this week involved the loss of Caleb’s laptop functionality. While debugging, we overwrote the user path name and restarted the computer, after which it could no longer load his drives. To recover from this issue, we moved our system to Rohan’s laptop, which involved several hours of installing dependencies, rebuilding environments, and fixing bugs. Finally, we got the real-time system, including audio and eye tracking, working on Rohan’s computer.

Substantial improvements were made to the eye tracking subsystem this week. It can now accurately detect when a user is nearing the end of a page and flips the page at the right moment as a function of the tempo. We also worked on adding a predictive element based on the time it takes to play through a line: using linear extrapolation, it predicts where the user is located in the music more accurately than the page-flip logic alone.
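
To make the prediction step concrete, here is a minimal sketch of the linear-extrapolation idea; the function and parameter names are hypothetical, and the real heuristic keeps more state than this.

def extrapolate_beats_into_line(prev_line_start, prev_line_end, now, line_length_beats):
    """Estimate how far (in beats) the player is into the current line,
    assuming the tempo observed on the previous line carries over."""
    seconds_per_line = prev_line_end - prev_line_start
    if seconds_per_line <= 0:
        return 0.0
    beats_per_second = line_length_beats / seconds_per_line
    elapsed = now - prev_line_end  # time spent on the current line so far
    return min(elapsed * beats_per_second, line_length_beats)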

The audio subsystem has become more accurate, with more advanced microphone filtering and clean chroma vector outputs from real test audio. We also ran into several problems with uploading and using MIDI files on Rohan’s computer; however, we fixed those and are now able to receive sound output in MIDI form and decipher its information.
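
For reference, both pipelines can be summarized in a few lines. The sketch below uses librosa for the chroma vectors and mido for the MIDI parsing, with placeholder file names; our actual code differs in the details.

import librosa
import mido

# Audio -> 12-bin chroma vectors (one column per analysis frame).
y, sr = librosa.load("test_clip.wav", sr=None)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # shape (12, n_frames)

# MIDI -> list of (pitch, time-in-seconds) note-on events.
events = []
t = 0.0
for msg in mido.MidiFile("reference.mid"):
    t += msg.time  # delta time in seconds when iterating a MidiFile
    if msg.type == "note_on" and msg.velocity > 0:
        events.append((msg.note, t))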

All in all, it’s been a grind this week, but the finish line is in sight. We will continue working hard to reach our goals and deliver an amazing system.

Sanjana’s Status Report for 12/2

This week witnessed significant strides in the progress of each subsystem. The frontend has achieved full functionality, seamlessly interfacing with the backend to receive updates and dynamically present a real-time cursor through Asynchronous JavaScript and XML (AJAX). Using AJAX marked a pivotal breakthrough, because alternatives like web sockets proved incompatible with our codebase. Had AJAX not worked out, I would have had to refactor the entire codebase to use React and state hooks for the real-time component.
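
At its core, the AJAX loop is just the frontend polling a small JSON endpoint and moving the cursor to whatever position comes back. A stripped-down sketch of the Django side is below; the view name, URL, and the shared state dictionary are placeholders for what our real backend maintains.

from django.http import JsonResponse

# Stand-in for the state our audio and eye-tracking code keeps up to date.
cursor_state = {"page": 1, "line": 0, "fraction": 0.0}

def cursor_position(request):
    """Polled by the page (e.g. every ~100 ms); the response repositions the cursor."""
    return JsonResponse(cursor_state)

On the page itself, a short JavaScript timer can fetch this endpoint and update the cursor's position with the returned values.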

We encountered a huge challenge this week: the PvRecorder library has a native sampling rate of 16000 Hz, and we wanted to sample the audio at 48100 Hz. This led to over 8 hours of changing and debugging the PvRecorder source files. With no luck there, we resorted to other methods of fixing the bugs we saw, and in the process happened to wipe Caleb’s laptop. With this setback, we had to dedicate over a day to recovering code and files, reconstructing environments, and installing dependencies. Despite this challenge, we have put in many hours and regained the lost progress.

The audio subsystem now produces less noisy chroma vectors without the fifth harmonic interference issues we dealt with last week. This was achieved by further adjusting the microphone settings, not the sampling rate. To continue developing the subsystems without Caleb’s laptop, we have installed everything needed to run the audio subsystem and the frontend entirely from Rohan’s computer, since my Mac does not support the Tobii Eye Tracker. Although operational challenges come with the use of two distinct laptops, we concluded that maintaining two fully functional systems is the optimal approach following this week’s unfortunate incident.
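
For context, the sampling-rate constraint shows up directly in code: PvRecorder hands back 16 kHz PCM frames and does not expose a rate setting, which is why changing it meant rebuilding the library itself. A hedged sketch of the normal read loop (the frame length here is an arbitrary choice):

from pvrecorder import PvRecorder

SAMPLE_RATE = 16000  # PvRecorder's fixed sampling rate; not configurable through the API
recorder = PvRecorder(frame_length=512, device_index=-1)

recorder.start()
try:
    pcm = []
    for _ in range(int(SAMPLE_RATE / 512 * 2)):  # roughly two seconds of audio
        pcm.extend(recorder.read())              # each read returns 512 int16 samples
finally:
    recorder.stop()
    recorder.delete()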

I also assisted Rohan on the eye-tracking subsystem. We made several updates to the algorithm that add functionality and, with a little more work and testing, could perhaps let it stand alone as its own subsystem. In addition to the override buttons from the previous week, we implemented line tracking to determine which line a user is looking at. From there, we linearly extrapolate the predicted cursor speed using the last recorded speed of eye movement across a line. We are now testing whether the cursor can also be updated with just the eye tracking subsystem.
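
The line-tracking step itself is essentially a lookup from the gaze's vertical position to a line index. A minimal sketch is below, with made-up bounds standing in for our hardcoded layout.

# Vertical extent of each line on the page, normalized to [0, 1]; placeholder values.
LINE_BOUNDS = [(0.05, 0.20), (0.25, 0.40), (0.45, 0.60), (0.65, 0.80)]

def line_at(gaze_y):
    """Return the index of the line containing gaze_y, or None if the gaze is between lines."""
    for i, (top, bottom) in enumerate(LINE_BOUNDS):
        if top <= gaze_y <= bottom:
            return i
    return None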

In summary, this week demanded an exhausting number of hours, but it drove the project toward its conclusion with the successful integration of our diverse subsystems.

Rohan’s Status Report for 12/2

This week my team and I worked on the integration of our subsystems: Eye-Tracking, Audio Alignment, and Front-End. We also performed some quantitative tests for our system, while also working on the Final Presentation.

I personally worked on making the Eye-Tracking heuristic more robust. For example, I worked on adding eye-gaze extrapolation that tracks how long the user’s eyes stay on each line to help with audio alignment. I also integrated all eye-tracking functions with the Front-End. In other words, when the user looks at the override buttons or the last couple of measures, the page turns very consistently. Additionally, I worked with Caleb and Sanjana on making the audio alignment more robust by parsing through and filtering the chroma vectors of an audio stream. This took a lot of effort because of how complex the signal processing is and the multitude of problems and noise to deal with. Lastly, I helped Sanjana prepare for her presentation.

So far, my progress is on schedule, but there is a lot of work left for the remainder of the semester. Next week, I plan to complete the final steps of integration along with more testing and validation for the last part of Capstone.


Caleb’s Status Report for 12/2

This week I integrated the audio alignment into the backend and tried numerous methods to improve audio quality. The initial problem was that instead of detecting the actual note being played, we were detecting the note a 5th above, at 1.5x the frequency. We attempted to fix this by changing the pvrecorder API that recorded the audio to increase the sampling rate and achieve better audio quality. However, doing so required rebuilding the API and creating a new DLL file. In trying to do this, we accidentally changed some key computer settings on my laptop and wiped the drive it was running off of. Therefore, we shifted the entire project to Rohan’s laptop and set up all the necessary environments again. After making this shift, instead of trying to change the pvrecorder API, we tested different locations for the microphone and found a spot right near the bridge of the violin that greatly improved the audio quality. Although this location still gives the 5th occasionally, it does so far less frequently than the other locations we tested. For instance, we tried placing the microphone right on the player’s shirt collar; however, any movement by the player caused the microphone to shift and pick up a static scraping sound as it brushed against the fabric.

Another problem came up in the implementation of the backend. It turns out that Python multiprocessing cannot be used inside Django. We did not discover this until the entire backend had already been parallelized. This parallelism is very important for hitting our timing use-case requirements and may now only be achievable through a subprocess call instead of directly in the Django backend. There is also a library called Celery that may make this doable, but we are still unfamiliar with it.
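
The subprocess route would look roughly like the sketch below: the Django view shells out to the already-parallelized alignment code and reads its result back. The worker script name and its JSON output format are placeholders, not our actual files.

import json
import subprocess

def run_alignment(audio_path, midi_path):
    """Launch the alignment worker in its own process and parse its JSON output."""
    result = subprocess.run(
        ["python", "align_worker.py", audio_path, midi_path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)  # e.g. {"position_beats": 42.5}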

This upcoming week, I will be looking at the robustness of the MIDI align algorithm as well as how to implement it efficiently in the backend. To do this, we will continue to test with Sanjana playing, checking whether we can place the cursor in roughly the correct position and whether other audio alignment algorithms, such as DTW, improve the MIDI align algorithm.

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 11/18

This week, our team made significant progress on every subsystem. For eye tracking, we have a heuristic model that can predict when someone is looking at the last couple of measures on a page. Next steps involve head tracking heuristics for nonlinear or repetitive musical structures and edge cases that audio alignment alone would fail on.

For the audio alignment, we now have alignment working with a few bugs. One serious challenge we faced was figuring out a reasonable threshold to detect long linear sections of matches between the MIDI and live recording. We’re still working on this, but dynamic time warping is now showing great results that we can use after applying thresholding and segmentation more intentionally.

For the webpage, we have several new features: a user can turn pages manually, there is an animation for forward and backward page turns, and we can change the page being displayed from the backend. Perhaps the most significant update to the frontend is that we can display a moving cursor to indicate position in the score. This feature is not yet being updated from the backend, however, and is the next big challenge. The goal is to implement web sockets to send data in real time with low latency and get the cursor to update.

Overall, we made a lot of progress leading up to Thanksgiving break, and we will continue working hard to reach MVP. We are currently on schedule, but definitely have significant challenges to overcome in the weeks ahead.

Over the past semester, our team has grown tremendously. We began with subsystems that we each wanted to work on – but over time, we began to function much more like a team with a goal. We help each other debug sections of code and we all work together to catch people up to speed who may have been out taking an exam. We care a lot about collaboration rather than competition, and ensure we always help each other out to reach MVP. Our scheduling philosophy is to use Agile Development and Sprint Planning in order to make progress every day. We meet 5-6 times a week and have a brief discussion at the beginning about blockers, goals, and areas we may need help in. After that, we get started with work!

Weekly team dinners are also extremely important to manage a good working relationship. Our team works in sprints, and we often have large blocks of scheduled time out of which we must complete tasks and remain productive. One unique thing we do is eat at Kiin Lao Thai Restaurant to boost morale. We’re able to get away from the code and discuss higher level concepts in integration and think about our system in a relaxed environment. When we get back to lab, we often figure out trickier bugs and resume productivity while keeping our minds nourished!

Sanjana’s Status Report for 11/18

This week, I fixed a nasty bug on the music display page. After this, progress resumed and I was able to display the pages of the PDF file as images and send which page to display from the backend. There are manual buttons to flip the page and a Django function that takes in user clicks and updates the display with a small page-turning animation. Page Flipping Video Demo.

Furthermore, I conducted research about Django Channels and web sockets. Web sockets seem like a viable option; however, I ran into several errors when attempting to install them. I’m retrying the installation with my existing Django backend, but an alternative could be to set up the environment with web sockets and channels already enabled and ASGI instead of WSGI. Getting a live stream of data to the Django frontend remains the main open question.
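
If we go the Channels route, the main structural change is serving the project through ASGI with a routing table that accepts both HTTP and web socket connections. A hedged sketch of that setup is below; the project module, URL, consumer, and payload are placeholders, not our actual configuration.

import os
from channels.generic.websocket import AsyncJsonWebsocketConsumer
from channels.routing import ProtocolTypeRouter, URLRouter
from django.core.asgi import get_asgi_application
from django.urls import path

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")

class CursorConsumer(AsyncJsonWebsocketConsumer):
    """Would push cursor updates to the page; the payload format here is made up."""
    async def connect(self):
        await self.accept()
        await self.send_json({"page": 1, "fraction": 0.0})  # example initial message

application = ProtocolTypeRouter({
    "http": get_asgi_application(),
    "websocket": URLRouter([path("ws/cursor/", CursorConsumer.as_asgi())]),
})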

Rohan and I have the Frontend system running on our laptops, but for integration with the audio, which is currently on Caleb’s laptop, we helped get him set up with the environment needed to run the Frontend scripts. Over the course of 3 days, this took 2 + 2.5 + 2 = 6.5 hours, and I documented the process and steps in the README. Integration was a lot more difficult than expected, so we will be working overtime in order to integrate things earlier than planned.

This week, I also wanted to accomplish my goal of displaying a moving cursor and being able to update the frontend based on it. I calculated coordinates for each measure and line to make the cursor as accurate as possible. These values are hardcoded because our sheet music is standardized. Here’s a quick demo of the website: Moving Cursor Video Demo. For now, the cursor moves at a constant pace on one line. This is a feature we wanted for debugging, so it’s really great that I was able to display it after we had doubts about its feasibility earlier.
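
Concretely, the hardcoded layout boils down to a per-line coordinate table that the cursor interpolates across; the numbers below are placeholders rather than our real measurements.

# Pixel geometry of each line on the rendered page image (placeholder values).
LINE_COORDS = {0: {"y": 120, "x_start": 60, "x_end": 740},
               1: {"y": 260, "x_start": 60, "x_end": 740}}

def cursor_xy(line_index, fraction_through_line):
    """Pixel position of the cursor, given how far through the line the player is."""
    line = LINE_COORDS[line_index]
    x = line["x_start"] + fraction_through_line * (line["x_end"] - line["x_start"])
    return x, line["y"]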

So far, my work is on track.

Rohan’s Status Report for 11/18

This week my team focused again on integration of the three systems: Front-End, Eye-Tracking, and Audio Alignment. The main issue right now is tying everything to the Front-End, i.e., how the other two systems communicate with the Front-End and how the Front-End communicates back to those systems.

As a result, this week I mainly worked on debugging front-end issues and helping set up the display of the uploaded sheet music. I worked with Sanjana on this feature. We both decided that it would be best to convert the sheet music PDF to a set of images and then display the images one at a time as pages. There were a lot of issues in doing this. First, the PDF would not display because it was not being processed properly. When we fixed that, converting the PDF to images had its own issues: the image paths would always be saved incorrectly, so they could never be accessed, and the div containing the images also caused problems. However, we were able to debug and fix these issues and get the images to display one at a time.
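
For reference, the conversion step amounts to something like the sketch below, shown here with pdf2image as one common choice of library (not necessarily the exact one in our code); the paths and naming scheme are placeholders, and writing the images under predictable names is one way to avoid the broken-path problem we hit.

from pdf2image import convert_from_path  # requires poppler installed on the machine

pages = convert_from_path("media/score.pdf", dpi=200)
image_paths = []
for i, page in enumerate(pages, start=1):
    out_path = f"media/score_page_{i}.png"  # predictable names make the pages easy to reference
    page.save(out_path, "PNG")
    image_paths.append(out_path)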

Later in the week, I helped Caleb install Django and set up our front-end application locally on his laptop. It took a couple of days, but I helped him get it done. Lastly, towards the end of the week, I researched how to use web sockets to update the page display when the backend sends the signal to turn the page. I was following this tutorial: https://earthly.dev/blog/build-real-time-comm-app/.

So far, my progress is on schedule, but there is a lot of work left for the remainder of the semester. Next week, I plan to implement web sockets for the backend to update the front-end. I will also look into how a web socket can be used between our Eye-Tracking script and the front-end.

Caleb’s Status Report for 11/18

This week I worked on finalizing the audio alignment algorithm and integrating it into the system. The algorithm uses a library called synctoolbox, created by researchers working on audio alignment and audio processing. Running the alignment returns a warping matrix mapping points from a reference audio to a live audio segment. However, finding the starting point using the warping matrix is still non-trivial due to noise in the linearity. Linearity occurs when the live audio and the reference audio are in sync and therefore progress at the same rate. The warping matrix may find linearity occurring before the true starting point, or the true starting point may be followed shortly by non-linearity that makes it hard to find. Overall, this means we need a threshold that determines which linear stretch is the true starting point without perceiving the noise as the true starting position. For the warping matrix below, note how the linearity occurs at ~200 frames into the audio segment, which is due to a pause at the beginning of the recording. Also note that it aligns with ~900 frames within the reference audio, which can be turned into an exact time within the recording.
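
One simple way to frame that thresholding step is to scan the warping path for the first sufficiently long run of diagonal steps, i.e., a stretch where both recordings advance frame for frame. The sketch below is a hypothetical illustration of the idea rather than the code we are running; it assumes the path is ordered by increasing frame index, and the minimum run length is the tunable threshold.

import numpy as np

def first_linear_match(warping_path, min_run=50):
    """warping_path: sequence of (reference_frame, live_frame) pairs in time order.
    Returns the (reference, live) pair where the first run of at least min_run
    diagonal (1, 1) steps begins, or None if no such run exists."""
    wp = np.asarray(warping_path)
    steps = np.diff(wp, axis=0)
    diagonal = np.all(steps == 1, axis=1)  # True where both indices advance by one frame
    run = 0
    for i, d in enumerate(diagonal):
        run = run + 1 if d else 0
        if run >= min_run:
            start = i - min_run + 1
            return tuple(wp[start])
    return None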

This upcoming week, I will be looking at the robustness of both the warping algorithm and the algorithm for finding the true starting position. To do this, we have recorded several segments of audio in which Sanjana performed with wrong notes, skipped bars, and arbitrary tempo changes. Several of these audio segments can be found below. After testing the robustness, I’ll look at how to package that starting time into coordinates on a page to place the cursor.

We are currently still on track and will continue to work hard to stay on track.

Reference Audio: Reference_Audio

Audio Segment (Recorded by Sanjana): Audio_Segment

Warping Matrix: Warping_Matrix

Team Status Report for 11/11

This week we focused on getting each of the subsystems to a point of moderate success. We are working on each subsystem independently because we want to budget enough time to integrate them all together. The main difficulty is that the audio runs on Python, the eye tracker runs on C++, and the front end uses Django. Therefore, to have all the subsystems communicate with one another, web sockets will have to be used, and importantly, they have to function across all three subsystems. This is a major risk: if the subsystems cannot communicate with one another, the project will not function, and communication problems could also introduce latency issues. The backup plan is to use local files as the points where information is read and written. This method is slow; however, it acts as a safety net for the web sockets.
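
The file-based fallback is easy to prototype: the producing subsystem atomically rewrites a small JSON file and the consumer polls it. A minimal sketch, with a placeholder file path and assuming the shared directory already exists:

import json, os, tempfile

STATUS_FILE = "shared/position.json"

def write_position(position):
    """Write atomically so a reader never sees a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(STATUS_FILE))
    with os.fdopen(fd, "w") as f:
        json.dump(position, f)
    os.replace(tmp, STATUS_FILE)

def read_position():
    with open(STATUS_FILE) as f:
        return json.load(f)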

One major design decision was to no longer use the Google board. We made this change because a heuristic for the eye tracker is simpler to implement and would most likely produce similar results, so the whole project can now run off of a single laptop. The only costs are that we can no longer harness the TPU on the Google board and that the system is now slightly larger, since a laptop requires a larger surface area to sit on. Our design use case was originally intended for practice settings, which remains viable despite the size of the laptop; the design is not as heavily focused on performance settings.

Sanjana’s Status Report for 11/11

This week, I added functionality to the frontend. As of now, the PDF file upload is working and the MIDI file upload is a work in progress. I also spent considerable time thinking about and researching integration with the other subsystems. For the audio alignment, the actual stream could be sent to the backend, which would then process the audio into data using algorithms already being developed in Python. An alternative is to do all the processing on the laptop that the Eye Tracking Camera is connected to, then use HTTP or the web socket protocol to send only data to the backend. I’m still weighing the pros and cons of these design decisions and will continue to discuss them with more experienced full stack engineers before implementation.

Additionally, I continued to implement the frontend. I refined the models so the forms store data in a better way and worked on some smaller details of the web experience. One piece of feedback we received from the interim demos was regarding the MVP of the frontend: the frontend system will be hosted locally on the laptop to which the Tobii Eye Tracker camera is connected. As of now, my plan is to integrate the audio scripts into the backend of the Django app or establish some connection between those Python files and the backend. The upload page and parsing the MIDI file are two aspects I’ve been working on over the past couple of days; however, these features are not bug-free yet.

My progress is on schedule, but I am working hard to think more about integration and am beginning to focus my efforts there. Understanding the eye tracking and audio outputs and their formats will help me get to a solution regarding integration. For next week, I want to add some form of integration, or at least simulate it with a fake stream of live data. I would also like to establish a communication channel with one of the subsystems.

Regarding our verification and validation, we are planning to run latency tests and accuracy tests for page flipping as outlined in our design report. An important aspect of the frontend and integration is that the data streams are processed, sent to the frontend, and displayed within a beat. We’ll analyze these results by timing the inputs and outputs and precisely measuring latency. This will ensure that the project meets our engineering design requirements.
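
For the latency tests, the measurement itself can be as simple as the harness below: timestamp the input, timestamp the rendered output, and compare the difference against one beat at the piece's tempo. The two callables are placeholders for however we instrument the real pipeline.

import time

def measure_latency(send_update, wait_for_display, tempo_bpm):
    """Return the measured latency and whether it fits within one beat."""
    start = time.perf_counter()
    send_update()        # e.g. push a new cursor position to the backend
    wait_for_display()   # e.g. block until the frontend reports the redraw
    latency = time.perf_counter() - start
    beat_duration = 60.0 / tempo_bpm
    return latency, latency <= beat_duration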