Rohan’s Status Report for 11/18

This week my team focused again on integration of the three systems: Front-End, Eye-Tracking, and Audio Alignment. The main issue right now is tying everything to the Front-End, i.e., how the other two systems communicate with the Front-End and how the Front-End communicates back to those systems.

As a result, this week I mainly worked on debugging front-end issues and helping set up the display of the uploaded sheet music. I worked with Sanjana on this feature. We decided that it would be best to convert the sheet music PDF to a set of images and then display the images one at a time as pages. There were a lot of issues along the way. First, the PDF would not display because it was not being processed properly. Once we fixed that, converting the PDF to images had its own problems: the image paths were being saved incorrectly, so the images could never be accessed, and the div containing the images also caused trouble. However, we were able to debug and fix these issues, and we can now display the images one page at a time.
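As a rough illustration of the approach (not our exact code), the conversion step can be done with the pdf2image library; the output directory and DPI below are placeholder choices, and pdf2image also needs poppler installed.

```python
# Minimal sketch (not our exact code): convert an uploaded PDF into one PNG per page
# so the front-end can show pages individually. Assumes pdf2image + poppler are available;
# where the images live relative to MEDIA_ROOT is handled elsewhere.
import os
from pdf2image import convert_from_path

def pdf_to_page_images(pdf_path, out_dir, dpi=150):
    """Render each PDF page to a PNG and return the saved image paths in order."""
    os.makedirs(out_dir, exist_ok=True)
    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    image_paths = []
    for i, page in enumerate(pages, start=1):
        path = os.path.join(out_dir, f"page_{i:03d}.png")
        page.save(path, "PNG")
        image_paths.append(path)
    return image_paths  # the template can iterate over these and show one page at a time
```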

Later in the week, I helped Caleb install Django and set up our front-end application locally on his laptop. It took a couple of days but I helped him get it done. Lastly, towards the end of the week I was researching how to use web sockets to update the page display when the backend sends the signal to turn the page. I was following this tutorial: https://earthly.dev/blog/build-real-time-comm-app/.
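For reference, a minimal Django Channels consumer along the lines of that tutorial might look like the sketch below; the group name and JSON message format are assumptions rather than our final protocol, and a channel layer would need to be configured.

```python
# Sketch of a Django Channels consumer that pushes a "turn page" event to the browser
# when the backend decides the page should flip. Assumes Channels and a channel layer
# are configured; the group name and message schema are placeholders.
import json
from channels.generic.websocket import AsyncWebsocketConsumer

class PageTurnConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.channel_layer.group_add("page_turns", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("page_turns", self.channel_name)

    # Triggered by channel_layer.group_send("page_turns", {"type": "turn.page", "page": n})
    async def turn_page(self, event):
        await self.send(text_data=json.dumps({"action": "turn_page", "page": event["page"]}))
```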

So far, my progress is on schedule, but there is a lot of work left for the remainder of the semester. Next week, I plan to implement web sockets for the backend to update the front-end. I will also look into how a web socket can be used between our eye-tracking script and the front-end.

Caleb’s Status Report for 11/18

This week I worked on finalizing the audio alignment algorithm and integrating it into the system. The alignment algorithm uses synctoolbox, a library created by a group of researchers working on audio alignment and audio processing. Running the alignment returns a warping matrix mapping points in a reference audio to points in a live audio segment. However, finding the starting point from the warping matrix is still non-trivial due to noise in the linearity. Linearity occurs when the live audio and the reference audio are in sync and therefore progress at the same rate. The warping matrix may show linearity before the true starting point, or the true starting point may be followed shortly by non-linearity that makes it hard to find. Overall, this means we need a threshold to decide which linear region is the true starting point, without mistaking noise for the true starting position. For the warping matrix below, note how the linearity occurs at ~200 frames into the audio segment, which is due to a pause at the beginning of the recording. Also note that it aligns with ~900 frames into the reference audio, which can be converted into an exact time within the recording.
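As a simplified sketch of the idea (not the actual implementation), the check can be framed as scanning the warping path for the first window whose local slope stays close to 1; the window length and tolerance below are placeholder values.

```python
# Simplified sketch of the starting-point heuristic described above: scan the warping
# path for the first window where the local slope stays near 1 (reference and live audio
# progressing at the same rate). Window size and tolerance are placeholder values.
import numpy as np

def find_start_frame(warping_path, window=50, tol=0.25):
    """warping_path: (N, 2) array of (reference_frame, live_frame) pairs from the aligner."""
    wp = np.asarray(warping_path, dtype=float)
    ref, live = wp[:, 0], wp[:, 1]
    for i in range(len(wp) - window):
        d_ref = ref[i + window] - ref[i]
        d_live = live[i + window] - live[i]
        if d_live == 0:
            continue
        slope = d_ref / d_live
        if abs(slope - 1.0) < tol:       # sustained roughly linear section found
            return ref[i], live[i]       # starting frames in reference and live audio
    return None  # no sufficiently linear region: treat as "not started yet"
```

The frame indices can then be converted to seconds with frame * hop_length / sample_rate to get the exact time mentioned above.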

This upcoming week, I will be looking at the robustness of both the warping algorithm and the algorithm for finding the true starting position. To do this, we have recorded several segments of audio in which Sanjana performed with wrong notes, skipped bars, and arbitrary tempo changes. Several of these audio segments can be found below. After testing the robustness, I'll be looking at how to package that starting time and turn it into coordinates on a page to place the cursor.

We are currently still on track and will continue to work hard to stay on track.

Reference Audio: Reference_Audio

Audio Segment (Recorded by Sanjana): Audio_Segment

Warping Matrix: Warping_Matrix

Team Status Report for 11/11

This week we focused on getting each of the subsystems to a point of moderate success. We are working on each subsystem independently because we want to budget enough time to integrate all the subsystems together. The main difficulty is that the audio runs on Python, the eye tracker runs on C++, and the front end uses Django. Therefore, to have all the subsystems communicate with one another, web sockets will have to be used, and they have to function across all three subsystems. This is a major risk: if the subsystems cannot communicate with one another, the project will not function, and communication problems could also introduce latency issues. The backup plan is to use local files as points where information is read and written. This method is slow, but it acts as a safety net for the web sockets.
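To make the interface concrete, the Python-side subsystems would act as websocket clients pushing small JSON messages to the front-end; a minimal sketch using the websockets package is shown below, where the URL and message fields are assumptions rather than a settled protocol.

```python
# Sketch of how a Python subsystem (audio or eye tracking) could push a page-turn
# signal to the front-end over a websocket. The URL and message schema are placeholders.
import asyncio
import json
import websockets

async def send_page_turn(source, page):
    uri = "ws://localhost:8000/ws/page_turns/"  # assumed routing path on the Django side
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"source": source, "action": "turn_page", "page": page}))

if __name__ == "__main__":
    asyncio.run(send_page_turn("audio", 2))
```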

One major design decision was to no longer use the Google board. We made this change because a heuristic for the eye tracker is simpler to implement and would most likely produce similar results. Therefore, the whole project can now run off of a single laptop. The only costs of this change are that we can no longer harness the TPU on the Google board and that the system is now slightly larger, since a laptop requires a larger surface area for the device to sit on. Our design use case was originally intended for practice settings, which is still viable despite the size of the laptop; the design is not as heavily focused on performance settings.

Sanjana’s Status Report for 11/11

This week, I added functionality to the frontend. As of now, the PDF file upload is working and the MIDI file upload is a work in progress. I also spent considerable time thinking and researching integration with the other subsystems. For the audio alignment, the actual stream could be sent to the backend. Then the backend will process the audio into data using algorithms that are already being developed in Python. An alternative is to do all the processing on the laptop that the Eye Tracking Camera is connected to, then use HTTP or web socket protocol to send only data to the backend. I’m still weighing the pros and cons of these design decisions and will continue to discuss these with more advanced full stack engineers before implementation.

Additionally, I continued to implement the frontend. I refined the models so the forms store data in a better way and worked on some smaller details in the web experience. One piece of feedback we received from the interim demos was regarding the MVP of the frontend. The frontend system will be hosted locally on the laptop to which the Tobii Eye Tracker camera is connected. As of now, my plan is to integrate the audio scripts into the backend of the Django app or establish some connection between those Python files and the backend. The upload page and parsing the MIDI file are two aspects I've been working on over the past couple of days; however, these features are not bug-free yet.
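For context, the upload models are roughly along the lines of the sketch below; the field names and tempo field are illustrative rather than the exact ones in our app.

```python
# Illustrative sketch of the upload models (field names are not the exact ones in our app).
# A Piece groups the sheet-music PDF, its MIDI file, and a tempo that other subsystems
# can read for thresholding and latency checks.
from django.db import models

class Piece(models.Model):
    title = models.CharField(max_length=200)
    sheet_music_pdf = models.FileField(upload_to="sheet_music/")
    midi_file = models.FileField(upload_to="midi/")
    tempo_bpm = models.PositiveIntegerField(default=120)
    uploaded_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title
```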

My progress is on schedule, but I am working hard to think more about integration and am beginning to focus my efforts there. Understanding the eye-tracking and audio outputs and formats will help me get to a solution regarding integration. For next week, I want to add some sort of integration, or at least simulate it with a fake stream of live data. I would also like to establish a communication channel with one of the subsystems.

Regarding our verification and validation, we are planning to run latency tests and accuracy tests for page flipping as outlined in our design report. An important aspect of the frontend and integration is that the data streams are processed and sent to the frontend and displayed within a beat. We’ll analyze these results by timing the inputs and outputs and precisely measuring latency. This will ensure that the project meets engineering design requirements.
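As a sketch of how we might time those stages (placeholder function names, not the final test harness):

```python
# Sketch of the latency measurement described above: wrap each stage of the pipeline
# (process, send, display acknowledgment) and log how long it takes.
import time

def timed(stage_name, fn, *args, **kwargs):
    """Run one pipeline stage and report its elapsed time in milliseconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{stage_name}: {elapsed_ms:.1f} ms")
    return result, elapsed_ms
```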

Rohan’s Status Report for 11/11

This week my team and I worked on our interim demo. Our demo consisted of showcasing our eye-tracking, audio signal capture and light preprocessing, and a test version of the front-end.

After our day 1 demo, I worked on adding thresholding to the eye-tracking system, while also helping further implement the front-end. In terms of thresholding, I wrote a script that outputs a stream of 1s and 0s depending on whether the user is looking at the last two bars of sheet music. A 1 here represents a positive page-turn signal, and a 0 represents a negative page-turn signal. Essentially, if the user stares at the last two bars for a specific threshold time, then it's time to turn the page from the eye-tracking side. The threshold time I chose was based on the tempo. The script I wrote ended up working; now I need to figure out how to send this data signal to the front-end.
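A simplified version of that thresholding logic (not the exact script) is sketched below; the bottom-right region boundary and the two-beat dwell time are assumed values.

```python
# Simplified version of the thresholding script described above (not the exact code).
# Emits 1 once the gaze has stayed in the "last two bars" region for a tempo-based dwell
# time, else 0. The region boundary and the two-beat dwell are assumed values.
import time

def dwell_time_seconds(tempo_bpm, beats=2):
    return beats * 60.0 / tempo_bpm          # e.g. 1.0 s at 120 BPM

def make_page_turn_detector(tempo_bpm, region_x=0.8, region_y=0.8):
    dwell_needed = dwell_time_seconds(tempo_bpm)
    state = {"entered_at": None}

    def update(gaze_x, gaze_y):
        """gaze_x, gaze_y are normalized [0, 1] coordinates from the eye tracker."""
        in_region = gaze_x >= region_x and gaze_y >= region_y  # bottom-right of the page
        now = time.monotonic()
        if not in_region:
            state["entered_at"] = None
            return 0
        if state["entered_at"] is None:
            state["entered_at"] = now
        return 1 if now - state["entered_at"] >= dwell_needed else 0

    return update
```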

Additionally, I did some front-end work this week. I helped debug and fix a CSS style issue for our front-end: the .css file was not loading properly in the HTML, but Sanjana and I fixed this. I also helped Sanjana work on the music/MIDI upload page through Django. This is the page where the user uploads their MIDI file and a PDF of their sheet music. We haven't finished this HTML page, but we got most of its functionality working; we just need to format it and add styling. The biggest challenge right now is integrating the three systems together.

In terms of eye-tracking data analysis, I've been doing some tinkering. The eye-gaze stream script I wrote prints the user's gaze coordinates as values between 0 and 1. This is because I was using the Tobii SDK, where the x, y coordinates of the user's gaze are represented as a vector between 0 and 1. I plan to scale this data by 1000 to get more precise measurements. One important design requirement is the eye-tracking latency: it must be within 1 beat at the given piece's tempo. I need to make sure the data stream to the front-end fits within this latency, so I will have to look into faster sampling rates and refresh rates.
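As a quick sanity check on the numbers (the tempo below is just an example), the scaling and latency budget are simple to compute:

```python
# Quick arithmetic for the scaling and latency budget described above.
# The example values here are placeholders.
def scale_gaze(x_norm, y_norm, factor=1000):
    """Scale normalized [0, 1] gaze coordinates to a 0-1000 grid for finer resolution."""
    return round(x_norm * factor), round(y_norm * factor)

def latency_budget_seconds(tempo_bpm):
    """Maximum allowed end-to-end latency: one beat at the given tempo."""
    return 60.0 / tempo_bpm

print(scale_gaze(0.734, 0.512))        # -> (734, 512)
print(latency_budget_seconds(100))     # -> 0.6 s at 100 BPM
```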

So far I’ve made decent progress, and I am on schedule currently. For next week, I plan to look into web-sockets to try to have all three systems able to communicate with each other. In other words, I need to make sure that the eye-tracking system is able to send data to the Front-end and vice versa.

 

Caleb’s Status Report for 11/11

This week I spent time transitioning the audio component from running off the Google board to running off of a Dell XPS 15 laptop. This decision was made as we decided to move away from creating an ML model from scratch and instead use a heuristic for the eye-tracking data. Transitioning to a laptop had some unforeseen difficulties, as the microphone was not producing any audio and code written on the Google board was now crashing on the laptop. After debugging, the code runs identically on the laptop as it did on the Google board. However, the time spent debugging hindered progress on improving the audio processing. Some progress has been made on segmenting the audio into chunks so we can later take the Fourier transform and compare against the MIDI. However, the robustness of this process is still unknown. We foresee that a few errors in note prediction can be handled, but a large number of errors will make the whole system unreliable. This is because the method we use to detect sound is not a transducer placed directly on the instrument but rather a microphone, which is susceptible to picking up noise, reflections, and harmonics.

This upcoming week, I will be looking to start taking the chunks from the microphone, running them through a Fast Fourier Transform (FFT), and seeing how accurately the program can predict the notes being played. We have already run a C major scale through the FFT and found that only the highest C was not being predicted correctly. We believe this one error is manageable but have not tested it under many other conditions.
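A stripped-down version of the chunk → FFT → note step is sketched below; the sample rate and the single-peak assumption are simplifications of the real pipeline.

```python
# Stripped-down sketch of the chunk -> FFT -> note step described above. The sample rate
# and the single-peak assumption are simplifications of the real pipeline.
import numpy as np

def dominant_midi_note(chunk, sample_rate=44100):
    """Return the MIDI note number of the strongest frequency peak in an audio chunk."""
    windowed = chunk * np.hanning(len(chunk))        # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]     # skip the DC bin
    midi = 69 + 12 * np.log2(peak_hz / 440.0)        # A4 = 440 Hz = MIDI note 69
    return int(round(midi))
```

In practice the strongest peak can be an upper harmonic rather than the fundamental, which is exactly the kind of error that matching against the MIDI ground truth is meant to catch.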

In terms of testing, the C major scale will continue to act as the baseline for testing the quality of notes extracted from a signal. Currently, the Fourier transform predicts the upper harmonics to be large in amplitude for certain notes, causing some errors in matching to the MIDI. The MIDI acts as ground truth and will be used to analyze how notes are extracted. I will note that this assumes the musician makes no mistakes, which is reasonable for the simple sequences we will be testing on. Another test for the audio portion is to verify that a signal is always sent to the front end when the music reaches the end of the page. This will be tested by having a musician play into the microphone and monitoring whether the front-end page turns based solely on the audio.

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 11/4

View Updated Gantt Chart

Our team made progress on the eye-tracking subsystem this week. We managed to set up the Tobii Eye Tracker 5, calibrate it accurately with our eyes, and gather data points from the data stream. There was also significant progress on audio. Currently, we are working through the challenging tradeoff between audio quality and storage.

This week, we focused on preparing working subsystems for the interim demo. The most significant risk after that will be integrating the different subsystems. We are managing these risks by carefully considering what software environments we build in, for example, keeping a virtual environment for the frontend code and ensuring that its libraries and installations don't conflict with the Tobii eye-tracker software or our audio libraries.

View Eye-Tracker Demo

No major design changes were made to the system – we are on track and proceeding with the components that our design report detailed.

Sanjana’s Status Report for 11/4

This week, I worked with Caleb on setting up and testing some parts of the audio/microphone, attended sessions with Dr. Dueck and her students, and added some non-integration-related functionality to the frontend.

For the frontend, I had to overcome some challenges with setting up a virtual environment and testing that the app displays the same way across my team's different laptops and operating systems. Standardizing the display/frontend is important because the sheet music itself is already standardized. To ensure that the music appears the same size for everyone across different laptops and monitors, the physical dimensions and pixel layout of the frontend need to be the same.

Regarding microphone testing, we tested whether or not the Google board mic was picking up extraneous noise and ensured in the code that the lapel microphone audio was being used. I also set up GitHub repositories for the audio and eye-tracking code and continued developing the frontend, which was already on GitHub.

Dr. Dueck's sessions have time and time again proved to be invaluable sources of information. From this week's sessions, I got to further explore ideas related to note onset, breath control, and the directionality of sound. I was able to iteratively discover a more optimal microphone placement and collected several minutes of usable audio with which to test our algorithms, segmentation, chroma vectors, and note onset detection.

My progress is now on track and the frontend is shaping up well. For next week, I hope to be able to begin integrating some components of either subsystem into the frontend in realtime and work through debugging that. I believe integration is the most challenging task, so I will have to begin attempting some major aspects of integration, possibly before individual components are working completely.

Rohan’s Status Report for 11/4

This week my team and I worked on our interim demo. Our demo consists of showcasing our eye-tracking, audio signal capture and light preprocessing, and a test version of the front-end.

I worked on setting up the Tobii eye-tracking camera and writing and testing a simple eye-tracking script. Setting up the camera required installing the Tobii Experience application, setting up the display, and running calibration. After completing these steps, the application enabled a pre-gaze feature: a spotlight circle that moves around the screen according to where the user is looking. The test script displays the user's current eye-gaze position in a Windows terminal. This can be clearly seen in the video we shot.

Here, the coordinate system of the screen is: (0,0) is the top-left corner, (1,0) the top-right corner, (1,1) the bottom-right corner, and (0,1) the bottom-left corner.
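For reference, a hedged sketch of such a test script is below, assuming the Tobii Pro SDK Python bindings (tobii_research); the Eye Tracker 5 may actually require a different Tobii interface, so treat this as illustrative only.

```python
# Hedged sketch of the gaze-printing test script, assuming the Tobii Pro SDK Python
# bindings (tobii_research); the Eye Tracker 5 may require a different Tobii interface.
# Gaze points arrive normalized to [0, 1] in the display coordinate system above.
import time
import tobii_research as tr

def print_gaze(gaze_data):
    # Each eye's gaze point is an (x, y) tuple in [0, 1] display coordinates.
    left = gaze_data["left_gaze_point_on_display_area"]
    right = gaze_data["right_gaze_point_on_display_area"]
    print(f"left: {left}  right: {right}")

trackers = tr.find_all_eyetrackers()
if trackers:
    tracker = trackers[0]
    tracker.subscribe_to(tr.EYETRACKER_GAZE_DATA, print_gaze, as_dictionary=True)
    time.sleep(5)  # print gaze samples for a few seconds
    tracker.unsubscribe_from(tr.EYETRACKER_GAZE_DATA, print_gaze)
```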

So far I've made decent progress on the eye tracking, and I am currently on schedule. For next week, I plan to write more scripts to integrate the front-end with the eye-tracking, such as having a gaze at the bottom-right section of the homepage redirect to a different HTML view. I will also look into a way to continuously send the eye-gaze position to the Google board in real time.

Caleb’s Status Report for 11/4

This week I spent time working with the Google board to have it take in input from the microphone and parse it into chunks that can be processed. This proved to be a bit more difficult because the Google board OS did not seem to recognize the lapel microphone as an input. After working with it in the terminal, the microphone is now an unnamed input object that acts as the default microphone. On top of this, because the Google board does not have a speaker, I spent time setting up a way to play the audio files recorded on the Google board. This process involves pushing the file to a git repository and downloading it on a machine with a speaker (i.e., my computer). I also spent time adjusting the settings of the recording stream to optimize the audio quality, as well as testing different recording conditions to see what kind of noise is also picked up by the microphone. The last thing to do with the microphone is to determine which format gives the easiest audio files to work with. For example, a 1 MB audio file, although high quality, will take longer to process and may cause us to miss the latency requirement.
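A minimal recording loop along these lines (using the sounddevice library as one example; the chunk length and sample rate are assumed values) looks like:

```python
# Minimal sketch of grabbing fixed-length chunks from the default microphone using the
# sounddevice library (an example choice; chunk length and sample rate are assumptions).
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 22050      # a lower rate keeps files small and processing fast
CHUNK_SECONDS = 2.0      # length of each segment handed to the audio pipeline

def record_chunk():
    frames = int(SAMPLE_RATE * CHUNK_SECONDS)
    chunk = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()                          # block until the chunk is fully recorded
    return np.squeeze(chunk)           # 1-D array ready for FFT / alignment
```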

This upcoming week, I will be looking at different libraries to see which runs faster on the Google board. The current library we are using is librosa, but it might turn out to run slower than PyTorch. In the worst-case scenario, we can code up the dynamic time warping algorithm from first principles, but it is unlikely to run faster than the prewritten implementations.
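For reference, the librosa version of the alignment step looks roughly like the sketch below; the chroma features and hop length are assumptions about the pipeline rather than settled choices.

```python
# Rough sketch of a librosa-based alignment step (the feature choice and hop length are
# assumptions about the pipeline, not settled decisions).
import librosa

def align(reference_path, live_path, sr=22050, hop_length=512):
    y_ref, _ = librosa.load(reference_path, sr=sr)
    y_live, _ = librosa.load(live_path, sr=sr)
    chroma_ref = librosa.feature.chroma_stft(y=y_ref, sr=sr, hop_length=hop_length)
    chroma_live = librosa.feature.chroma_stft(y=y_live, sr=sr, hop_length=hop_length)
    # D is the accumulated cost matrix; wp is the warping path of (ref_frame, live_frame) pairs
    D, wp = librosa.sequence.dtw(X=chroma_ref, Y=chroma_live, metric="cosine")
    return D, wp[::-1]  # reverse so the path runs from the start of the audio
```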

We are currently still on track and will continue to work hard to stay on track.