Team Status Report for 11/18

This week, our team made significant progress on every subsystem. For eye tracking, we have a heuristic model that can predict when someone is looking at the last couple of measures on a page. Next steps involve head-tracking heuristics for nonlinear or repetitive musical structures and for edge cases that audio alignment alone would fail at. Audio alignment is now working, with a few bugs remaining. One serious challenge we faced was figuring out a reasonable threshold to detect long linear sections of matches between the MIDI and the live recording. We’re still working on this, but dynamic time warping is now showing great results that we can use once we apply thresholding and segmentation more intentionally. For the webpage, we have several new features: a user can turn pages manually, there is an animation for forward and backward page turns, and we can change the page being displayed from the backend. Perhaps the most significant update to the frontend is that we can display a moving cursor to indicate position in the score. This cursor is not yet being updated from the backend, however, and that is the next big challenge. The goal is to implement web sockets to send data in real time with low latency and get the cursor to update. Overall, we made a lot of progress leading up to Thanksgiving break, and we will continue working hard to reach MVP. We are currently on schedule, but we have significant challenges to overcome in the weeks ahead.

Over the past semester, our team has grown tremendously. We began with subsystems that we each wanted to work on – but over time, we began to function much more like a team with a goal. We help each other debug sections of code and we all work together to catch people up to speed who may have been out taking an exam. We care a lot about collaboration rather than competition, and ensure we always help each other out to reach MVP. Our scheduling philosophy is to use Agile Development and Sprint Planning in order to make progress every day. We meet 5-6 times a week and have a brief discussion at the beginning about blockers, goals, and areas we may need help in. After that, we get started with work!

Weekly team dinners are also extremely important to manage a good working relationship. Our team works in sprints, and we often have large blocks of scheduled time out of which we must complete tasks and remain productive. One unique thing we do is eat at Kiin Lao Thai Restaurant to boost morale. We’re able to get away from the code and discuss higher level concepts in integration and think about our system in a relaxed environment. When we get back to lab, we often figure out trickier bugs and resume productivity while keeping our minds nourished!

Sanjana’s Status Report for 11/18

This week, I fixed a nasty bug on the music display page. After this, progress resumed and I was able to display the pages of the PDF file as images and send which page to display from the backend. There are manual buttons to flip the page and a Django function that can take in user clicks to update the display with a small page turning animation. Page Flipping Video Demo.

Furthermore, I conducted research about Django Channels and web sockets. Web sockets seem like a viable option; however, when I attempted the installation, I ran into several errors. I’m retrying the installation with my existing Django backend, but an alternative could be to set up a new environment with web sockets and Channels already enabled, using ASGI instead of WSGI. Getting a live stream of data to the Django frontend

Rohan and I have the Frontend system running on our laptops, but the audio is currently on Caleb’s laptop, so for integration we helped him get set up with the environment needed to run the Frontend scripts. Over the course of 3 days, this took 2 + 2.5 + 2 = 6.5 hours, and I documented the steps in the README. Integration was a lot more difficult than expected, so we will be working overtime in order to integrate things earlier than originally planned.

This week, I also wanted to accomplish my goal of displaying a moving cursor and be able to update the frontend based on that. I calculated coordinates for each measure and line in order to have the cursor as accurate as possible. These values were hardcoded because our sheet music is standardized. Here’s a quick demo of the website: Moving Cursor Video Demo. I got the cursor to move at a constant pace on one line for now. This is a feature we wanted to develop for debugging, so it’s really great that I was able to display it when we had doubts about its feasibility earlier.
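As a rough illustration of moving the cursor at a constant pace along one line, here is a minimal sketch rather than our actual frontend code. It assumes the hardcoded measure coordinates are stored as a list of left-edge x positions in pixels; the function name, parameters, and the pixel values in the usage note are all hypothetical.

```python
def cursor_x(measure_starts_px, line_end_px, beats_per_measure, bpm, elapsed_s):
    """Return the cursor's x position (in pixels) on one line of music.

    measure_starts_px: left edge of each measure on the line, ascending.
    line_end_px: right edge of the last measure on the line.
    """
    seconds_per_measure = beats_per_measure * 60.0 / bpm
    # Fractional number of measures played since the start of the line.
    position = elapsed_s / seconds_per_measure
    if position <= 0:
        return float(measure_starts_px[0])
    if position >= len(measure_starts_px):
        return float(line_end_px)
    index = int(position)
    fraction = position - index
    # Interpolate linearly between this measure's edge and the next one.
    edges = list(measure_starts_px) + [line_end_px]
    left, right = edges[index], edges[index + 1]
    return left + fraction * (right - left)
```

For example, with measures starting at 100, 300, 500, and 700 px, a line ending at 900 px, 4 beats per measure, and 120 bpm, the cursor is halfway through the first measure one second in.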

So far, my work is on track.

Rohan’s Status Report for 11/18

This week my team focused again on integration of the three systems: Front-End, Eye-Tracking, and Audio Alignment. The main issue right now is tying everything to the Front-end, i.e. how the other two systems communicate with the Front-end and how the Front-End communicates back to those systems.

As a result, this week I mainly worked on debugging front-end issues and helping set up the display of the uploaded sheet music. I worked with Sanjana on this feature. We both decided that it would be best to convert the sheet music PDF to a set of images, and then display the images one at a time as pages. There were a lot of issues when trying to do this. First, the PDF would not display because it was not being processed properly. When we fixed that issue, converting the PDF to images had its own issues. The image paths were always saved incorrectly, so the images could never be accessed. The div containing the images also caused problems. However, we were able to debug and fix these issues, and we got the images to display one at a time.

Later in the week, I helped Caleb install Django and set up our front-end application locally on his laptop. It took a couple of days but I helped him get it done. Lastly, towards the end of the week I was researching how to use web sockets to update the page display when the backend sends the signal to turn the page. I was following this tutorial: https://earthly.dev/blog/build-real-time-comm-app/.

So far, my progress is on schedule, but for the remainder of the semester there is a lot of work left. Next week, I plan to implement web sockets for the backend to update the front-end. I will also look into how a web socket can be used between our eye-tracking script and the front-end.

Caleb’s Status Report for 11/18

This week I worked on finalizing the audio alignment algorithm and integrating it into the system. The audio alignment algorithm uses a Python package called synctoolbox, which was created by groups of researchers working on audio alignment and audio processing. The alignment algorithm returns a warping matrix that maps points in a reference audio to points in a live audio segment. However, finding the starting point using the warping matrix is still non-trivial due to noise in the linearity. Linearity occurs when the live audio and the reference audio are in sync and, therefore, progress at the same rate. The warping matrix may show linearity before the true starting point, or the true starting point may be followed shortly by non-linearity that makes it hard to find. Overall, this means we need a threshold that identifies which linear section marks the true starting point without mistaking noise for it. For the warping matrix below, note how the linearity begins at ~200 frames in the audio segment, which is due to a pause at the beginning of the recording. Also note that it aligns with ~900 frames within the reference audio, which can be turned into an exact time within the recording.
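To make the thresholding idea concrete, here is a minimal sketch, not the algorithm we actually run. It assumes the alignment has already been converted into a list of (reference_frame, live_frame) pairs in time order; `find_sync_start`, `min_run`, and the 50 Hz feature rate in the usage note are hypothetical names and values. A "linear" step is one where both recordings advance by exactly one frame, and a run only counts once it is at least `min_run` steps long, so short accidental diagonals are ignored.

```python
def find_sync_start(path, min_run=50):
    """Return the (reference_frame, live_frame) pair where the first
    diagonal run of at least `min_run` steps begins, or None.

    path: sequence of (reference_frame, live_frame) pairs in time order.
    """
    run_start = 0  # index in `path` where the current diagonal run began
    for i in range(1, len(path)):
        d_ref = path[i][0] - path[i - 1][0]
        d_live = path[i][1] - path[i - 1][1]
        if d_ref == 1 and d_live == 1:
            # Both recordings advanced one frame: still on the diagonal.
            if i - run_start >= min_run:
                return tuple(path[run_start])
        else:
            # A horizontal or vertical step breaks the run.
            run_start = i
    return None


def frame_to_seconds(frame, feature_rate_hz):
    """Convert a frame index to seconds for a given feature rate."""
    return frame / feature_rate_hz
```

Under the assumed feature rate of 50 frames per second, a match at reference frame 900 would correspond to `frame_to_seconds(900, 50)`, or 18 seconds into the reference. A stricter variant could tolerate a few off-diagonal steps inside a run, which may matter for tempo changes.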

This upcoming week, I will be looking at the robustness of both the warping algorithm and the algorithm that finds the true starting position. To do this, we have recorded several segments of audio where Sanjana performed with wrong notes, skipped bars, and arbitrary tempo changes. Several of these audio segments can be found below. After testing the robustness, I’ll be looking at how to package that starting time and turn it into coordinates on a page to place the cursor.

We are currently still on track and will continue to work hard to stay on track.

Reference Audio: Reference_Audio

Audio Segment (Recorded by Sanjana): Audio_Segment

Warping Matrix: Warping_Matrix

Team Status Report for 11/11

This week we focused on getting each of the subsystems to a point of moderate success. We are focusing on each of the subsystems independently as we want to budget enough time to integrate all the subsystems together. The main difficulty is that the audio runs in Python, the eye tracker runs in C++, and the front end uses Django. Therefore, to have all the subsystems communicate with one another, web sockets will have to be used. Importantly, our web sockets have to function on all three subsystems. This is a major risk: if the subsystems cannot communicate with one another, the project will not function, and slow communication could result in latency issues. The backup plan is to use local files as points where information is written and read. This method is slow; however, it acts as a safety net for the web sockets.
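The local-file backup plan could look something like the sketch below. This is only an illustration under assumptions: the JSON message shape, the file name, and the helper names `write_message` and `read_message` are hypothetical and not part of our codebase. The writer saves to a temporary file and then swaps it into place, so a reader polling the file should not see a half-written message.

```python
import json
import os
import tempfile


def write_message(path, message):
    """Write `message` (a JSON-serializable dict) to `path` via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(message, tmp)
        # Swap the finished file into place in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def read_message(path):
    """Return the latest message, or None if nothing has been written yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

For instance, the eye-tracking side could call `write_message("signal.json", {"source": "eye", "turn_page": 1})` while the backend polls `read_message("signal.json")`. Polling adds delay on top of disk access, which is why this stays a fallback rather than the main channel.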

One major design decision made was to no longer use the Google board. This change was made as we decided that a heuristic for the eye tracker is simpler to implement and would most likely produce similar results. Therefore, the whole project can now run off of a single laptop. The only costs of this change are that we can no longer harness the TPU on the Google board and that the system is now slightly larger, since a laptop requires a larger surface to sit on. Our design use case was originally intended for practice settings, which is still viable despite the size of the laptop. The design is not as heavily focused on performance settings.

Sanjana’s Status Report for 11/11

This week, I added functionality to the frontend. As of now, the PDF file upload is working and the MIDI file upload is a work in progress. I also spent considerable time thinking and researching integration with the other subsystems. For the audio alignment, the actual stream could be sent to the backend. Then the backend will process the audio into data using algorithms that are already being developed in Python. An alternative is to do all the processing on the laptop that the Eye Tracking Camera is connected to, then use HTTP or web socket protocol to send only data to the backend. I’m still weighing the pros and cons of these design decisions and will continue to discuss these with more advanced full stack engineers before implementation.

Additionally, I continued to implement the frontend. I refined the models so that the forms store data in a better way and worked on some smaller details in the web experience. One piece of feedback we received from the interim demos was regarding the MVP of the frontend. The frontend system will be hosted locally on the laptop to which the Tobii Eye Tracker Camera is connected. As of now, my plan is to integrate the audio scripts into the backend of the Django app or establish some connection between those Python files and the backend. The upload page and parsing the MIDI file are two aspects I’ve been working on over the past couple of days; however, these features aren’t bug-free yet.

My progress is on schedule, but I am working hard to think more about integration and am beginning to focus my efforts there. Understanding the eye tracking and audio outputs and formats will help me get to a solution regarding integration. For next week, I want to be able to add some sort of integration or at least simulate it with a fake stream of live data. I would also like to establish a communication channel with one of the subsystems.

Regarding our verification and validation, we are planning to run latency tests and accuracy tests for page flipping as outlined in our design report. An important aspect of the frontend and integration is that the data streams are processed and sent to the frontend and displayed within a beat. We’ll analyze these results by timing the inputs and outputs and precisely measuring latency. This will ensure that the project meets engineering design requirements.
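As a small worked example of the "within a beat" requirement, one beat lasts 60 / tempo seconds, so at 120 bpm the budget is 0.5 seconds. The sketch below shows one way the timing could be checked; the helper names are hypothetical and this is not our actual test harness.

```python
import time


def beat_duration_s(bpm):
    """Length of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def measure_latency_s(process, payload, clock=time.perf_counter):
    """Run `process(payload)` and return (result, elapsed seconds)."""
    start = clock()
    result = process(payload)
    return result, clock() - start


def within_one_beat(latency_s, bpm):
    """True if the measured latency fits inside one beat at this tempo."""
    return latency_s <= beat_duration_s(bpm)
```

In a real test, `process` would be the full path from a subsystem's output to the frontend display. Running it many times and looking at the worst case, not just the average, would give a more honest picture.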

Rohan’s Status Report for 11/11

This week my team and I worked on our interim demo. Our demo consisted of showcasing our eye-tracking, audio signal capturing and slight-preprocessing, and a test version of the front-end.

After our day 1 demo, I worked on trying to add thresholding to the eye-tracking system, while also helping further implement the front-end. In terms of thresholding, I essentially wrote a script that outputs a stream of 1s and 0s depending on whether the user is looking at the last two bars of sheet music. Here, a 1 represents a positive page turn signal, and a 0 represents a negative page turn signal. Essentially, if the user stares at the last 2 bars for a specific threshold time, then it’s time to turn the page from the eye-tracking side. I chose the threshold time based on the tempo. The script I wrote ended up working. Now, I need to figure out how to send this data signal to the front-end.
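The thresholding logic can be sketched roughly as follows. This is a simplified Python illustration, not the script itself. The region box, the `dwell_beats` parameter, and the (timestamp, x, y) sample format are assumptions; the only thing taken from the description above is that the dwell threshold is derived from the tempo.

```python
def page_turn_signal(samples, region, bpm, dwell_beats=2.0):
    """Yield 1 once the gaze has stayed inside `region` long enough, else 0.

    samples: iterable of (timestamp_s, x, y), with x and y in [0, 1].
    region: (x_min, y_min, x_max, y_max) box covering the last two bars.
    """
    dwell_s = dwell_beats * 60.0 / bpm  # threshold time derived from tempo
    x_min, y_min, x_max, y_max = region
    entered_at = None  # time the gaze most recently entered the region
    for t, x, y in samples:
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        if not inside:
            # Looking away resets the dwell timer.
            entered_at = None
            yield 0
            continue
        if entered_at is None:
            entered_at = t
        yield 1 if t - entered_at >= dwell_s else 0
```

With this shape, a quick glance at the bottom of the page does not trigger a page turn, because the timer resets every time the gaze leaves the region.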

Additionally, I did some front-end work this week. I helped debug and fix a .css style issue for our front-end. Essentially, the .css file was not being loaded properly by the HTML, but Sanjana and I fixed this. I also helped Sanjana work on the Music/MIDI upload page through Django. This is the page where the user needs to upload their MIDI file and a PDF of their sheet music. We haven’t finished this HTML page, but we got most of its functionality working. We just need to format it and add style. The biggest challenge right now is integrating the three systems together.

In terms of eye-tracking data analysis, I’ve been doing some tinkering. The eye-gaze stream script I wrote prints the user’s eye gaze as values between 0 and 1. This is because I was using the Tobii SDK, where the x,y coordinates of the user’s eyes are represented as a vector with components between 0 and 1. I plan to scale this data by 1000 to have more precise measurements and data. One important design requirement is the eye-tracking latency. The latency must be within 1 beat at the tempo of the given music. I need to make sure the data stream to the front-end fits this latency. I will have to look into faster sampling rates and refresh rates.

So far I’ve made decent progress, and I am on schedule currently. For next week, I plan to look into web-sockets to try to have all three systems able to communicate with each other. In other words, I need to make sure that the eye-tracking system is able to send data to the Front-end and vice versa.

Caleb’s Status Report for 11/11

This week I spent time transitioning the audio component from running on the Google board to running on a Dell XPS 15 laptop. This decision was made as we decided to move away from creating an ML model from scratch and instead use a heuristic for the eye-tracking data. Transitioning to a laptop had some unforeseen difficulties: the microphone was not producing any audio, and code written on the Google board was crashing on the laptop. After debugging, the code now runs identically on the laptop as it did on the Google board. However, this time spent debugging hindered progress on improving the audio processing. Some progress has been made in segmenting the audio into chunks so that we can later take the Fourier transform of each chunk and compare it to the MIDI. However, the robustness of this process is still unknown. We foresee that a few errors in note prediction can be handled, but a large number of errors will make the whole system unreliable. This is because we detect sound not with a transducer placed directly on the instrument but with a microphone, which is susceptible to picking up noise, reflections, and harmonics.

This upcoming week, I will be looking to start taking the chunks from the microphone and running them through a Fast Fourier Transform (FFT) and seeing how accurately the program can predict the notes being played. We have already run a C major scale through the FFT and only found the highest C was not being predicted correctly. We believe this one error is manageable but have not tested it under many other conditions.
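Below is a minimal sketch of the chunk-to-note idea, assuming a mono chunk whose length is a power of two. It is not our actual pipeline: the FFT is a plain textbook radix-2 implementation so the example has no dependencies, and all function names are hypothetical. Note that picking the single largest peak is exactly the kind of approach that strong upper harmonics can fool.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(chunk, sample_rate):
    """Return the frequency (Hz) of the strongest bin in the chunk."""
    spectrum = fft(chunk)
    half = len(chunk) // 2
    # Skip bin 0 (DC offset) and use only the positive frequencies.
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(chunk)


def frequency_to_midi(freq_hz):
    """Nearest MIDI note number, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_name(midi_note):
    """Note name with octave, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"
```

The frequency resolution is the sample rate divided by the chunk length, so shorter chunks respond faster but separate neighboring low notes less well. That tradeoff would need tuning against the one-beat latency requirement.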

In terms of testing, the C major scale will continue to act as the baseline mark for testing the quality of notes extracted from a signal. Currently, the Fourier transform is predicting the upper harmonics to be large in amplitude for certain notes, causing some errors in matching to the MIDI. The MIDI acts as ground truth and will be used to analyze how notes are extracted. I will note that this assumes the musician makes no mistakes, which is possible for simple sequences we will be testing on. Another test for the audio portion is to test that a signal is always sent to the front end when the music reaches the end of the page. This will be tested by having a musician play into the microphone and monitoring the front end if the page turns solely based on the audio.

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 11/4

View Updated Gantt Chart

Our team made progress on the eye tracking subsystem this week. We managed to set up the Tobii Eye Tracker 5, calibrate it accurately with our eyes, and gather data points from the data stream. There was also significant progress on audio. Currently, we are working through the challenging tradeoff of audio quality with storage.

This week, we focused on preparing working subsystems for the interim demo. The most significant risk after that will be integrating the different subsystems. We are managing this risk by carefully considering which software environments we build in. For example, we keep the frontend code in a virtual environment and make sure that its libraries and installations don’t conflict with the Tobii eye tracker software or our audio libraries.

View Eye-Tracker Demo

No major design changes were made to the system – we are on track and proceeding with the components that our design report detailed.

Sanjana’s Status Report for 11/4

This week, I worked with Caleb on setting up and testing some parts of the audio/microphone, attended sessions with Dr. Dueck and her students, and added some non-integration-related functionality to the frontend.

For the frontend, I had to overcome some challenges regarding setting up a virtual environment and testing that the app displays the same for different laptops and operating systems across my team. Standardization for the display/frontend is important because sheet music is being standardized already. In order to ensure that the size of the music is the same for everyone across different laptops and monitors, the physical distance and pixels on the frontend need to be the same.

Regarding microphone testing, we tested whether or not the Google Board mic was picking up extraneous noise and ensured in the code that the lapel microphone audio was being utilized. I set up GitHub repositories for the audio and eye tracking code and continued developing the frontend, which was already on GitHub.

Dr. Dueck’s sessions have time and time again proved to be invaluable sources of information. From this week’s sessions, I got to further explore ideas related to note onset, breath control, and the directionality of sound. I was able to iteratively discover more optimal microphone placement and collected several minutes of usable audio with which to test our algorithms, segmentation, chroma vectors, and note onset detection.

My progress is now on track and the frontend is shaping up well. For next week, I hope to be able to begin integrating some components of either subsystem into the frontend in realtime and work through debugging that. I believe integration is the most challenging task, so I will have to begin attempting some major aspects of integration, possibly before individual components are working completely.