Rohan’s Status Report for 11/11

This week my team and I worked on our interim demo. Our demo consisted of showcasing our eye-tracking, our audio signal capture with light preprocessing, and a test version of the front-end.

After our day 1 demo, I worked on adding thresholding to the eye-tracking system while also helping further implement the front-end. For thresholding, I wrote a script that outputs a stream of 1s and 0s depending on whether the user is looking at the last two bars of the sheet music. A 1 represents a positive page-turn signal, and a 0 represents a negative page-turn signal. Essentially, if the user stares at the last two bars for a specific threshold time, then it’s time to turn the page from the eye-tracking side. The threshold time I chose is based on the tempo. The script I wrote ended up working; now I need to figure out how to send this data signal to the front-end.
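The core of that dwell logic looks roughly like the sketch below. This is an illustration rather than the exact script; the region bounds, tempo, and beat count are placeholder values.

```python
import time

# Placeholder values; the real region and tempo come from the sheet music layout.
LAST_TWO_BARS = (0.7, 0.75, 1.0, 1.0)  # (x_min, y_min, x_max, y_max) in normalized screen coords
TEMPO_BPM = 120
DWELL_BEATS = 2                         # beats of sustained gaze that count as "intent to turn"

def dwell_threshold_s(tempo_bpm: float, beats: int) -> float:
    """Seconds of sustained gaze required before emitting a page-turn signal."""
    return beats * 60.0 / tempo_bpm

def in_region(x: float, y: float, region) -> bool:
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max

def gaze_to_signal(gaze_stream, tempo_bpm=TEMPO_BPM):
    """Yield 1 once the gaze has dwelled in the last-two-bars region long enough, else 0."""
    threshold = dwell_threshold_s(tempo_bpm, DWELL_BEATS)
    dwell_start = None
    for x, y in gaze_stream:            # gaze_stream yields normalized (x, y) samples
        if in_region(x, y, LAST_TWO_BARS):
            dwell_start = dwell_start or time.time()
            yield 1 if time.time() - dwell_start >= threshold else 0
        else:
            dwell_start = None
            yield 0
```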

Additionally, I did some front-end work this week. I helped debug and fix a .css styling issue for our front-end: the .css file was not being loaded by the HTML, but Sanjana and I fixed this. I also helped Sanjana work on the music/MIDI upload page in Django. This is the page where the user uploads their MIDI file and a PDF of their sheet music. We haven’t finished this HTML page, but we got most of its functionality working; we just need to format it and add styling. The biggest challenge right now is integrating the three systems together.
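As a rough picture of how a Django upload page like this works, a minimal sketch could look like the following; the field names, view name, and template names are placeholders rather than our actual code.

```python
# forms.py: a form with one field per upload (placeholder names)
from django import forms

class UploadForm(forms.Form):
    midi_file = forms.FileField(label="MIDI file")
    sheet_pdf = forms.FileField(label="Sheet music (PDF)")

# views.py: handle the POSTed files and re-render the page otherwise
from django.shortcuts import render

def upload_music(request):
    form = UploadForm(request.POST or None, request.FILES or None)
    if request.method == "POST" and form.is_valid():
        midi = form.cleaned_data["midi_file"]
        pdf = form.cleaned_data["sheet_pdf"]
        # hand the files off to the audio and display subsystems here
        return render(request, "upload_done.html")
    return render(request, "upload.html", {"form": form})
```

For the files to come through, the form tag in the template also needs enctype="multipart/form-data".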

In terms of eye-tracking data analysis, I’ve been doing some tinkering. The eye-gaze stream script I wrote prints the user’s eye gaze as values between 0 and 1, because the Tobii SDK represents the x, y coordinates of the user’s gaze as a normalized vector between 0 and 1. I plan to scale this data by 1000 to get more precise measurements. One important design requirement is the eye-tracking latency: it must be within 1 beat of a given piece’s tempo, so I need to make sure the data stream to the front-end fits this latency. I will have to look into faster sampling rates and refresh rates.
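A small sketch of the scaling and of the one-beat latency budget, with a placeholder tempo value:

```python
def scale_gaze(x_norm: float, y_norm: float, factor: int = 1000):
    """Scale normalized Tobii gaze coordinates in [0, 1] to a finer integer grid."""
    return round(x_norm * factor), round(y_norm * factor)

def latency_budget_s(tempo_bpm: float) -> float:
    """Maximum allowed end-to-end latency: one beat at the given tempo."""
    return 60.0 / tempo_bpm

print(scale_gaze(0.734, 0.912))   # -> (734, 912)
print(latency_budget_s(120))      # -> 0.5 seconds at 120 BPM
```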

So far I’ve made decent progress, and I am currently on schedule. For next week, I plan to look into WebSockets so that all three systems can communicate with each other. In other words, I need to make sure that the eye-tracking system can send data to the front-end and vice versa.
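A minimal sketch of what sending the 1/0 page-turn stream over a WebSocket could look like, assuming the third-party websockets package and a placeholder endpoint URL:

```python
import asyncio
import websockets

async def send_page_turn_signals(signal_stream):
    # Placeholder URL; the real endpoint will live in the Django front-end.
    async with websockets.connect("ws://localhost:8000/ws/page-turn/") as ws:
        for signal in signal_stream:        # e.g. the 1/0 stream from the gaze script
            await ws.send(str(signal))
            await asyncio.sleep(0.05)       # pace the stream; tune to the gaze sampling rate

# asyncio.run(send_page_turn_signals(gaze_to_signal(...)))
```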


Caleb’s Status Report for 11/11

This week I spent time transitioning the audio component from running off the Google board to running off of a Dell XPS 15 laptop. We made this decision because we moved away from creating an ML model from scratch and are instead using a heuristic for the eye-tracking data. Transitioning to a laptop had some unforeseen difficulties: the microphone was not producing any audio, and code that ran fine on the Google board was now crashing on the laptop. After debugging, the code runs identically on the laptop as it did on the Google board. However, the time spent debugging hindered progress on improving the audio processing. Some progress has been made in segmenting the audio into chunks so that we can later take the Fourier transform and compare it to the MIDI, but the robustness of this process is still unknown. We foresee that a few errors in note prediction can be handled, but a large number of errors will make the whole system unreliable. This is because the method we use to detect sound is not a transducer placed directly on the instrument but rather a microphone, which is susceptible to picking up noise, reflections, and harmonics.
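As an illustration of the chunking step, a minimal sketch (assuming the audio is already a mono NumPy array and using a placeholder chunk length) might look like this:

```python
import numpy as np

SAMPLE_RATE = 44100
CHUNK_SECONDS = 0.1          # roughly 100 ms windows before the FFT stage; still being tuned

def segment_audio(samples: np.ndarray, sample_rate: int = SAMPLE_RATE,
                  chunk_seconds: float = CHUNK_SECONDS) -> np.ndarray:
    """Split a mono signal into equal-length chunks, dropping the partial tail."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(samples) // chunk_len
    return samples[: n_chunks * chunk_len].reshape(n_chunks, chunk_len)
```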

This upcoming week, I will start taking the chunks from the microphone, running them through a Fast Fourier Transform (FFT), and seeing how accurately the program can predict the notes being played. We have already run a C major scale through the FFT and found that only the highest C was not being predicted correctly. We believe this one error is manageable, but we have not tested under many other conditions.
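A rough sketch of per-chunk note prediction, assuming a mono NumPy chunk and picking only the single strongest FFT bin (which is exactly where harmonics can cause trouble):

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def predict_note(chunk: np.ndarray, sample_rate: int = 44100) -> str:
    """Return the note name of the strongest spectral peak in one audio chunk."""
    spectrum = np.abs(np.fft.rfft(chunk * np.hanning(len(chunk))))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]          # skip the DC bin
    midi = int(round(69 + 12 * np.log2(peak_hz / 440.0)))  # A4 = 440 Hz = MIDI 69
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
```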

In terms of testing, the C major scale will continue to act as the baseline for testing the quality of notes extracted from a signal. Currently, the Fourier transform predicts the upper harmonics to be large in amplitude for certain notes, causing some errors in matching to the MIDI. The MIDI acts as ground truth and will be used to analyze how well notes are extracted. I will note that this assumes the musician makes no mistakes, which is realistic for the simple sequences we will be testing on. Another test for the audio portion is that a signal is always sent to the front-end when the music reaches the end of the page. This will be tested by having a musician play into the microphone and monitoring whether the front-end turns the page based solely on the audio.
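One simple way to score the extraction against the MIDI ground truth, assuming both are represented as ordered lists of MIDI note numbers (an assumption, not our final evaluation format), is sketched below:

```python
def note_accuracy(predicted: list[int], ground_truth: list[int]) -> float:
    """Fraction of positions where the predicted note matches the MIDI ground truth."""
    correct = sum(1 for p, g in zip(predicted, ground_truth) if p == g)
    return correct / max(len(ground_truth), 1)

# Example: C major scale (C4..C5) with the top C missed, as in our first test
c_major   = [60, 62, 64, 65, 67, 69, 71, 72]
predicted = [60, 62, 64, 65, 67, 69, 71, 73]
print(note_accuracy(predicted, c_major))   # -> 0.875
```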

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 11/4

View Updated Gantt Chart

Our team made progress on the eye-tracking subsystem this week. We managed to set up the Tobii Eye Tracker 5, calibrate it accurately with our eyes, and gather data points from the data stream. There was also significant progress on audio. Currently, we are working through the challenging tradeoff between audio quality and storage.

This week, we focused on preparing working subsystems for the interim demo. The most significant risk after that will be integrating the different subsystems. We are managing this risk by carefully considering the software environments we build in, for example, keeping a virtual environment for the front-end code and ensuring that those libraries and installations don’t conflict with the Tobii eye-tracker software or our audio libraries.

View Eye-Tracker Demo

No major design changes were made to the system – we are on track and proceeding with the components that our design report detailed.

Sanjana’s Status Report for 11/4

This week, I worked with Caleb on setting up and testing some parts of the audio/microphone, attended sessions with Dr. Dueck and her students, and added some non-integration-related functionality to the frontend.

For the frontend, I had to overcome some challenges in setting up a virtual environment and testing that the app displays the same way on my teammates’ different laptops and operating systems. Standardizing the display/frontend matters because the sheet music itself is already being standardized: to ensure that the music appears at the same size for everyone across different laptops and monitors, the physical distances and pixel dimensions on the frontend need to be the same.

Regarding microphone testing, we checked whether the Google Board mic was picking up extraneous noise and ensured in the code that the lapel microphone audio was being used. I set up GitHub repositories for the audio and eye-tracking code and continued developing the frontend, which was already on GitHub.

Dr. Dueck’s sessions have time and time again proved to be invaluable sources of information. From this week’s sessions, I got to further explore ideas related to note onset, breath control, and the directionality of sound. I was able to iteratively find a more optimal microphone placement and collected several minutes of usable audio with which to test our algorithms: segmentation, chroma vectors, and note onset detection.
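A minimal sketch of pulling note onsets and chroma vectors out of one of those recordings with librosa; the file name is a placeholder:

```python
import librosa

# Load one of the recordings from the session (placeholder path)
y, sr = librosa.load("dueck_session_take1.wav", sr=None, mono=True)

# Onset times in seconds: candidate note boundaries
onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")

# 12-bin chroma vectors per frame: pitch-class energy for matching against the MIDI
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

print(f"{len(onset_times)} onsets detected, chroma shape {chroma.shape}")
```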

My progress is now on track, and the frontend is shaping up well. For next week, I hope to begin integrating some components of either subsystem into the frontend in real time and work through debugging that. I believe integration is the most challenging task, so I will have to begin attempting some major aspects of integration, possibly before individual components are working completely.

Rohan’s Status Report for 11/4

This week my team and I worked on our interim demo. Our demo consists of showcasing our eye-tracking, our audio signal capture with light preprocessing, and a test version of the front-end.

I worked on setting up the Tobii eye-tracking camera and writing and testing a simple eye-tracking script. Setting up the camera required installing the Tobii Experience application, setting up the display, and running calibration. After executing these steps, the application provided a gaze-preview feature: a spotlight circle that moves around the screen according to where the user is looking. The test script displays the user’s current eye-gaze position in a Windows terminal. This can be clearly seen in this video we shot.
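For reference, a gaze-printing script along these lines could look like the sketch below. This assumes the tobii_research (Tobii Pro SDK) Python bindings; since we are using the consumer Eye Tracker 5, the exact API may differ, so treat this as illustrative only.

```python
import time
import tobii_research as tr

def print_gaze(gaze_data):
    # Normalized display coordinates: (0, 0) is the top-left corner, (1, 1) the bottom-right
    x, y = gaze_data["left_gaze_point_on_display_area"]
    print(f"x={x:.3f}, y={y:.3f}")

eyetracker = tr.find_all_eyetrackers()[0]
eyetracker.subscribe_to(tr.EYETRACKER_GAZE_DATA, print_gaze, as_dictionary=True)
time.sleep(10)   # stream gaze data for ten seconds
eyetracker.unsubscribe_from(tr.EYETRACKER_GAZE_DATA, print_gaze)
```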

Here, the coordinate system of the screen is (0,0) for the top-left corner, (1,0) for the top-right corner, (1,1) for the bottom-right corner, and (0,1) for the bottom-left corner.

So far I’ve made decent progress on the eye-tracking, and I am currently on schedule. For next week, I plan to write more scripts to integrate the front-end with the eye-tracking, such as having a gaze at the bottom-right section of the homepage redirect to a different HTML view. I will also look into a way to continuously send the eye-gaze position to the Google board in real time.

Caleb’s Status Report for 11/4

This week I spent time working with the Google board to have it take in input from the microphone and parse it into chunks that can be processed. This proved to be more difficult than expected because the Google board OS did not seem to recognize the lapel microphone as an input. After working with it in the terminal, the microphone now shows up as an unnamed input device that acts as the default microphone. On top of this, because the Google board does not have a speaker, I spent time setting up a way to play back the audio files recorded on the board. This process involves pushing each file to a Git repository and downloading it on a machine with a speaker (i.e., my computer). I also spent time adjusting the settings of the recording stream to optimize the audio quality we can get, as well as testing different recording conditions to see what kinds of noise the microphone also picks up. The last thing to do with the microphone is to determine which format gives the easiest-to-work-with audio files. For example, a 1 MB audio file, although high quality, will take longer to process and may cause us to miss the latency requirement.
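A quick back-of-the-envelope sketch of the quality-versus-size tradeoff for uncompressed audio (the sample rates and duration are just example values):

```python
def wav_size_bytes(sample_rate: int, sample_width_bytes: int, channels: int,
                   seconds: float) -> int:
    """Approximate uncompressed audio size, ignoring the small WAV header."""
    return int(sample_rate * sample_width_bytes * channels * seconds)

# One second of mono 16-bit audio at two common rates:
print(wav_size_bytes(44100, 2, 1, 1.0))   # about 88 KB per second
print(wav_size_bytes(16000, 2, 1, 1.0))   # about 32 KB per second
```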

This upcoming week, I will be looking at different libraries to see which runs faster on the Google board. The library we currently use is librosa, but it might turn out to be slower than PyTorch. In the worst case, we can code the dynamic time warping algorithm from first principles, but it is unlikely to run faster than the prewritten implementations.
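For reference, the from-first-principles version would essentially be the textbook O(n·m) recurrence, sketched below over generic feature sequences (e.g., per-frame chroma vectors); this is an illustration, not an optimized implementation.

```python
import numpy as np

def dtw_cost(a: np.ndarray, b: np.ndarray) -> float:
    """a: (n, d) and b: (m, d) feature sequences; returns the minimal alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = dist + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```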

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 10/28

We successfully registered and connected the Google Board to CMU_SECURE Wi-Fi, cloned the pycoral GitHub repository, and ran these tutorials on the board. We ran an inference on the Edge TPU using TensorFlow Lite, which gives us a good starting point for doing more machine-learning data processing in the future.
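The inference we ran was along the lines of the pycoral classification tutorial; a condensed sketch (with placeholder model and image paths) looks roughly like this:

```python
from PIL import Image
from pycoral.adapters import classify, common
from pycoral.utils.edgetpu import make_interpreter

# Placeholder paths; the tutorial supplies its own Edge TPU model and test image
interpreter = make_interpreter("mobilenet_v2_edgetpu.tflite")
interpreter.allocate_tensors()

image = Image.open("parrot.jpg").resize(common.input_size(interpreter), Image.LANCZOS)
common.set_input(interpreter, image)
interpreter.invoke()

for c in classify.get_classes(interpreter, top_k=1):
    print(f"class {c.id}: score {c.score:.3f}")
```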

For the audio component, a challenge we overcame was reading in audio from the headphone jack. We connected the lapel microphone to the Google Board and used it to read in voice commands as a first step to test the microphone connection and processing capabilities. Having this connection is vital, and failure to extract this audio data could jeopardize the entire project, because it has been shown that page flipping can be done with audio alone, while eye-tracking alone would not be reliable enough. Therefore, we will continue to ensure that the microphone stays connected at all times while still being comfortable for the player. A backup is to use the microphone on the Google board, even though its sound quality is likely to be significantly worse. This way, the system will never be left with no audio input, which would cause it to fail completely.

No changes have been made to the design or the schedule.

Sanjana’s Status Report for 10/28

The main task of this week was getting audio inputs to work at some baseline level. Caleb and I worked on getting audio inputs registered. While trying to get this working, I discovered alternatives for accessing or controlling the headphone jack that we could use in the future, such as the ALSA utilities and JACK (jackaudio.org). This week, we used PyAudio in combination with this code from Google to run an ML model for voice recognition. The main purpose was to test whether the mic was being used as the audio input source and whether it was picking up sound at a high enough resolution for the ML model to process. View video of demo. Next steps in audio involve getting chroma vectors and note onsets from recordings of me playing violin and displaying them in real time.
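A minimal PyAudio capture sketch, similar in spirit to what sits behind that test (the rate, chunk size, and duration are placeholder values):

```python
import pyaudio

RATE, CHUNK, SECONDS = 16000, 1024, 3

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

# Read a few seconds from the default input device in CHUNK-sized frames
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]

stream.stop_stream()
stream.close()
p.terminate()
print(f"captured {len(frames)} chunks from the default input device")
```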

I also created a GitHub repository for the team’s code. It currently just hosts frontend code. Next steps involve integration with the audio code.

My progress is on schedule, and I will continue working hard to stay on track. Deliverables for the upcoming week are creating more violin audio samples for further microphone testing, continuing to test how note onsets are detected for the violin and calculating thresholds for onset detection, and continuing to implement the frontend with the user experience in mind.

Rohan’s Status Report for 10/28

This week I mainly worked on finishing setting up the Google Board and starting work on the eye-tracking. We got our eye-tracking camera this week along with the SDK software tools to use it. I spent most of the week looking at tutorials for different Tobii eye-tracking functions, following the C++ tutorials. I still need to spend more time learning this API before proper implementation can start.

As a team, we finally connected the Google board to Wi-Fi and were able to test some of the getting-started code from the Git repository we cloned. It was an image-processing program run on a picture of a parrot. It worked!!

In terms of progress, I am on track and on schedule. Next week, I plan to implement a starting version of eye-tracking.

Caleb’s Status Report for 10/28

This week I spent time familiarizing myself with how to manipulate audio files within Python, for example, finding note onsets and computing chroma vectors. I’ve also reviewed how to implement harmonic-percussive separation in Python. This function is of particular importance because a practice-room environment is very likely to have percussive noise in the background. For instance, during our recording session with Dr. Dueck, noises such as a truck driving down the nearby road or someone passing by with a ring of jingling keys all affected our recording. Because we want to perform time warping against a clean, noise-free MIDI file, we want to remove this excess noise.
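One way to implement that separation is librosa's built-in HPSS; a small sketch, with a placeholder file name, is below:

```python
import librosa
import soundfile as sf

y, sr = librosa.load("practice_room_take.wav", sr=None, mono=True)

# Split the signal into a harmonic part (sustained pitches) and a percussive part
# (clicks, keys, footsteps); we would keep the harmonic part for time warping.
y_harmonic, y_percussive = librosa.effects.hpss(y)

sf.write("practice_room_harmonic.wav", y_harmonic, sr)
```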

The Google board was successfully set up and connected to the internet. However, a new challenge is getting the board to detect the Shure lapel microphone. The Google board does have a built-in pulse-density modulation (PDM) microphone, but its quality is significantly worse than the lapel microphone’s, which would lead to worse time warping. The built-in microphone also does not have the mobility of the lapel microphone and cannot be placed in prime locations for picking up breaths and music. This makes detecting the lapel microphone on the Google board a crucial step, which may involve adding additional drivers.
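A quick way to check whether the board sees the lapel microphone at all is to enumerate the input devices, for example with PyAudio as sketched below:

```python
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:     # only list devices that can record
        print(i, info["name"], info["maxInputChannels"])
p.terminate()
```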

This upcoming week, I look forward to continuing to use the sound samples collected through Dr. Dueck’s class to try out various audio filters. This week I am interested in how to take the audio data and turn it into a vector that can be used for the ML model. Because the eye-tracking will change depending on where the user is on the page, we want to take the data about where the user is on the page and use it to help the eye-tracking make better predictions.

We are currently still on track and will continue to work hard to stay on track.