Team Status Report for 10/28

We successfully registered the Google Board, connected it to CMU_SECURE Wi-Fi, cloned the pycoral GitHub repository, and ran its tutorials on the board. We also ran an inference on the Edge TPU using TensorFlow Lite, which gives us a good starting point for doing more machine learning data processing in the future.
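For reference, here is a minimal sketch of the kind of Edge TPU classification we ran, modeled on the pycoral examples (the model and image filenames below are placeholders for whatever files the tutorial ships with):

```python
# Minimal Edge TPU inference sketch modeled on the pycoral classification example.
# The model and image filenames are placeholders for the tutorial's files.
from PIL import Image
from pycoral.adapters import classify, common
from pycoral.utils.edgetpu import make_interpreter

interpreter = make_interpreter("mobilenet_quant_edgetpu.tflite")  # Edge TPU-compiled model
interpreter.allocate_tensors()

image = Image.open("parrot.jpg").resize(common.input_size(interpreter), Image.LANCZOS)
common.set_input(interpreter, image)
interpreter.invoke()

for c in classify.get_classes(interpreter, top_k=1):
    print(f"class {c.id}: score {c.score:.4f}")
```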

For the audio component, a challenge we overcame was reading in audio from the headphone jack. We connected the lapel microphone to the Google Board and used it to read in voice commands as a first step toward testing the microphone connection and our processing capabilities. Having this connection is vital, and failure to extract this audio data could jeopardize the entire project: it has been shown that page flipping can be done with audio alone, whereas eye-tracking alone would not be reliable enough. Therefore, we will continue to ensure that the microphone stays connected at all times while remaining comfortable for the player. As a backup, we can use the microphone on the Google Board, even though its sound quality is likely to be significantly worse. This way, the system will never be left without audio input, which would cause it to fail completely.

No changes have been made to the design or the schedule.

Sanjana’s Status Report for 10/28

The main task this week was getting audio input working at a baseline level. Caleb and I worked on getting audio inputs registered. While getting this working, I discovered alternatives for accessing or controlling the headphone jack that we could use in the future, such as the ALSA utilities and JACK (jackaudio.org). This week, we used PyAudio in combination with this code from Google to run an ML model for voice recognition. The main purpose was to test whether the mic was being used as the audio input source and whether it was picking up sound at a high enough resolution for the ML model to process. View video of demo. Next steps in audio involve extracting chroma vectors and note onsets from recordings of me playing violin and displaying them in real time.
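For a sense of what that check looked like, here is a minimal PyAudio sketch (sample rate, chunk size, and duration are arbitrary test values) that records a short clip from the current input device and saves it so we can listen back and confirm the mic is actually the source:

```python
# Record a short clip from the default input device with PyAudio and save it
# to a WAV file, to confirm the microphone is the active audio source.
# Sample rate, chunk size, and duration are arbitrary test values.
import wave
import pyaudio

RATE, CHUNK, SECONDS = 16000, 1024, 3

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

with wave.open("mic_test.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```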

I also created a GitHub repository for the team’s code. It currently just hosts frontend code. Next steps involve integration with the audio code.

My progress is on schedule, and I will continue working hard to stay on track. Deliverables for the upcoming week include recording violin audio samples to test the microphone further. I also want to keep testing how note onsets are detected for the violin and calculate thresholds for onset detection, and I want to keep working on the frontend implementation to optimize the user experience.
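As a sketch of the onset-threshold experiment I have in mind (using librosa; the delta value is just a starting guess to tune against real violin recordings):

```python
# Sketch: detect note onsets and compute chroma vectors for a violin recording
# with librosa. The peak-picking threshold (delta) is a starting guess that we
# would tune against real recordings.
import librosa

y, sr = librosa.load("violin_sample.wav", sr=None)

onset_env = librosa.onset.onset_strength(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(
    onset_envelope=onset_env, sr=sr, delta=0.3, backtrack=True
)
print("Onset times (s):", librosa.frames_to_time(onset_frames, sr=sr))

# 12 pitch-class energies per frame, for display or later alignment.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
print("Chroma shape:", chroma.shape)  # (12, n_frames)
```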

Rohan’s Status Report for 10/28

This week I mainly worked on finishing the Google Board setup and starting work on eye-tracking. We received our eye-tracking camera this week, along with the SDK software tools to use it. I spent most of the week looking at tutorials for different Tobii eye-tracking functions, following the C++ tutorials. I still need to spend more time learning this API before proper implementation can start.

As a team, we finally connected the Google Board to Wi-Fi and were able to test some of the onboarding example code from the Git repository. It was an image-processing program run on a picture of a parrot. It worked!!

In terms of progress, I am on track and on schedule. Next week, I plan to implement a starting version of eye-tracking.

Caleb’s Status Report for 10/28

This week I spent time familiarizing myself with how to manipulate audio files in Python, for example using PyAudio to find note onsets and compute chroma vectors. I also reviewed how to implement harmonic-percussive separation in Python. This function is particularly important because a practice room environment is very likely to have percussive noise in the background. For instance, during our recording session with Dr. Dueck, a truck driving down the nearby road and someone passing by with a ring of keys jingling both affected our recording. Because we want to perform time warping against a clean, noise-free MIDI file, we want to remove this excess noise.
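To make that concrete, here is a minimal harmonic-percussive separation sketch with librosa (filenames are placeholders); the idea is that jingling keys and truck rumble should land mostly in the percussive output:

```python
# Sketch: harmonic-percussive source separation (HPSS) with librosa.
# Sustained violin tone should stay in the harmonic part, while transient
# background noise (keys, footsteps, truck rumble) goes to the percussive part.
import librosa
import soundfile as sf

y, sr = librosa.load("practice_room_recording.wav", sr=None)
y_harmonic, y_percussive = librosa.effects.hpss(y)

sf.write("harmonic_only.wav", y_harmonic, sr)
sf.write("percussive_only.wav", y_percussive, sr)
```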

The Google Board was successfully set up and connected to the internet. However, a new challenge is getting the board to detect the Shure lapel microphone. The Google Board does have a built-in pulse-density modulation (PDM) microphone, but its quality is significantly worse than the lapel microphone's, which would lead to worse time warping. The on-board microphone also lacks the mobility of the lapel microphone and cannot be placed in prime locations for picking up breaths and music. This makes getting the board to detect the lapel microphone a crucial step, and it may involve adding additional drivers.
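One way we might check this from Python is to enumerate the input devices the board exposes and prefer the lapel mic when present, falling back to the on-board PDM mic otherwise. A rough sketch (matching on "shure" is an assumption about how the device shows up):

```python
# Sketch: list the input devices PyAudio can see on the board and prefer the
# lapel mic if it appears, otherwise fall back to the on-board PDM microphone.
# Matching on the substring "shure" is an assumption about the device name.
import pyaudio

pa = pyaudio.PyAudio()
inputs = [
    pa.get_device_info_by_index(i)
    for i in range(pa.get_device_count())
    if pa.get_device_info_by_index(i)["maxInputChannels"] > 0
]
for info in inputs:
    print(info["index"], info["name"])

lapel = next((d for d in inputs if "shure" in d["name"].lower()), None)
device_index = lapel["index"] if lapel else pa.get_default_input_device_info()["index"]
print("Using input device:", device_index)
pa.terminate()
```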

This upcoming week, I look forward to continuing to use sound samples collected through Dr. Dueck's class to apply various audio filters. This week I am interested in how to take the audio data and turn it into a vector that can be used by the ML model. Because the eye-tracking predictions depend on where the user is on the page, we want to use the audio-derived estimate of the user's location on the page to help the eye-tracking make better predictions.

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 10/21

Over the course of the past two weeks, we worked on our Design Report, met with Dr. Roger Dannenberg, and began initial setup of the Google Board. After discussing Dr. Dannenberg's past work in audio alignment, we realized that audio alignment can be far more robust than eye-tracking. Dr. Dannenberg worked on a project in the 1980s where lines of sheet music were displayed on a computer screen to match what the user was currently playing. There was no page turning, but it is quite similar to what we are trying to achieve. In that project, Dr. Dannenberg used only audio alignment to figure out where the user was in the piece and to display the corresponding lines of music. He showed us an old video demonstration of his system, and it was incredibly accurate. This showed us that adding eye-tracking may or may not significantly improve our system's performance, which potentially changes how we use eye-tracking. For example, we planned to use head-tracking for override cases, such as the user turning their head to the right to indicate a page flip to the right. However, it is too early to foresee how impactful the eye-tracking component will be, so we still plan to implement it and measure how much it improves system performance.

Currently, no changes have been made to the design. However, our conversation with Dr. Dannenberg gave us confidence in using audio alignment as our main way of tracking the user's performance. Additionally, we finally got our hands on a Google Coral Dev Board and have successfully flashed it.

Sanjana’s Status Report for 10/21

This week, I worked with my team to flash the Google Board. We followed the directions on this site; however, we are behind schedule in terms of getting the board operating with some basic functionality. We weren't able to get the board connected to Wi-Fi, so we're looking into a couple of options: continue researching Wi-Fi connectivity and how feasible it is, or connect via the Google Board's Ethernet port. My main concern about the Ethernet port is that it could violate the use-case requirements of portability and a simple hardware setup.

The majority of my hours this week went into preparing the Design Report on time. The Design Report involved a lot of additional research and helped us better understand the scope and technical requirements of our project.

Overall, progress is slightly behind schedule. I am planning to catch up this week by working on the frontend experience flow and figuring out how the pages will be navigated locally. These deliverables should give me flexibility in integrating the audio and visual components into the display over the next couple of weeks. Furthermore, I will be working more with the Google Board to help set up the internet connection and a GitHub repo with all of our code.

To implement my portion of this project, I'm looking into learning some new tools. The tools I need to study are Python display libraries, with PyQt5 and Tkinter being the top two. I intend to follow some tutorials to better understand the advantages and disadvantages of each approach and then finish the implementation. One roadblock I'm facing is file organization: there are several sub-systems in SoundSync (frontend, backend, audio alignment, and eye-tracker processing). We have some Google Board starter code and will run into integration issues and compatibility problems between different Python libraries in the near future, and it will be my job to debug those.
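As a rough starting point, here is the kind of PyQt5 upload flow I'm prototyping (widget choices, labels, and the instrument list are placeholders, not the final design):

```python
# Rough PyQt5 sketch of the upload flow: pick a sheet music / MIDI file
# and choose an instrument. Labels, layout, and instrument list are placeholders.
import sys
from PyQt5.QtWidgets import (
    QApplication, QComboBox, QFileDialog, QLabel, QPushButton, QVBoxLayout, QWidget
)

class UploadPage(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("SoundSync")
        layout = QVBoxLayout(self)

        self.file_label = QLabel("No file selected")
        pick_button = QPushButton("Upload sheet music / MIDI")
        pick_button.clicked.connect(self.pick_file)

        self.instrument = QComboBox()
        self.instrument.addItems(["Violin", "Flute", "Clarinet"])  # placeholder list

        layout.addWidget(pick_button)
        layout.addWidget(self.file_label)
        layout.addWidget(self.instrument)

    def pick_file(self):
        path, _ = QFileDialog.getOpenFileName(
            self, "Select score", filter="Scores (*.pdf *.mid *.midi *.xml)"
        )
        if path:
            self.file_label.setText(path)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    page = UploadPage()
    page.show()
    sys.exit(app.exec_())
```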

Rohan’s Status Report for 10/21

The past two weeks involved writing the Design Report, working on MIDI audio alignment, and setting up the Google Coral Dev Board. For the Design Report, I worked on the introduction, some of the Use-Case Requirements, some of the Design Requirements, some of the Trade-Off sections, the Gantt Chart, and the Summary. After finishing the Design Report as a team, we started work on the Google Coral Dev Board. I aided in the effort to flash the board and test some rudimentary programs on it. On the side, I looked into MIDI audio alignment programs to work on the audio alignment portion of our system.

For the MIDI audio alignment, I looked into possible tools I could use to learn how to properly implement Dynamic Time Warping (DTW) with a MIDI file. I looked at this website for guidance: https://www.musanim.com/wavalign/. It discusses using FFTW, a C subroutine library for computing the discrete Fourier transform in one or more dimensions, of arbitrary input size, and of both real and complex data. I also looked at PyAudio tools and dtw-python, a Python library for Dynamic Time Warping. For setting up the Google Coral Board, I mainly followed the Google Coral Dev Board getting-started documentation: https://coral.ai/docs/dev-board/get-started/.
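To get a feel for dtw-python, here is a rough sketch of aligning chroma features from a live take against a reference rendition (the filenames and feature choices are assumptions; in our system the reference would ultimately be synthesized from the MIDI score):

```python
# Sketch: align chroma features from a live take against a reference rendition
# with dtw-python. Filenames are placeholders; in our system the reference
# chroma would ultimately be synthesized from the MIDI score.
import numpy as np
import librosa
from dtw import dtw

def chroma_frames(path):
    y, sr = librosa.load(path, sr=22050)
    # Transpose so rows are time frames and columns are the 12 pitch classes.
    return librosa.feature.chroma_stft(y=y, sr=sr).T

live = chroma_frames("live_take.wav")
reference = chroma_frames("reference_render.wav")

alignment = dtw(live, reference, dist_method="cosine", keep_internals=True)
print("Normalized alignment cost:", alignment.normalizedDistance)

# Frame index1[i] of the live take maps to frame index2[i] of the reference,
# which is what tells us where the player currently is in the score.
print(np.stack([alignment.index1, alignment.index2], axis=1)[:10])
```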

So far, my progress is on schedule. Next week I plan to finish setting up the Google Coral Dev Board and test some of its onboarding features.

Caleb’s Status Report for 10/21

This week I spent time researching how to implement the required audio functions in Python. More importantly, these Python functions must be able to run entirely on the Google Coral Board. Setting up the board came with some complications, which are discussed later in this post. Implementing these functions on the Google Board means the board must have all the data necessary when a function is called, which means ensuring the relevant segment of live audio is already captured and stored on the board. This turns out to be trickier than anticipated, simply because the board needs an easy way to pull information from all the different ports connected to it.

Setting up the Google Board was the major unforeseen complication, as the instructions listed on the Google website did not work very well for us. One example is connecting to Wi-Fi. Unfortunately, CMU's Wi-Fi login requires a username and password, while the board expects to join a network by simply selecting it and entering a password. This rules out any CMU Wi-Fi network that requires a login. CMU does have password-free networks; however, these have additional security in place that seemed to flag our Google Board as performing unsafe actions and cut off its internet connection. This made pip installs and update checks impossible over CMU Wi-Fi. Furthermore, hotspots from our mobile devices were not detected by the board.

One of the tools I'm looking to learn is how to program for the Tobii Eye Tracker 5. This camera has capabilities spanning from head tracking to precise eye tracking. Understanding not only how to extract this information but also how to relay it back to the Google Board is an important and challenging task. I am also looking to learn how to create uniform sheet music in MuseScore, a music notation application that lets the user customize the spacing and notes in a score. However, I still need to learn how to use all of its knobs to create sheet music that is both readable and uniform.

This upcoming week, I look forward to using sound samples collected through Dr. Dueck's class to apply various audio filters. I am most interested in seeing whether, after performing a harmonic-percussive separation, breathing is separated from other percussive sounds and can be detected that way. I am also looking into setting up the Tobii camera so that it is compatible with the Google Board and the two can communicate.

We are currently still on track and will continue to work hard to stay on track.

Team Status Report for 10/7

Although we have a final design that was presented in the design review presentation, we have some concerns about the quality of our system. We obviously can't know the performance of dynamic time warping (DTW) or the ML model until they are built. However, there is a risk that, even for modest tempos such as 120 bpm, the ML model or DTW will take more than 500 ms to process. Importantly, both the ML model and DTW have to stay below 500 ms, because the ML model (associated with eye-tracking) and DTW (associated with audio alignment) run in parallel. This might need to be mitigated by reducing the maximum tempo of the pieces that can be aligned, which also increases the maximum allowable delay for both DTW and the ML model. That would, in turn, limit our scope to beginner music instead of beginner and intermediate-level music.
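As a quick back-of-the-envelope check on where the 500 ms figure comes from (assuming each pipeline has to finish within one beat to keep up):

```python
# Rough latency-budget check: if DTW and the ML model must each finish within
# one beat, the budget at tempo T bpm is 60,000 / T milliseconds.
def beat_budget_ms(tempo_bpm: float) -> float:
    return 60_000 / tempo_bpm

for bpm in (60, 90, 120, 160):
    print(f"{bpm:>3} bpm -> {beat_budget_ms(bpm):.0f} ms per beat")
# At 120 bpm the budget is 500 ms; capping the supported tempo lower would
# relax this budget for both pipelines.
```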

Currently, no changes have been made to the design. However, we are meeting with Dr. Dannenberg, who is an expert in the field and has done projects similar to the one we are looking to create. We hope this conversation gives us insight into problems we can't foresee, and we will adjust the design accordingly.

Several principles of engineering implemented in this project include user-centered design, robustness, and power management. User-centered design is a main focus because we aim to include a wide range of musicians. Robustness is important because we want to handle all the variations in user inputs. Power management keeps the system operable for the duration of a rehearsal without inconveniencing the user.

Sanjana’s Status Report for 10/7

I began coding a user interface in Python with basic functionality: uploading a sheet music file/MIDI file and picking an instrument. This will later be integrated with the backend. My team and I also spent considerable time on the design review report, ordered parts, and read several of Dr. Dannenberg's past papers. One major technical takeaway was on the subject of spectral analysis and chroma vectors. One paper discussed how chroma vectors reduce detailed spectra to 12-element vectors, each element representing the energy associated with one of the 12 pitch classes (the 7 natural notes plus the 5 half steps between them). Comparing these chroma vectors provides a robust distance metric for score alignment, which is one of our biggest challenges. I also looked into how segmentation, along with state graphs and penalty functions, could be used to handle complex real-time audio data.
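To make the chroma idea concrete, here is a small sketch (using librosa, which is an assumption about the library we would use) that reduces two clips to time-averaged 12-element chroma vectors and compares them with cosine distance, the kind of robust comparison the paper describes for score alignment:

```python
# Sketch: summarize two audio clips as 12-element chroma vectors (one energy
# per pitch class) and compare them with cosine distance. Filenames are
# placeholders; in practice one side would be derived from the score.
import librosa
from scipy.spatial.distance import cosine

def mean_chroma(path):
    y, sr = librosa.load(path, sr=22050)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # shape (12, n_frames)
    return chroma.mean(axis=1)

a = mean_chroma("phrase_live.wav")
b = mean_chroma("phrase_reference.wav")
print("Cosine distance between chroma summaries:", cosine(a, b))
```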

With the exception of eye tracking, I am on schedule. The eye tracker was ordered this week, so I expect to begin collecting data from it in the next 2 weeks. As individual components are completed, they will be continuously integrated with my frontend.

My deliverable for next week is a frontend that can display some data from either the microphone or the camera.