Team Status Report for 9/30

This week was spent designing slides for the upcoming design review presentation and searching for remaining flaws or risks in the design. The main risk identified last week is that the Tobii Pro SDK, which includes a Python API, only supports the Pro models of Tobii eye-trackers. The Tobii Eye Tracker 5 camera that we are buying instead exposes its API in C/C++. This can be resolved because the Google Coral Dev Board can run both Python and C/C++ code; it will just require some extra work to let the two languages communicate seamlessly.
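One option for that bridging is to wrap the C side in a shared library and call it from Python. The sketch below is only illustrative and assumes a hypothetical wrapper library (libeyetracker.so) and function (get_gaze_point) that we would write around the Tobii API; it is not the actual Tobii interface.

```python
# Minimal sketch of calling a C shared library from Python via ctypes.
# Assumes we compile our own wrapper, libeyetracker.so, exposing
# get_gaze_point(float *x, float *y) -> int (hypothetical names).
import ctypes

lib = ctypes.CDLL("./libeyetracker.so")
lib.get_gaze_point.argtypes = [ctypes.POINTER(ctypes.c_float),
                               ctypes.POINTER(ctypes.c_float)]
lib.get_gaze_point.restype = ctypes.c_int

def read_gaze():
    x = ctypes.c_float()
    y = ctypes.c_float()
    status = lib.get_gaze_point(ctypes.byref(x), ctypes.byref(y))
    if status != 0:
        return None  # no valid gaze sample this frame
    return x.value, y.value

print(read_gaze())
```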

The design has been modified slightly for prototyping purposes. One change was moving from a tablet to a laptop as the display, which makes uploading the music and the corresponding MIDI file easier. This was ultimately necessary to keep the project's scope completable by the end of the semester. Keeping everything local rather than deploying a website increases feasibility and lets us spend our effort on improving the audio and visual components. The cost of this change is that the user experience will be a little less convenient and streamlined. This, however, does not detract from our project as a whole, as we are looking to create a digital page-turner, not a wireless music display device.

The schedule is unchanged, and ample progress is being made toward the final project, including experimenting with different APIs to filter audio and working on implementing dynamic time warping (DTW).

Sanjana’s Status Report for 9/30

This week, I worked on finalizing the tech stack for our display and on how integration with the different APIs will work. I collaborated with my team members to modify our scope and identify the new constraints and use case of SoundSync. We also spent a lot of time understanding and planning override conditions for our system. For example, if eye tracking indicates the performer is elsewhere on the page while the audio indicates we are nearing the last few beats of the last line, we turn the page based on the audio alignment and ignore the misleading eye-tracking data (a rough sketch of this rule is below). Finally, a lot of my time went into preparing the slides and information for the design review.
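To make that override condition concrete, here is a minimal sketch of the decision rule in Python. All names (beats_remaining, gaze_near_page_end) and the threshold value are hypothetical placeholders, not our final interface.

```python
# Hypothetical page-turn decision combining audio alignment and eye tracking.
# Audio alignment is treated as the source of truth when the two disagree.
def should_turn_page(beats_remaining: float, gaze_near_page_end: bool,
                     beat_threshold: float = 2.0) -> bool:
    audio_says_turn = beats_remaining <= beat_threshold
    if audio_says_turn and not gaze_near_page_end:
        # The signals disagree: trust the audio alignment, ignore the gaze data.
        return True
    # When the signals agree (or audio says "not yet"), follow the audio.
    return audio_says_turn

# Example: two beats left in the last line, gaze wandered elsewhere on the page.
print(should_turn_page(beats_remaining=1.5, gaze_near_page_end=False))  # True
```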

To build SoundSync, we will apply a variety of engineering, mathematical, and scientific principles. Our main technologies center on machine learning, signal processing, web development, and eye tracking. I learned these concepts from courses I have taken over the past three years at Carnegie Mellon. While none of the courses below focus on eye tracking or its related APIs, I gained experience using APIs and interfacing with peripherals through 17-437 and 18-349, and 18-453 (XR Systems), a course I'm taking this semester, covers several of the challenges involved in eye tracking. The most relevant courses are:

  1. 10-601 Introduction to Machine Learning: This course provided the fundamentals to explain the math underlying various ML models that we’ll be using.
  2. 18-290 Signals and Systems: This course introduced key concepts like convolution, filtering, sampling, and Fourier transforms. We'll be using these concepts when filtering and sampling audio, and Fourier transforms will be an integral part of our audio alignment process using the Dynamic Time Warping algorithm.
  3. 18-349 Introduction to Embedded Systems: This course teaches the engineering principles behind embedded real-time systems and covers the integrated hardware and software aspects of embedded processor architectures. I learned how to read through documentation and how to interface with an MCU, skills directly applicable to SoundSync because of the peripherals and board we are using.
  4. 17-437 Web Application Development: This course introduces the fundamental architectural elements of programming websites that produce content dynamically, using the Django framework for Python and Java Servlets.

My progress is on schedule. Because of the changes to our tech stack, I have not begun implementing a React webpage and am instead familiarizing myself with Python options for displaying our UI. By next week, I will have pseudocode for the display or for eye-tracking data collection and filtering. We also intend to order parts in the upcoming week.
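As a starting point for the Python display, a minimal sketch like the following (using the standard-library tkinter module) could show a page of sheet music and flip to the next image on a trigger. The file names and the flip trigger (a key press standing in for the real page-turn signal) are placeholder assumptions, not our final design.

```python
# Minimal sketch: show sheet-music pages with tkinter and flip on a trigger.
# page1.png / page2.png are placeholder file names; a key press stands in
# for the real page-turn signal from audio alignment or eye tracking.
import tkinter as tk

pages = ["page1.png", "page2.png"]
current = 0

root = tk.Tk()
root.title("SoundSync display sketch")
images = [tk.PhotoImage(file=p) for p in pages]  # PNG supported by Tk 8.6+
label = tk.Label(root, image=images[current])
label.pack()

def turn_page(event=None):
    global current
    if current + 1 < len(images):
        current += 1
        label.configure(image=images[current])

root.bind("<space>", turn_page)  # placeholder for the page-turn signal
root.mainloop()
```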

Rohan’s Status Report for 9/30

This week I worked on writing and creating slides with Caleb and Sanjana for the upcoming design presentation. Concurrently, I looked further into the specifications of the Google Coral Dev Board to double-check that it can handle our processing load. Our system needs to process real-time audio and align it with our MIDI file using Dynamic Time Warping, and it also needs to process real-time eye-tracking data coming in from the Tobii Eye Tracker 5 camera. During our presentation practice with Cynthia, she expressed some concerns about the processing power of the board. After reading over the Google Coral Dev Board's datasheets and specification documents, I found that it can perform 4 trillion operations per second (4 TOPS) at about 0.5 watts per TOPS, i.e., roughly 2 watts at full load. I also looked into past computer vision projects built on this board, which indicated that it will be more than enough compute for our system.

 

Additionally, I did more research on how to implement Dynamic Time Warping by looking at existing research-based Python implementations of the algorithm. Dynamic Time Warping is essential for implementing audio alignment in our system. This website gave me a good starting understanding of how the algorithm works:

https://builtin.com/data-science/dynamic-time-warping

It goes into detail about how to segment two audio streams and then perform the matrix calculations that align the two signals, and it even provides some starting Python code for the math behind the algorithm.
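To summarize the core of the algorithm, below is a minimal NumPy sketch of classic DTW: build a cumulative cost matrix between two feature sequences and read off the optimal alignment cost. It is written from the textbook formulation rather than from our final implementation, and the toy sequences are placeholders.

```python
# Minimal sketch of classic Dynamic Time Warping with NumPy.
# x and y are 1-D feature sequences (in practice, per-frame audio features).
import numpy as np

def dtw_cost(x, y):
    n, m = len(x), len(y)
    # D[i, j] = minimal cumulative cost of aligning x[:i] with y[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])          # local distance
            D[i, j] = cost + min(D[i - 1, j],        # insertion
                                 D[i, j - 1],        # deletion
                                 D[i - 1, j - 1])    # match
    return D[n, m]

# Toy example: the same contour played at different "tempos".
live = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
reference = np.array([0.0, 2.0, 3.0, 2.0, 0.0])
print(dtw_cost(live, reference))
```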

 

The classes I have taken that will assist my team in building SoundSync:

 

  1. 18-290 Signals & Systems: This class explored signal processing and signal transformations for processing image and sound signals, and it delved into the math behind them. It gives me a strong foundation for helping build our Dynamic Time Warping implementation.
  2. 18-220 Electronic Devices & Analog Circuits: This class covers many topics in designing analog systems. Specifically, I will be using the skills I picked up in analog signal design and filtering, which we will need to properly implement our instrument frequency calibration.
  3. 18-349 Introduction to Embedded Systems: This class gave a deep-dive introduction to designing the software and hardware for real-time embedded systems, and it taught me how to manage several concurrent tasks within an embedded system. It gives me a good basis for writing the low-level software on our Google Coral Dev Board. For example, we need several GPIO ports for various buttons, and this class taught me how to program GPIO ports on a microcontroller (a rough sketch of reading a button on the Coral board is shown after this list).
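For example, a minimal sketch of polling a button over GPIO from Python might look like the following. It assumes the python-periphery library and a placeholder GPIO line number; the actual pin mapping on the Coral Dev Board would need to be confirmed from its pinout.

```python
# Minimal sketch: poll a push button on a GPIO line from Python.
# Assumes the python-periphery library; BUTTON_LINE is a placeholder
# line number, not a verified Coral Dev Board pin assignment.
import time
from periphery import GPIO

BUTTON_LINE = 8  # placeholder GPIO line on /dev/gpiochip0

button = GPIO("/dev/gpiochip0", BUTTON_LINE, "in")
try:
    while True:
        if button.read():          # button pressed (active-high assumed)
            print("manual page-turn button pressed")
            time.sleep(0.2)        # crude debounce
        time.sleep(0.01)
finally:
    button.close()
```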

 

Currently, my progress is on schedule, and I hope to make significantly more progress when our parts come in.

Caleb’s Status Report for 9/30

This week I spent the majority of my time preparing for the upcoming presentation. Outside of this, I experimented with different Python audio APIs. Source separation is a technique for separating different timbres out of one combined signal; it is most often used to split vocals, drums, guitar, and bass from a song. For our purposes, we would need to separate orchestral sounds, which would require a whole new library of sounds and new training, so this feature has been pushed to post-MVP. Nonetheless, setting up APIs such as nussl and experimenting with what could be plotted with matplotlib gave me good insight into how difficult it will be to filter a potentially very noisy input stream.

I also researched how to integrate the Tobii Eye Tracker 5. The main challenge with this exact model is that, because it is not a Pro model, it is not supported by the Pro SDK, which includes a Python API. Instead, this eye-tracker is programmed in C through the Stream Engine API, which runs through Windows. This, of course, requires additional hardware, as the Google Coral Dev Board is a Linux-based system. I've spent time looking through the Tobii developer support website and YouTube videos to find examples of how to code for the eye-tracker, with little luck. Although I now have a general idea of how to implement it, many intermediate steps in the tutorials and guides were skipped, so it will require a little more effort to work through any difficulties in those gaps.

Classes I have taken that will help build SoundSync include the following:

1. 10-301 Introduction to Machine Learning: This course was important for explaining what is happening behind the scenes for a lot of algorithms that we’ll be implementing. This class also provides insight into how some of the algorithms used for signal processing and eye-tracking filtering can be combined with aspects of K-nearest neighbors to improve performance.

2. 18-290 Signals and Systems: This course is the backbone for all the signal processing we're going to do. Although it didn't explicitly cover the short-time Fourier transform (STFT), it provided the foundation needed to understand and implement such an algorithm. It will also help in implementing dynamic time warping (DTW), which is pivotal for having the system align the audio.

3. 57-257 Orchestration: Although not an ECE course, this course has been very important for understanding how such a device would operate in a music setting. Learning about various instruments, including their ranges and techniques, helps me understand the different techniques musicians might use and how those will affect their experience with our system.

Our schedule is currently still on track. Although we have not received parts yet, we are working on the software components that don't require them.

By next week, depending on whether our parts arrive, I hope to have ironed out any remaining challenges with the Tobii eye-tracker. This includes finding a complete guide on how to export data from the eye-tracker without violating the privacy terms stated in the Tobii terms and conditions. On top of this, any implementation using nussl to filter noise from a signal would keep the audio portion of the project moving smoothly.

 

Team Status Report for 9/23

This week, we researched more parts for our system's power requirements. We looked into USB-C male-to-female jumper cables and compared battery packs with different power budgets.

As we continued researching, several risks emerged. Our original design planned to account for tempos up to 180 BPM. Feedback from instructors, however, indicated that it may be overly ambitious to attempt to build a completely robust audio filtering system at 180 BPM. 

Our system design and priorities are also evolving. Since eye tracking will serve as the foundation of our page turning, another foundational technology, such as audio alignment, may be deferred to a post-MVP addition. We are continuing to look into the audio alignment portion of the system and plan to decide before next week whether to follow through with this design change.

This upcoming Monday, we are meeting with Dr. Dueck who has taken a keen interest in our project. We intend to discuss our proposed design, gain a better perspective regarding our use case, and understand how we will be collaborating with her.

In terms of welfare, we purposely designed this system to be fully operational with just the user's eyes. This allows people who cannot operate a foot pedal, such as those paralyzed below the waist, to avoid having to flip pages during a performance. Although the foot-pedal approach is cheap and simple, that simplicity excludes a percentage of musicians, which we view as unfair. The goal of our system is therefore to include those left out by a solution that failed to account for a significant portion of musicians.

Sanjana’s Status Report for 9/23

This week, I researched platforms for hosting our UI and looked into Tobii Eye Tracker 5 integration and setup.

We intended to use the Tobii Eye Tracker 5 camera for eye tracking on a digital page; however, I discovered some issues that may arise with integration. This eye-tracker model isn't compatible with macOS, which is my development OS. I looked into four other solutions:

  1. The Tobii Pro SDK can give us raw data and is Mac compatible, but it does not support the Tobii Eye Tracker 5, and all of the devices it does support are out of budget.
  2. The iPad TrueDepth camera. I found an open-source project that investigated eye tracking using the TrueDepth camera on an iPhone X. This will be challenging because I don't have experience with Swift programming, and we are limited to tablets that already have a TrueDepth camera. This eye tracking may not be robust enough for our intended use; however, the solution remains accessible to musicians who can't operate a foot pedal during performances.
  3. We can pivot to face tracking rather than eye tracking. Several open-source APIs then become available, and I can build a real-time iOS app in Swift. The potential problem with this approach is that users will have to use their head to turn the page, which could result in a loss of focus. However, this solution maintains accessibility.
  4. The eye-tracking community recommends Talon Voice's software for macOS development. It has good support for eye tracking, and head tracking is used to refine miscalculations.

Regarding the UI, there are a couple of options: a native iPad app, or building and deploying a web application on a tablet. Native apps can be faster and more efficient, which could have a large impact on our real-time system. Ultimately, this decision rests on which platform we collect eye data from.

Progress is on track as of now. For next week, I hope to finalize the tech stack and parts list I’ll be using and begin implementation of the UI. 

Rohan’s Status Report for 9/23

This week I prepared for and presented the project proposal presentation for my team. After the presentation, I spent some time researching different parts that we need for our project and added them to our parts and budget list. I looked into inexpensive USB-C male-to-female jumper cables as well as cheap but powerful battery packs for our system.

Caleb’s Status Report for 9/23

While waiting for parts this week, I spent time looking into audio-separation APIs in Python. Although several seemed capable of doing the job, the one that looked most user-friendly was nussl. This API has plotting capabilities for the signals as well as built-in methods for computing short-time Fourier transforms (STFTs). Because we'll be breaking the signal into 46 ms segments to execute the dynamic time warping, this transform should be very helpful (a rough sketch of this windowing is below). Something to note is that the vast majority of source-separation software is built to separate the instruments of a four-part band: vocals, guitar, bass, and drums. Because this stand is envisioned for a concert or orchestra setting, the source separation needs to be able to separate out concert instruments.
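To illustrate the 46 ms framing, here is a minimal sketch using SciPy and matplotlib rather than nussl's built-ins (the nussl call signatures would differ). At a placeholder sample rate of 44.1 kHz, a 2048-sample window corresponds to roughly 46 ms, and the signal itself is a synthetic test tone standing in for a real recording.

```python
# Minimal sketch: 46 ms STFT frames of an audio signal with SciPy/matplotlib.
# Uses a synthetic test tone; a real recording would be loaded from a file.
import numpy as np
from scipy.signal import stft
import matplotlib.pyplot as plt

fs = 44100                           # placeholder sample rate (Hz)
nperseg = 2048                       # 2048 / 44100 s ≈ 46 ms per frame
t = np.arange(0, 2.0, 1 / fs)
audio = np.sin(2 * np.pi * 440 * t)  # stand-in for the live violin signal

freqs, times, Z = stft(audio, fs=fs, nperseg=nperseg)
magnitude_db = 20 * np.log10(np.abs(Z) + 1e-10)

plt.pcolormesh(times, freqs, magnitude_db, shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("STFT magnitude (46 ms frames)")
plt.show()
```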

nussl does have a feature where the model can be trained on input data. However, to limit the scope and avoid having to sample every instrument that may occur in a concert setting, we'll focus only on extracting a violin part from a recording. As for separating two violins playing different parts: because notes don't "cancel out," dynamic time warping will still be able to align against a violin part whose notes are a subset of what is being played.

 

Team Status Report for 9/16

This week, we researched parts for our design implementation and questioned our product in order to solidify the main goals we are pursuing. We have compiled a parts list and found APIs to integrate with different components of our project.

The most significant risks we face concern the robust audio detection and processing algorithm. We want to be able to turn the page based solely on aligning audio input to the sheet music being displayed; however, building extremely flexible ML models for this is very challenging given our backgrounds. To mitigate this, we intend to use standardized, measured sheet music and pre-processed MIDI files of the music.
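As a sense of what "pre-processed MIDI" could mean in practice, the sketch below uses the mido library (one option, not a decided part of our stack) to pull note-onset times from a MIDI file; the file name is a placeholder. A list of onset times like this is the kind of reference sequence the audio alignment would be matched against.

```python
# Minimal sketch: extract note-onset times (in seconds) from a MIDI file.
# Assumes the mido library; "reference.mid" is a placeholder file name.
import mido

onsets = []
elapsed = 0.0
for msg in mido.MidiFile("reference.mid"):   # iteration yields delta times in seconds
    elapsed += msg.time
    if msg.type == "note_on" and msg.velocity > 0:
        onsets.append((elapsed, msg.note))

print(onsets[:10])  # first few (time_in_seconds, MIDI_note_number) pairs
```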

We are working on several changes to the design at the moment. Design proposal coming soon!

Sanjana’s Status Report for 9/16

This week, I researched eye-tracking algorithms and real-time machine-learning audio-processing algorithms to better quantify our latency, precision, and accuracy requirements for both eye tracking and audio processing.

I also worked on the specifications noted in the design proposal and further narrowed down which open-source APIs are available for processing sheet music into MIDI files.