Team Status Report for 9/30

This week was spent designing slides for the upcoming design review presentation and searching for possible flaws or risks within the design. The main risk, identified last week, is that the Tobii Pro SDK, which includes a Python API, only supports the Pro models of Tobii eye-trackers. The Tobii Eye Tracker 5 camera that we are buying instead exposes its API in C. This can be resolved because the Google Coral Dev Board can run both Python and C; it will just require some extra work to let the two languages communicate seamlessly.

The design has been modified slightly for prototyping purposes. One change was moving from a tablet to a laptop as the display, to make uploading the sheet music and corresponding MIDI file easier. This was ultimately necessary to keep the scope of the project completable by the end of the semester. Keeping everything local rather than deploying a website increases feasibility and lets us spend our effort on improving the audio and visual components. The cost of this change is that the user experience will be a little less convenient and streamlined. This, however, does not detract from our project as a whole, since we are looking to create a digital page-turner and not a wireless music display device.

The schedule remains unchanged, and steady progress is being made toward the final project. This includes experimenting with different APIs for filtering sound and working on implementing dynamic time warping (DTW).

Sanjana’s Status Report for 9/30

This week, I worked on finalizing the tech stack for our display and how integration with the different APIs will work. I collaborated with my team members to modify our scope and identify the new constraints and use case of SoundSync. We additionally spent a lot of time understanding and planning different override conditions for our system. For example, if the eye-tracking data indicates the player is looking elsewhere on the page while the audio indicates we are nearing the last few beats of the last line, we turn the page according to our audio alignment and ignore the misleading eye-tracking data. Additionally, a lot of my time went into preparing the slides and information for the design review.
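A minimal sketch of this override logic is shown below; the function name and thresholds are placeholders for illustration, not final design values.

    # Illustrative sketch of the page-turn override described above.
    # LAST_FEW_BEATS and the function interface are placeholder assumptions.
    LAST_FEW_BEATS = 4

    def should_turn_page(audio_beat, page_end_beat, gaze_on_last_line):
        """Return True when the page should turn, trusting audio over gaze."""
        if audio_beat >= page_end_beat - LAST_FEW_BEATS:
            # Audio alignment says we're in the last few beats of the page:
            # turn regardless of where the gaze currently is.
            return True
        # Otherwise defer to the eye-tracking signal.
        return gaze_on_last_line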

In order to build SoundSync, we are going to apply a variety of engineering, mathematical, and scientific principles. Our main technologies center around machine learning, signal processing, web development, and eye tracking. I learned these concepts from courses I have taken over the past three years at Carnegie Mellon. While none of the courses below focus on eye tracking or its related APIs, I gained experience using APIs and interfacing with peripherals through 17-437 and 18-349, and 18-453: XR Systems, a course I'm taking this semester, laid out several of the challenges involved in eye tracking. The most relevant courses are:

  1. 10-601 Introduction to Machine Learning: This course provided the fundamentals to explain the math underlying various ML models that we’ll be using.
  2. 18-290 Signals and Systems: This course introduced key concepts like convolution, filtering, sampling, and Fourier transforms. We'll be utilizing these while filtering and sampling audio (a small illustrative filtering sketch follows this list). Fourier transforms will also be an integral part of our audio alignment process using the dynamic time warping algorithm.
  3. 18-349 Introduction to Embedded Systems: This course teaches the engineering principles behind embedded real-time systems and covers the integrated hardware and software aspects of embedded processor architectures. I learned how to read through documentation and how to interface with an MCU, skills directly applicable to SoundSync because of the peripherals and board we are using.
  4. 17-437 Web Application Development: This course introduces the fundamental architectural elements of programming websites that produce content dynamically, using the Django framework for Python and Java Servlets.
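As a rough illustration of the filtering and sampling concepts mentioned above, the sketch below band-pass filters an audio buffer with SciPy; the cutoff frequencies and sample rate are placeholder values, not tuned parameters from our design.

    # Illustrative band-pass filter for an audio buffer using SciPy.
    # Cutoffs and sample rate below are placeholders, not final values.
    import numpy as np
    from scipy.signal import butter, sosfilt

    def bandpass(audio, sample_rate=44100, low_hz=80.0, high_hz=4000.0, order=4):
        """Suppress low-frequency rumble and high-frequency hiss."""
        sos = butter(order, [low_hz, high_hz], btype="bandpass",
                     fs=sample_rate, output="sos")
        return sosfilt(sos, audio)

    # Example: filter one second of a noisy 440 Hz tone.
    t = np.arange(44100) / 44100
    noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(t.size)
    clean = bandpass(noisy)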

My progress is on schedule. Due to changes we made in the tech stack, I have not begun implementing a React webpage and am instead familiarizing myself with Python tools for displaying our UI. By next week, I will have some pseudocode for the display or for eye-tracking data collection and filtering. We also intend to order parts in the upcoming week.

Rohan’s Status Report for 9/30

This week I worked with Caleb and Sanjana on writing and creating slides for the upcoming Design Presentation. Concurrently, I looked more into the specifications of the Google Coral Dev Board to double-check that it can handle our intensive processing. Our system needs to process real-time audio and align it with our MIDI file using Dynamic Time Warping, and it also needs to process real-time eye-tracking data coming in from the Tobii eye-tracker camera. During our presentation practice with Cynthia, she expressed some concerns about the processing power of the board. After reading over the datasheets and specification documents of the Google Coral Dev Board, I found that it can perform 4 trillion operations per second (4 TOPS), using 0.5 watts per TOPS (about 2 W at full load). I also looked into past computer vision projects built on this board, which suggests it will provide more than enough compute for our system.

 

Additionally, I did some more research on how to implement Dynamic Time Warping by looking at existing research-based Python implementations of the algorithm. Dynamic Time Warping is essential for implementing audio alignment in our system. The following website gave me a good starting understanding of how the algorithm works:

https://builtin.com/data-science/dynamic-time-warping

It goes into great detail about how to accurately segment two audio streams and then perform the matrix calculations needed to align the two signals. It even provides some starting Python code for the math behind the algorithm.
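To make the idea concrete, here is a minimal NumPy sketch of the standard DTW cost-matrix recursion; it is an illustrative implementation of the textbook algorithm, not the code we will ultimately use for SoundSync.

    # Minimal dynamic time warping sketch: fill a cumulative cost matrix and
    # return the total alignment cost between two 1-D feature sequences.
    import numpy as np

    def dtw_cost(x, y):
        n, m = len(x), len(y)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(x[i - 1] - y[j - 1])             # local distance
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]

    # Example: the live performance lags slightly behind the MIDI reference,
    # but DTW still finds a low-cost alignment despite the length mismatch.
    reference = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
    live = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
    print(dtw_cost(reference, live))  # prints 0.0 for this toy example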

 

The classes I have taken that will help me assist my team in building SoundSync:

 

  1. 18-290 Signals & Systems: This class explored signal processing and signal transformations for processing image and sound signals, and delved into the math behind signal processing. It gave me a strong foundation for helping us build the Dynamic Time Warping algorithm.
  2. 18-220 Electronic Devices & Analog Circuits: This class covers many topics in designing analog electrical engineering systems. Specifically, I will be using the skills I picked up in analog signal design and filtering, which we will need to properly implement our instrument frequency calibration.
  3. 18-349 Introduction to Embedded Systems: This class gave a deep-dive introduction to designing the software and hardware for real-time embedded systems, and taught me how to properly manage several concurrent tasks within an embedded system. It gives me a good basis for writing the low-level software for our Google Coral Dev Board. For example, we need to use many GPIO ports for various buttons, and this class taught me how to program GPIO ports on a microcontroller (a rough sketch of this appears below the list).
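As a rough sketch of what this GPIO handling could look like, the snippet below polls a page-turn button using the python-periphery library (a common choice for Linux GPIO access); the gpiochip path, line number, and active-high assumption are placeholders until we actually wire the buttons.

    # Placeholder sketch: poll a page-turn button on a GPIO line with
    # python-periphery. The gpiochip path and line number are assumptions
    # that depend on how the button is wired to the Coral Dev Board header.
    import time
    from periphery import GPIO

    button = GPIO("/dev/gpiochip0", 6, "in")  # placeholder chip and line
    try:
        while True:
            if button.read():              # assumes an active-high button
                print("manual page turn requested")
                time.sleep(0.2)            # crude debounce
            time.sleep(0.01)
    finally:
        button.close()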

 

Currently, my progress is on schedule, and I hope to make significantly more progress when our parts come in.

Caleb’s Status Report for 9/30

This week I spent the majority of my time preparing for the upcoming presentation. Outside of this, I experimented with different Python audio APIs. Source separation is a technique designed to separate different timbres out of one combined signal; it is most often used to split vocals, drums, guitar, and bass from a song. For our purposes, we would need to split out orchestral sounds, which would require a whole new library of sounds and new training, so source separation has been pushed to post-MVP. Nonetheless, setting up APIs such as nussl and experimenting with what could be plotted with matplotlib gave useful insight into how difficult it will be to filter a potentially very noisy input stream.
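A minimal sketch of this kind of experiment is shown below, assuming nussl's AudioSignal interface and a placeholder input file; the exact array shape returned by the STFT should be checked against whichever nussl version we install.

    # Rough sketch: load a clip with nussl and plot its spectrogram with
    # matplotlib. 'clip.wav' is a placeholder path, and the assumed STFT
    # shape (frequency x time x channel) should be verified against the docs.
    import numpy as np
    import matplotlib.pyplot as plt
    import nussl

    signal = nussl.AudioSignal('clip.wav')   # placeholder input file
    stft = signal.stft()                     # complex short-time Fourier transform

    magnitude_db = 20 * np.log10(np.abs(stft[:, :, 0]) + 1e-8)
    plt.imshow(magnitude_db, origin='lower', aspect='auto')
    plt.xlabel('STFT frame')
    plt.ylabel('Frequency bin')
    plt.title('Spectrogram of the input clip')
    plt.show()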

I also worked on researching how to use the Tobii Eye Tracker 5. The main challenge of using this exact model is that, because it is not a Pro model, it is not supported by the Pro SDK, which includes a Python API. Instead, this eye-tracker is programmed in C through the Stream Engine API, which runs on Windows. This, of course, requires additional hardware, as the Google Coral Dev Board is a Linux-based system. I've spent time looking through the Tobii developer support website and YouTube videos to find examples of how to code for the Tobii eye-tracker, with little luck. Although I now have a general idea of how to implement it, many intermediate steps in the tutorials and guides were skipped, so it will take some extra effort to work through any difficulties within those skipped steps.

Classes I have taken that will help build SoundSync include the following:

1. 10-301 Introduction to Machine Learning: This course was important for explaining what is happening behind the scenes in many of the algorithms we'll be implementing. It also provides insight into how some of the algorithms used for signal processing and eye-tracking filtering can be combined with ideas like k-nearest neighbors to improve performance.

2. 18-290 Signals and Systems: This course is the backbone of all the signal processing we're going to do. Although it didn't explicitly cover the short-time Fourier transform (STFT), it provided the foundation needed to understand and implement such an algorithm (a small from-scratch sketch follows this list). It will also help in implementing dynamic time warping (DTW), which is pivotal for having the system align the audio.

3. 57-257 Orchestration: Although not an ECE course, this course has been very important for understanding how such a device would operate in a music setting. Learning about various instruments, including their ranges and techniques, helps me understand the different techniques musicians might use and how those will affect their experience with our system.
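The from-scratch STFT sketch referenced above is shown here; it is only meant to illustrate the windowed-FFT idea, and the frame and hop sizes are placeholder values.

    # Minimal from-scratch magnitude STFT: slide a Hann window across the
    # signal and take an FFT of each frame. Frame/hop sizes are placeholders.
    import numpy as np

    def stft_magnitude(signal, frame_size=1024, hop_size=256):
        """Return the magnitude STFT as an array of (frames x frequency bins)."""
        window = np.hanning(frame_size)
        n_frames = 1 + (len(signal) - frame_size) // hop_size
        frames = []
        for i in range(n_frames):
            start = i * hop_size
            frame = signal[start:start + frame_size] * window
            frames.append(np.abs(np.fft.rfft(frame)))
        return np.array(frames)

    # Example: a 440 Hz tone at 44.1 kHz concentrates energy near bin
    # 440 / (44100 / 1024) ≈ 10 in every frame.
    tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
    print(stft_magnitude(tone).shape)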

Our schedule is currently still on track. Although we have not received our parts yet, we are working on the software components that don't require them.

By next week, depending on whether our parts arrive, I hope to have ironed out the remaining challenges with the Tobii eye-tracker. This includes possibly finding a complete guide on how to export information from the eye-tracker without violating the privacy restrictions stated in the Tobii terms and conditions. On top of this, any implementation using nussl to filter noise out of a signal will keep the audio portion of the project moving smoothly.

 

Team Status Report for 9/16

This week, we researched parts for our design implementation and questioned our product in order to solidify the main goals we are pursuing. We have compiled a parts list and found APIs to integrate with different components of our project.

The most significant risks we are facing concern the robustness of the audio detection and processing algorithm. We want to be able to turn the page based solely on aligning the audio input with the sheet music being displayed; however, building ML models flexible enough to handle this is extremely challenging given our backgrounds. To mitigate this, we intend to use standardized, measured sheet music and pre-processed MIDI files of the music.
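As an illustration of what pre-processing a MIDI file could look like, the sketch below uses the pretty_midi library to pull out note onset times as an alignment reference; the file name and the choice of onsets are assumptions for illustration, not a committed design.

    # Placeholder sketch: extract note onset times from a MIDI reference file
    # with pretty_midi. 'score.mid' and the use of onsets are assumptions.
    import pretty_midi

    midi = pretty_midi.PrettyMIDI('score.mid')
    onsets = midi.get_onsets()            # note onset times in seconds
    print(f"{len(onsets)} note onsets; first few: {onsets[:5]}")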

We are working on several changes to the design at the moment. Design proposal coming soon!

Sanjana’s Status Report for 9/16

This week, I researched eye tracking algorithms and real-time machine learning audio processing algorithms in order to better quantify requirements for latency, precision, and accuracy for both eye tracking and audio processing.

I also worked on the specifications noted in the design proposal and further refined which open-source APIs are available for processing sheet music into MIDI files.

Rohan’s Status Report for 9/16

I worked on the design proposal presentation. I created the Use Case Requirements slide covering audio latency and audio accuracy, and also worked on the Technical Challenges slide, the Solution Approaches slides, and the Testing, Verification & Metrics slide. Lastly, my team and I worked on the user experience flow chart and data path for the last slide. I also researched the Jetson Nano Dev Board as a possible substitute for our Google Coral Board. Additionally, I read through a 2008 research paper on audio-to-sheet-music alignment for possible ideas for audio alignment in our project.

Caleb’s Status Report for 9/16

I worked on collecting the specifications needed for the Tobii Eye Tracker 5 to ensure the camera is operable with our Google Coral board. Also, downloading the Pro Developer SDK for the camera allows all devices to communicate through Python. I also collected information on how to filter the eye-tracker data to improve both precision and accuracy. This consisted of reading through a Microsoft paper detailing different filters and their overall effect on eye-tracking accuracy and precision.

Source: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/everyday_eyetracking-1.pdf
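As a concrete example of the kind of filter that paper discusses, the sketch below applies simple exponential smoothing to raw (x, y) gaze samples; the smoothing factor is a placeholder, and this is only an illustration rather than the filter we have chosen.

    # Illustrative exponential-smoothing filter for raw (x, y) gaze samples.
    # alpha is a placeholder; lower values smooth more but add lag.
    def smooth_gaze(samples, alpha=0.3):
        smoothed = []
        prev = None
        for x, y in samples:
            if prev is None:
                prev = (x, y)
            else:
                prev = (alpha * x + (1 - alpha) * prev[0],
                        alpha * y + (1 - alpha) * prev[1])
            smoothed.append(prev)
        return smoothed

    # A single noisy outlier is pulled back toward the running estimate.
    print(smooth_gaze([(0.50, 0.50), (0.52, 0.49), (0.90, 0.10), (0.51, 0.50)]))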