This week I spent the majority of my time preparing for the upcoming presentation. Outside of this, I experimented with different Python audio APIs. Source separation is a technique that separates the different timbres in one mixed signal. It is most often used to split vocals, drums, guitar, and bass from a song. For our purposes, we would need to split out orchestral sounds, which would require a whole new library of sounds and new training. For this reason, source separation got pushed to post-MVP. Nonetheless, setting up APIs such as nussl and experimenting with what could be plotted with matplotlib gave me a lot of insight into how difficult it is to filter a possibly very noisy input stream.
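To give a sense of what inspecting a noisy input looks like, here is a minimal sketch that uses only NumPy rather than nussl. It builds a synthetic 440 Hz tone with added noise, computes a magnitude spectrogram, and checks which frequency bin dominates. The sample rate, frame length, hop size, noise level, and the helper name `stft_magnitude` are all assumptions made for illustration, not values from our project.

```python
import numpy as np


def stft_magnitude(signal, frame_len=1024, hop=256):
    """Return a magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper: Hann-windowed frames followed by a real FFT.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


# Assumed test signal: one second of a 440 Hz sine plus white noise.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.5 * rng.standard_normal(t.shape)

spec = stft_magnitude(noisy)

# Average over time and look up the frequency of the strongest bin.
freqs = np.fft.rfftfreq(1024, d=1 / sample_rate)
peak_hz = freqs[spec.mean(axis=0).argmax()]
print(f"strongest bin is near {peak_hz:.1f} Hz")
```

The array `spec` could be passed to matplotlib's `imshow` to view the spectrogram. A single clean tone is easy to pick out of white noise this way; a real rehearsal room with several instruments is a much harder case.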
I also did more research on how to implement the Tobii Eye Tracker 5. The main challenge of using this exact model is that, because it is not a Pro model, it is not supported by the Tobii Pro SDK, which includes a Python API. Instead, this eye-tracker is programmed in C through the Stream Engine API, which runs on Windows. This, of course, requires additional hardware, as the Google Coral Dev Board is a Linux-based system. I’ve spent time looking through the Tobii developer support website and YouTube videos for examples of how to code for the Tobii eye-tracker, with little luck. Although I now have a general idea of how to implement it, the tutorials and guides skipped many intermediate steps. This means it will take some extra effort to work through any difficulties in the skipped steps.
Classes I have taken that will help build SoundSync include the following:
1. 10-301 Introduction to Machine Learning: This course was important for explaining what is happening behind the scenes in a lot of the algorithms that we’ll be implementing. It also provides insight into how some of the algorithms used for signal processing and eye-tracking filtering can be combined with aspects of k-nearest neighbors to improve performance.
2. 18-290 Signals and Systems: This course is the backbone for all the signal processing we’re going to do. Although it didn’t explicitly cover the short-time Fourier transform (STFT), it provided the foundation needed to understand and implement such an algorithm. It will also help in implementing dynamic time warping (DTW), which is pivotal for having the system align the audio.
3. 57-257 Orchestration: Although not an ECE course, this course has been very important for understanding how such a device would operate in a music setting. Learning about various instruments, including their ranges and techniques, helps me understand the different techniques musicians might use and how those will affect their experience with our system.
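As a rough illustration of the dynamic time warping mentioned in the Signals and Systems entry, here is a minimal textbook DTW sketch in Python. It aligns a reference sequence with a slower "live" sequence. The one-dimensional pitch values (MIDI note numbers) and the function name `dtw` are assumptions for the example; the real system would more likely compare STFT-derived feature frames, and this is not our final implementation.

```python
import numpy as np


def dtw(ref, live):
    """Classic O(n*m) dynamic time warping on two 1-D sequences.

    Returns (total_cost, path), where path is a list of (ref_idx, live_idx)
    pairs from the start of both sequences to the end of both.
    """
    ref = np.asarray(ref, dtype=float)
    live = np.asarray(live, dtype=float)
    n, m = len(ref), len(live)

    # cost[i, j] = best cumulative cost aligning ref[:i] with live[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - live[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n, m], path


# Assumed example: the live performance holds two notes longer than the score.
score_pitches = [60, 62, 64, 65, 67]
live_pitches = [60, 60, 62, 64, 64, 65, 67]

distance, path = dtw(score_pitches, live_pitches)
print(distance)
print(path)
```

Each pair in `path` maps a position in the score to a position in the performance, which is the information a score follower needs. This version processes the whole recording at once; a real-time system would need an online variant.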
Our schedule is currently still on track. Although we have not received our parts yet, we are working on the software components that don’t require them.
By next week, depending on whether our parts arrive, I hope to have ironed out any possible challenges with the Tobii eye-tracker. This includes possibly finding a complete guide on how to export the information from the eye-tracker without violating any privacy rules stated in the Tobii eye-tracker terms and conditions. On top of this, any implementation that uses nussl to filter noise out of a signal would keep the audio portion of our project moving smoothly.