Ben Solo’s Status Report for 11/9

This week, the rest of the group and I spent nearly all of our time integrating the components we’ve been working on into one unified system. Aside from the integration work, I made a few changes so the controller handles two drumsticks instead of one, altered the way we detect the drum pads at the start of a session, and cut the rubber drum pads to their specified diameters for testing. Prior to this week, we had the following separate components:
1.) A BLE system capable of transmitting accelerometer data to the paired laptop
2.) A dedicated CV module for detecting the drum rings at the start of the playing session. This function was triggered by clicking a button on the webapp, which used an API to initiate the detection process.
3.) A CV module responsible for continually tracking the tip of a drumstick and storing the predicted drum pad the tip was on for the 20 most recent frames.
4.) An audio playback module responsible for quickly playing sounds on detected impacts.

We split our integration process into two steps. The first was to connect the BLE/accelerometer code to the audio playback module, omitting the object tracking. To do this, Elliot had to modify parts of the BLE module so it could be used in our system controller, and I needed to change the way the controller reads in accelerometer data. I had been under the impression that the accelerometer/ESP32 system would transmit accelerometer data continuously, regardless of whether any acceleration was occurring (i.e., transmit zero acceleration when idle). In reality, the system only sends data when acceleration is detected. I therefore changed the system controller to read a globally set acceleration variable from the Bluetooth module on every iteration of its while loop and compare it to the predetermined acceleration threshold to decide whether an impact has occurred. After Elliot and I completed the necessary changes, we tested the partially integrated system by swinging the accelerometer around to trigger an impact event, assigning a random drum index in [1, 4] (since we hadn’t yet integrated the object tracking module), and playing the corresponding sound. The system functioned very well, with surprisingly low latency.
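To illustrate the control flow, here is a minimal sketch of that polling loop. The names (latestAcceleration, ACCEL_THRESHOLD) are placeholders for the actual variables shared with the BLE module, and the random index stands in for the object tracking output we had not yet integrated:

import random
import time

ACCEL_THRESHOLD = 10.0    # m/s^2; placeholder for our tuned threshold
latestAcceleration = 0.0  # updated asynchronously by the BLE module's notification callback

def controllerLoop():
    while True:
        # Read the most recent acceleration reported by the BLE module.
        if latestAcceleration > ACCEL_THRESHOLD:
            index = random.randint(1, 4)   # object tracking not yet integrated, so pick a random pad
            playDrumSound(index)           # audio playback module (see my 11/2 report below)
        time.sleep(0.001)                  # brief yield so the loop doesn't spin the CPU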

The second step in the integration process was to combine the partially integrated accelerometer/BLE/playback system with the object tracking code. This again required changes to the system controller. Because Belle’s code needs to run continuously and independently to populate our 20-frame buffer of predicted drum pads, we needed a new thread for each drumstick that starts as soon as the session begins. The object tracking code treats drum pad metadata as a length-4 array of (x, y, r) tuples, whereas I had been storing it in a dictionary keyed by pad index, so I changed how we store this information to match Belle’s code. At this point, we combined all the logic needed to operate one drumstick and proceeded to testing. It naturally didn’t work on the first try, but after a few further modifications we produced a system that tracks the drumstick’s location, transmits accelerometer data to the laptop, and plays the corresponding drum pad’s sound when an impact occurs. This was a huge step in our project’s progression: we now have a basic, working version of what we proposed, all while maintaining low latency (measuring the exact latency is difficult since it is sound-based, but from simply using the system it is clear the current latency is far below 100ms).
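As a concrete example of that format change, the conversion amounts to something like the sketch below (the pad values and names are illustrative, not our exact code):

# Previously the controller stored pad metadata keyed by pad number:
padDict = {0: (320, 240, 60), 1: (480, 240, 70), 2: (320, 400, 80), 3: (480, 400, 90)}

# Belle's tracking code expects a length-4 array of (x, y, r) tuples,
# so the controller now builds that list in key order instead:
padList = [padDict[i] for i in range(4)]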

Outside of the integration process, I also started to think about and work on how we will handle two drumsticks instead of the one we already have working. The key realization was that we need two CV threads to continuously and independently track the location of each drumstick. We also need two BLE threads, one for each drumstick’s acceleration transmission, and two threads running the system controller code, which handles reading in acceleration data, identifying which drum pad the stick was over during an impact, and triggering audio playback. Though we haven’t yet tested the system with two drumsticks, the system controller is now set up so that once we do want to test it, we can easily spawn the corresponding threads for the second stick. This involved rewriting the functions to case on the color of each drumstick’s tip, which is primarily needed so the object tracking module knows which stick to track, but is also used in the BLE code to store each stick’s acceleration data independently.
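A rough sketch of how the controller can spawn the per-stick threads is below; the function names are placeholders for our tracking, BLE, and controller routines, each of which now takes the stick’s tip color:

import threading

def startStickThreads(tipColor):
    # One CV tracking thread, one BLE reader thread, and one controller thread per stick.
    threads = [
        threading.Thread(target=trackStickTip, args=(tipColor,), daemon=True),     # CV tracking
        threading.Thread(target=readAcceleration, args=(tipColor,), daemon=True),  # BLE acceleration reads
        threading.Thread(target=runController, args=(tipColor,), daemon=True),     # impact -> sound
    ]
    for t in threads:
        t.start()
    return threads

startStickThreads("green")   # single-stick operation today
# startStickThreads("blue")  # second stick: just one more call once we're ready to test it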

Lastly, I spent some time carefully cutting the drum pads out of the rubber sheets at diameters of 15.23, 17.78, 20.32, and 22.86 cm so we could proceed with testing. Below is an image of the whole setup, including the camera stand, webcam, drum pads, and drumsticks.

We are definitely on schedule and hope to continue progressing at this rate over the next few weeks. Next week, I’d like to do two things: 1.) refine the overall system, making sure we have accurate acceleration thresholds and that the correct sounds are assigned to the correct drum pads from the webapp, and 2.) test the system with two drumsticks at once. Our only worry is that with two ESP32s transmitting concurrently, they could interfere with one another and cause packet loss.

 

Belle’s Status Report for 11/2

This week, I mainly focused on cleaning up the code that I wrote last week.

Essentially, its purpose is to make a location prediction for each frame from the camera/video feed (0-3 if the tip is in range of a corresponding drum, and -1 otherwise) and store it in order in a buffer with a fixed capacity of 20. I demoed this portion of the code with the sample moving-red-dot video I made a couple of weeks ago, and it appeared to work fine, with minimal impact on the overall frame-by-frame computer vision latency (it remained at ~1.4ms). Given that the prediction function has worst-case O(1) time (and space) complexity, this was expected.
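For reference, the per-frame prediction itself is essentially a point-in-circle test against the four detected drums, as in this simplified sketch (names are illustrative):

def predictDrum(x, y, drums):
    # drums is a length-4 list of (cx, cy, r) circles from the drum detection step.
    # Return the index of the drum whose ring contains the tip, or -1 if none do.
    for i, (cx, cy, r) in enumerate(drums):
        if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
            return i
    return -1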

However, the issue lies with the function that calculates the moving average of the buffer. 

As mentioned in my previous post, the drumstick tip location result for each frame is initially put into the buffer at index bufIndex, which is a global variable updated using the formula bufIndex = (bufIndex + 1) % bufSize, maintaining the circular aspect of the buffer. Then, the aforementioned function calculates the exponentially weighted moving average of the most recent 20 camera/video frames. 

However, during this calculation the buffer is still being modified continuously, since it is a global variable written by the tracking loop, so the most recent frames could very likely change mid-function and skew the result. Therefore, it would be best to protect this buffer somehow, using either mutexes or copying. Though a lock/mutex is one of the more intuitive options, it would likely not work for our purposes: as previously mentioned, we still need to modify the buffer to keep it updated for consecutive drum hits/accelerometer spikes, which we could not do while the moving average function holds the lock on the buffer. There is also the option of combining boolean flags with an external buffer, such that we read from one and write to the other depending on whether the moving average is being calculated. However, I feel this needlessly complicates the process, and it would be simpler to instead make a copy of the buffer inside the function and read from that copy.
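Below is a sketch of the copy-based approach I have in mind. The averaging over the integer pad labels is deliberately simplified and the decay factor is just an example; the point is the snapshot copy that shields the calculation from concurrent writes:

bufSize = 20
bufIndex = 0
buffer = [-1] * bufSize   # circular buffer of per-frame predictions (0-3, or -1 for no drum)
ALPHA = 0.7               # example decay factor for the exponential weighting

def movingAveragePrediction():
    # Snapshot the buffer and write index so the CV thread's concurrent writes
    # can't change the frames we're averaging mid-calculation.
    snapshot = list(buffer)
    start = bufIndex                          # oldest entry sits just past the current write index
    weightedSum = weightTotal = 0.0
    for k in range(bufSize):
        value = snapshot[(start + k) % bufSize]
        if value == -1:
            continue                          # skip frames where the tip was over no drum
        weight = ALPHA ** (bufSize - 1 - k)   # newest frames weigh the most
        weightedSum += weight * value
        weightTotal += weight
    return round(weightedSum / weightTotal) if weightTotal else -1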

Since the computer vision code is somewhat finished, I believe we are on track. Next week, since we just got the camera, I hope to actually begin testing my code with the drumsticks and determine the actual HSV color ranges needed to detect the drumstick tips.

Team Status Report for 11/2

This week we made significant strides towards the completion of our project. Namely, we got the audio playback system down to very low latency and got the BLE transmission to both work and transmit with much lower latency. We think a significant reason we were measuring so much latency earlier in HH 13xx is that many other project groups were using the same frequency band, driving our throughput down. Now, when testing at home, the BLE transmission appears nearly instantaneous. Similarly, the audio playback module now operates with very low latency; this required a shift from sounddevice to pyAudio and audio streams. Between these two improvements, our main bottleneck for latency will likely be storing frames in our frame buffer and continually doing object detection throughout the playing session.

This brings me to the design change we are now implementing. Previously, we had planned to run object detection to locate the tips of the drumsticks only when an impact occurs: we’d read the impact and then trigger the object detection function to determine, from the 20 most recent frames, which drum ring the impact occurred in. We now plan instead to continuously track the location of the tips as the user plays, storing the (x, y) locations in a sliding window buffer. Then, when an impact occurs, we will already have the (x, y) locations of the tips for every recent frame, and can therefore skip the object detection step prior to playback and simply apply our exponential weighting algorithm to the stored locations.
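In code terms, the change amounts to something like the following simplified sketch; the weighting helper is hypothetical and stands in for the exponential weighting step described above:

from collections import deque

tipPositions = deque(maxlen=20)   # sliding window of recent (x, y) tip locations

def onFrame(x, y):
    # Called once per processed frame by the continuous tracking loop.
    tipPositions.append((x, y))

def onImpact():
    # No detection pass is needed here: the recent locations are already buffered,
    # so we only apply the weighting to decide which drum ring was hit.
    return weightedDrumIndex(list(tipPositions))   # hypothetical weighting helper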

This, however, brings us to our greatest risk: high latency from continuous object detection. We have not yet tested a system that continuously tracks and stores the locations of the drumstick tips, so we can’t be certain what the latency of this new design will look like. Additionally, since we haven’t tested an integrated system yet, we don’t know whether the system as a whole will keep its latency low even though the individual components do, given the multiple synchronization points and data processing modules that need to interact.

Thus, a big focus in the coming weeks will be to incrementally test the latencies of partially integrated systems. First, we want to connect the BLE module to the audio playback module so we can assess how much latency there is without the object detection involved. Then, once we optimize that, we’ll connect and test the whole system, including the continuous tracking of the drumstick tips. By integrating modularly, we can more clearly see which components introduce the most latency and focus on bringing those down before testing the fully integrated system.

As of now, our schedule has not changed and we seem to be moving at a good pace. In the coming week we hope to make significant progress on the object tracking module and to test a partially integrated system combining the BLE code and the audio playback code. This would be exciting, as it would involve actually using drumsticks and hitting a surface to trigger a sound, which is fairly close to what the final product will do.

Ben Solo’s Status Report for 11/2

This week I spent my time optimizing the audio playback module. At the start of the week, my module had about 90ms of latency for every sound that needed to be played. In a worst-case situation we could work with this, but since we want an overall system latency below 100ms, it was clearly suboptimal. I went through roughly ten iterations before landing on the current implementation, which uses pyAudio as the sound interface and has what feels like instantaneous playback. I’ll explain the details of what I changed below and discuss a few of the previous iterations I went through before arriving at this final one.
The first step was to create a setup that let me test playing individually triggered sounds via keyboard input without disrupting the logic of the main controller I described in my last status report. To do this, I implemented a testing mode. When run with testing=True, the controller takes the keyboard inputs w, a, s, d to trigger each of the 4 sounds, as opposed to the simulated operating scheme where the loop continually generates random accelerometer impacts and returns a number in the range [1, 4]. This lets me test not only the latency of individual impacts, but also how the system behaves when multiple impacts occur in rapid succession.
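The key handling in testing mode boils down to a mapping like the sketch below (shown with simple blocking input for clarity; the controller’s actual key capture may differ):

KEY_TO_SOUND = {'w': 1, 'a': 2, 's': 3, 'd': 4}   # example key-to-pad assignment

def testingLoop():
    while True:
        key = input("press w/a/s/d to play a sound (q to quit): ").strip().lower()
        if key == 'q':
            break
        if key in KEY_TO_SOUND:
            playDrumSound(KEY_TO_SOUND[key])   # same playback path as real impacts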
Having implemented this new testing setup, I now needed to revise the actual playback function responsible for playing a specific sound when triggered. The implementation from last week worked as follows:
1.) at the start of the session, pre-load the sounds so that the data can easily be referenced and played
2.) when an impact occurs, spawn a new thread that handles the playback of that one sound using the sounddevice library.
The code for the actual playback function looked as follows:

import sounddevice as sd

def playDrumSound(index):
    if index in drumSounds:              # drumSounds: {index: (data, fs)} preloaded at startup
        data, fs = drumSounds[index]
        dataSize = len(data)
        print(f'playing sound {index}')
        # Choose a block size based on the length of the clip.
        if dataSize < 6090:
            blockSize = 4096
        elif dataSize < 10000:
            blockSize = 1024
        else:
            blockSize = 256
        with playLock:                   # playLock: a global threading.Lock shared by playback threads
            sd.play(data, samplerate=fs, device=wasapiIndex, blocksize=blockSize)

This system was very latent, despite using the WASAPI device native to my laptop. Subsequent iterations of the function included using a queue, where each detected impact was added to the queue and played whenever the system could first get to it. This was a poor idea, since it introduces unpredictability into when the sound actually plays, which we can’t have given that drumming is very rhythm-heavy. Another idea I implemented but eventually discarded after testing was streamed audio: I spawned a thread for each detected impact, which would then write the contents of the sound file to an output stream and play it. However, for reasons still unknown to me (I suspect it was due to how I was slicing the sound data and loading it into the stream), this implementation was not only just as latent, but also massively distorted the sounds when played.
A major part of the issue was that between the duration inherent in playing a sound (simply the amount of time the sound takes to play) and the latency before playback started, it was nearly impossible to create an actual rhythm like you would on a real drum set. My final implementation, which uses pyAudio, avoids all these issues by cutting the playback latency so drastically that it feels almost instantaneous. The trick was a combination of many of the other implementations I had tried. This is how it works:
1.) at the start of the session, we preload each of the sounds so the data and parameters (number of channels, sampling rate, sample width, etc.) are all easily accessible at run time. Additionally, we initialize an audio stream for each of the 4 sounds, so they can each play independently of the others.
2.) during the session, once an impact is detected (a keypress in my case) and the index of the sound to play has been determined, I simply retrieve the preloaded sound along with that sound’s open audio stream and write the frames of the audio to the stream.
This results in near instantaneous playback. The code for this (both preloading and playback) is shown below:

import wave
import pyaudio

pyaudio_instance = pyaudio.PyAudio()
drumSounds = {}    # index -> (frames, wave params)
soundStreams = {}  # index -> dedicated, already-open output stream

def preload_sounds():
    for index, path in soundFiles.items():   # soundFiles maps pad index -> .wav path
        with wave.open(path, 'rb') as wf:
            frames = wf.readframes(wf.getnframes())
            params = wf.getparams()
            drumSounds[index] = (frames, params)
            soundStreams[index] = pyaudio_instance.open(
                format=pyaudio_instance.get_format_from_width(params.sampwidth),
                channels=params.nchannels,
                rate=params.framerate,
                output=True,
                frames_per_buffer=256
            )

def playDrumSound(index):
    if index in drumSounds:
        frames, _ = drumSounds[index]
        stream = soundStreams[index]
        # Writing to the already-open stream avoids any per-hit setup cost.
        stream.write(frames, exception_on_underflow=False)

Though this took a lot of time to arrive at, I think it was absolutely worth it. We no longer need to worry that audio playback will keep us from meeting our 100ms latency requirement, and can instead focus on the object detection modules and Bluetooth transmission latency. For reference, I attached a sample of how the playback may sound here.

My progress is on schedule this week. In the following week, the main goal will be to integrate Elliot’s Bluetooth code, which also reached a good point this week, into the main controller so we can start triggering sounds via real drumstick impacts instead of keyboard events. If that gets done, I’d like to test the code I wrote last week for detecting the (x, y, r) of the 4 rubber rings in real life, now that we have our webcam. This will probably require adjusting the parameters of the hough_circles function we use to identify them.

Elliot’s Status Report for 11/2

I spent this week cleaning up the system’s Bluetooth module, determining the one-way latency of our wireless data transmission, and establishing a consistent threshold for the incoming accelerometer values on the host device.

To obtain latency metrics, I chose to implement a Round Trip Time (RTT) test. The strategy was to take an initial timestamp on the ESP with the system clock, update the server characteristic and notify the client, wait for a response by observing a change in the server entry, and take the time difference. This came with a few minor issues to resolve. First, I observed that the characteristic updates were inconsistent and the test produced significantly different values across runs. This was due to the client updating the same buffer as the ESP32 during its response, introducing concurrency issues when the devices attempted to update the characteristic simultaneously. I fixed this by separating transmission and reception into two distinct characteristics, allowing continuous processing on both sides.

Once this was resolved, I noticed that the resulting delay was still too high, around 100ms. After searching online, I came across this article, which states that the default connection interval for the ESP32 ranges from 7.5ms up to as much as 4s: https://docs.espressif.com/projects/esp-idf/en/release-v5.2/esp32c6/api-guides/ble/get-started/ble-connection.html. That much variance was unacceptable for our purposes, so I used the esp_gap_ble_api library to manually set the maximum connection interval to 20ms. This change greatly reduced the final delay of the test, but the shorter connection interval means I’ll have to be aware of interference as we integrate a second microcontroller on the 2.4GHz band.

The final value of my testing procedure put our one-way latency at around 40ms, but I believe the actual value is even lower because of the inherent overhead of the testing code: looping in the Arduino firmware, polling for the client response, and unpacking data all contribute nonzero latency to the result. Hence, I tested the implementation qualitatively by manually setting a fixed accelerometer threshold and printing over USB on valid spikes. This test produced favorable results, suggesting that the latency could well be under 40ms. While doing this, I was also able to determine an appropriate threshold value for data processing, which I concluded to be 10 m/s². This value achieved a reasonable hit detection rate, but we may choose to store multiple thresholds corresponding to different surfaces if the user wishes to play with a uniform actuation force across all surface types. Ultimately, these tests were helpful in our planning towards a low-latency solution, and I believe I’m still on track with the team’s schedule.
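For reference, the host-side threshold check amounts to something like the sketch below. The bleak client, characteristic UUID, and packing format are all assumptions for illustration; the actual firmware protocol and host code may differ:

import asyncio
import struct
from bleak import BleakClient   # assumes a bleak-based host client

ACCEL_CHAR_UUID = "0000ffe1-0000-1000-8000-00805f9b34fb"   # placeholder UUID
HIT_THRESHOLD = 10.0                                       # m/s^2, from this week's testing

def on_accel(_, data: bytearray):
    # Assumed packing: three little-endian floats (ax, ay, az).
    ax, ay, az = struct.unpack("<fff", data)
    if (ax * ax + ay * ay + az * az) ** 0.5 >= HIT_THRESHOLD:
        print("hit detected")

async def main(address):
    async with BleakClient(address) as client:
        await client.start_notify(ACCEL_CHAR_UUID, on_accel)
        await asyncio.sleep(30)   # listen for 30 seconds

# asyncio.run(main("AA:BB:CC:DD:EE:FF"))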

In this upcoming week, I plan to move my Bluetooth code into the system controller and assist Ben with audio buffer delay. Specifically, I will:

  1. Create a functional controller to detect accelerometer hits and play specified audio files before introducing CV.
  2. Explore ways to minimize audio output latency as much as possible, such as diving into the PyAudio stack, finding a different library, or considering the MIDI controller route suggested to us by Professor Bain.