Team Status Report for 11/16

This week our team focused on implementing the second Bluetooth drumstick as well as a reliable hit detection mechanism. This led to a few changes in our system controller: first, we refactored the callback functions in the BLE module to more accurately characterize the accelerometer readings, and we also modified the ESP threads to measure hits based on changes in acceleration rather than a static threshold. One risk we are currently facing is the increased latency of the drumsticks when operating simultaneously; our mitigation plan is to explore alternative concurrency approaches such as the multiprocessing library, or to research additional functionality in the asyncio module to better handle concurrent execution. Regarding the lighting concerns raised in last week's status report, we are in the process of testing with additional overhead lights, which appears to be an effective mitigation strategy for ensuring consistent output.
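If we pursue the asyncio route, one possible structure is to run both drumstick connections as tasks on a single event loop. The sketch below assumes the bleak BLE library; the device addresses, characteristic UUID, and notification handler are hypothetical placeholders rather than our actual code.

import asyncio
from bleak import BleakClient

# Hypothetical addresses and UUID, for illustration only.
STICK_ADDRESSES = ["AA:BB:CC:DD:EE:01", "AA:BB:CC:DD:EE:02"]
ACCEL_CHAR_UUID = "0000beef-0000-1000-8000-00805f9b34fb"

async def run_stick(address, stick_id):
    # Each drumstick gets its own client, but both share one event loop.
    async with BleakClient(address) as client:
        def on_accel(_, data):
            # Placeholder: the real controller would update this stick's shared acceleration state.
            print(f"stick {stick_id}: {len(data)} bytes")
        await client.start_notify(ACCEL_CHAR_UUID, on_accel)
        await asyncio.Event().wait()   # keep the connection open

async def main():
    await asyncio.gather(*(run_stick(addr, i) for i, addr in enumerate(STICK_ADDRESSES)))

asyncio.run(main())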

Our team has run a number of tests thus far in the development process: while much of our system testing is qualitative (precisely timestamping audio buffer latency is impractical), some of our formal testing includes round-trip time for the BLE link, accelerometer waveform capture with the MPU6050s, and CV frame-by-frame delay measurements. Additionally, as we move toward a fully functional deliverable this week, we plan to conduct an end-to-end latency measurement between the accelerometer spike and the sound playback, and we will also validate our 30mm use case requirement by ensuring that the perimeters of each of our drum pads remain sensitive to input across varying environments, lighting, and drum configurations.
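For the end-to-end measurement, one lightweight approach is to timestamp the moment the controller accepts an accelerometer spike and the moment the audio frames have been handed to the output stream. The sketch below illustrates that idea; the function names are placeholders rather than our actual controller interfaces.

import time

latencies_ms = []

def timed_hit(pad_index, play_sound):
    # pad_index: drum pad chosen by the CV voting step.
    # play_sound: callable that writes that pad's preloaded frames to its audio stream.
    t_spike = time.perf_counter()          # accelerometer spike accepted here
    play_sound(pad_index)                  # pad lookup + stream write happen inside
    latencies_ms.append((time.perf_counter() - t_spike) * 1000)

Note that this captures only the software path from spike acceptance to the stream write returning; the physical stick-to-sound delay would still need to be estimated separately, for example with an external recording.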

Figure 1. MPU 6050 hit sequence (prior to z-axis signed magnitude changes)

The results of our tests will be compared directly against the use case and design requirements we set out to fulfill, and we will pursue our goal of diversity and inclusion through rigorous testing toward a comfortable user interface. We do not have new changes to our schedule to report, and we intend to approach a viable product in the coming weeks.

Ben Solo’s Status Report for 11/16

This week I worked on determining better threshold values for what qualifies as an impact. Prior to this week, we had a functional system where the accelerometer/ESP32 assembly would successfully relay x, y, z acceleration data to the user's laptop and trigger a sound to play. However, we would essentially treat any acceleration as an impact and thus trigger a sound. We configured our accelerometer so that its Y-axis is parallel to the drumstick, its X-axis is parallel to the floor and perpendicular to the drumstick (left-right motion), and its Z-axis is perpendicular to the floor and perpendicular to the drumstick (up-down motion). This is shown in the image below:

Thus, we have two possible ways to retrieve relevant accelerometer data: 1.) reading just the Z-axis acceleration, and 2.) reading the average of the X and Z axis accelerations, since these are the two axes with relevant motion. So on top of finding better threshold values for what constitutes an impact, I needed to determine which axes to use when reading and storing the acceleration. To determine both of these factors, I mounted the accelerometer/ESP32 system to the drumstick as shown in the image above and ran two test sequences. In the first test I used just the Z-axis acceleration values, and in the second I used the average of the X and Z-axis accelerations. For each test sequence, I recorded 25 clear impacts on a rubber pad on my table. Before starting the tests, I performed a few sample impacts so I could see what the readings for an impact looked like. I noted that when an impact occurs (the stick hits the pad), the acceleration reading for that moment is relatively low. So for the purposes of these tests, an impact was identifiable as a long stretch of roughly constant accelerometer readings (just holding the stick in the air), followed by an increase in acceleration, followed by a low acceleration reading.
Once I established this, I started collecting samples, first for the test using just the Z-axis. I performed a series of hits, recording 25 readings where I could clearly discern from the output data stream that an impact had occurred. Cases where an impact was not clearly identifiable from the data were discarded and that sample was repeated. For each sample, I stored the acceleration at the impact, the acceleration just prior to the impact, and the difference between the two (i.e., A(t-1) – A(t)). I then determined the mean and standard deviation for the acceleration, the prior acceleration, and the difference between the two readings. Additionally, I calculated what I termed the upper bound (mean + 1 StdDev) and the lower bound (mean – 1 StdDev). The values are displayed below:

I repeated the same process for the second test, but this time using the average of the X and Z-axis acceleration. The results for this test sequence are shown below:

As you can see, almost across the board, the values calculated from just the Z-axis acceleration are higher than those using the X,Z average. To determine the best threshold values to use, I proceeded with four more tests:
1.) I set the system to use just the Z-axis acceleration and detected an impact with the following condition:

if 1.072 < accel < 3.151: play sound

Here I was conditioning only on the current acceleration reading to detect an impact, using the upper and lower bounds for Acceleration(t).

2.) I set the system to use just the Z-axis acceleration and detected an impact with the following condition:

if 2.105 < (prior_Accel - Accel) < 5.249: play sound

Here, I was conditioning on the difference between the previous acceleration reading and the current one, using the upper and lower bounds for Ax(t-1) – Ax(t).

3.) I set the system to use the average of the X and Z-axis accelerations and detected an impact with the following condition:

if 1.031 < accel < 2.401: play sound

Here I was conditioning on just the current acceleration reading, using the upper and lower bounds for Acceleration(t).

4.)  I set the system to use the average of the X and Z-axis accelerations and detected an impact with the following condition:

if 2.105 < (prior_Accel - Accel) < 5.249: play sound

Here I conditioned on the difference between the prior acceleration and the current acceleration and used the upper and lower bounds for Ax(t-1) – Ax(t).
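For reference, the upper and lower bounds used in these conditions were computed from the recorded samples as the mean plus or minus one standard deviation. A minimal sketch of that calculation is shown below; the sample lists are illustrative placeholders standing in for the 25 recorded values.

from statistics import mean, stdev

def threshold_bounds(samples):
    # Return (lower, upper) = (mean - 1 StdDev, mean + 1 StdDev) for a list of readings.
    m, s = mean(samples), stdev(samples)
    return m - s, m + s

# Placeholder data; the real lists hold the 25 recorded values per test sequence.
impact_accels = [1.9, 2.3, 2.1]
prior_accels = [5.8, 6.1, 5.5]
diffs = [p - a for p, a in zip(prior_accels, impact_accels)]

accel_bounds = threshold_bounds(impact_accels)   # corresponds to the Acceleration(t) bounds
diff_bounds = threshold_bounds(diffs)            # corresponds to the A(t-1) - A(t) bounds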

After testing each configuration out, I determined two things:
1.) The thresholds taken using the average of the X and Z-axis accelerations resulted in a higher rate of correct impact detection than using just the Z-axis acceleration, regardless of whether I conditioned on (prior_Accel – Accel) or just Accel.

2.) Using the difference between the previous acceleration reading and the current one resulted in better detection of impacts.

Thus, the thresholds we are now using are defined by the upper and lower bounds of the difference between the prior acceleration and the current acceleration (Ax(t-1) – Ax(t)), i.e., the condition listed in test 4 above.
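In the controller loop, this amounts to keeping the previous reading around and testing the difference against the chosen bounds. The sketch below illustrates the pattern, using the bounds from test 4; the function and variable names are illustrative rather than our actual controller code.

DIFF_LOWER, DIFF_UPPER = 2.105, 5.249   # bounds for A(t-1) - A(t) from test 4

prior_accel = None

def process_reading(accel):
    # accel: average of the X and Z-axis accelerations for the latest sample.
    global prior_accel
    hit = (
        prior_accel is not None
        and DIFF_LOWER < (prior_accel - accel) < DIFF_UPPER
    )
    prior_accel = accel
    return hit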

While this system is now far better at not playing sounds when the user is simply moving the drumstick through the air, it still needs further tuning to detect impacts reliably. From my experience testing the system, about 75% of my hits on the drum pad correctly register as impacts, while roughly 25% do not. We will need to conduct further tests, along with some trial and error, to find thresholds that detect impacts more accurately.

My progress is on schedule this week. In the coming week, I want to focus on getting better threshold values for the accelerometer so impacts are detected more accurately. Additionally, I want to work on refining the integration of all of our systems. As I said last week, we now have a functional system for one drumstick. However, we recently constructed the second drumstick and need to make sure that the additional three threads that need to run concurrently (one controller thread, one BLE thread, and one CV thread) work correctly and do not interfere with one another. Making sure this process goes smoothly will be a top priority in the coming weeks.

Belle’s Status Report for 11/9

This week, we mainly focused on integrating the different components of our project to prepare for the Interim Demo, which is coming up soon.

We first successfully integrated Elliot's Bluetooth/accelerometer code into the main code. The evidence of this success was an audio response (a drum beat/sound) being triggered by making a hit motion with the accelerometer and ESP32 in hand.

We then aimed to integrate my drumstick tip detection code, which was a bit more of a challenge. The main issue concerned picking the correct HSV/RGB color values with respect to lighting and the shape of the drumstick tip. We positioned the drumstick tip (which we colored bright red) on the desk, in view of the webcam, and took a screenshot of the output. I then took this image and used an HSV color picker website to get HSV values for specific pixels in the screenshot. However, because of the tip's rounded, oval-like shape, we have to consider multiple shadow, highlight, and mid-tone values. Picking a pixel that was too light or too dark would cause the drumstick tip to only be “seen” sometimes, or cause too many things to be “seen”. For example, sometimes the red undertones in my skin would be tracked along with the drumstick tip, or the tip would only be visible in the more brightly lit areas of the table.
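The thresholding itself is a standard OpenCV HSV mask. A minimal sketch of the idea is below; the lower/upper HSV bounds are illustrative placeholders that would come from the color-picker step, and red in particular may need two hue ranges because it wraps around the hue axis.

import cv2
import numpy as np

# Illustrative bounds only; the real values come from sampling the screenshot.
LOWER_RED = np.array([0, 120, 80])
UPPER_RED = np.array([10, 255, 255])

def tip_mask(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_RED, UPPER_RED)
    # Morphological opening removes small specks like skin-tone false positives.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return mask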

In order to remedy this issue, we are experimenting with lighting to find an ideal setup. Currently we are using a flexible lamp that clamps onto the desk that the drum pads are laid on, but it only properly illuminates half of the desk. Thus, we put in an order for another lamp so that both halves of the desk can be properly lit, which should make the lighting more consistent.

As per our Gantt chart, we are supposed to be configuring accelerometer thresholds and integrating all of our code at the moment, so we are on track. Next week, I plan to look into other object/color tracking approaches such as CamShift, background subtraction, or even YOLO/SSD detectors in case the lighting situation becomes overly complicated. I would also like to work on fine-tuning the accelerometer threshold values, as we are currently just holding the accelerometer and making a hit-like motion rather than strapping it to a drumstick and hitting the table.

 

Elliot’s Status Report for 11/9

This week was spent working with the rest of the team to bring up a testable prototype for the interim demo. I integrated my Bluetooth client code into the system controller in our repository, and together we sorted out file dependencies to get a running program that plays audio upon valid accelerometer spikes. I also planned ahead for the multithreaded code, in which we will need to spawn separate threads for each drumstick. A foreseeable issue in our development is undoubtedly the timing synchronization between the accelerometer readings, computer vision detection, and audio playback, and I plan to meet with Ben and Belle to continue testing their interaction with the shared buffer thoroughly. Once the speed of the system, especially the CV, is confidently established, I may also update the rate at which the ESP boards notify the laptop with new readings, or even switch to a polling-based implementation.

The other potential concern is interference on the 2.4GHz band once the second microcontroller is incorporated. In our weekly meetings with Tjun Jet and Professor Bain, we considered utilizing the Wi-Fi capabilities of the ESP32 rather than BLE to ensure adequate throughput and connectivity. With our testing this week, however, it seems that Bluetooth offers appropriate latency for the needs of the project, so the case for choosing Wi-Fi would depend solely on the packet loss behavior of the two sticks running together. There could also be tradeoffs if we redirect the system away from BLE, in the form of setup time and ease of pairing, which could compromise our use case requirement for versatility. As such, my plan for this upcoming week is to integrate the second ESP32 board to begin testing with two BLE devices, and to conduct a trade study between Bluetooth and Wi-Fi for our specific use case. I believe the team is on schedule and providing ample time for testing, allowing us to identify important considerations, such as lighting, earlier in the process.

Team Status Report for 11/9

This week we made big strides toward the completion of our project. We incrementally combined the various components each of us had built into one unified system that operates as defined in our use case for one drumstick. Essentially, we have a system where, when the drumstick hits a given pad, it triggers an impact event and plays the sound corresponding to that drum pad with very low latency. Note, however, that this is currently implemented for only one drumstick, not both; that will be the coming week's goal. The biggest risk we identified, which we had not anticipated, was how much variation in lighting affects the ability of the object tracking module to identify the red-colored drumstick tip. By trying out different light intensities (no light, overhead beam light, phone lights, etc.) we determined that without consistent lighting the system would not operate. During our testing, every time the light changed, we would have to capture an image of the drumstick tip, find its corresponding HSV value, and update the filter in our code before actually trying the system out. If we are unable to find a way to provide consistent lighting given any amount of ambient lighting, this will severely impact how usable this project is. The current plan is to purchase two very bright clip-on lamps that can be oriented and positioned to distribute light equally over all four drum rings. If this doesn't work, our backup plan is to line each drum pad with LED strips so each has consistent light regardless of its position relative to the camera. The backup is less favorable because it would require that either batteries be attached to each drum pad, or that each drum pad be close enough to an outlet to be plugged in, which detracts from the portability and versatility goals defined in our use case.

The second risk we identified was the potential for packet interference when transmitting from two ESP32s simultaneously. There is a chance that when we use two drumsticks, both transmitting accelerometer data at the same time, the transmissions will interfere with one another, resulting in packet loss. The backup plan for this is to switch to Wi-Fi, but this would require serious overhead work to implement. Our hope is that, since impacts from two drumsticks usually occur sequentially, the two shouldn't interfere, but we'll have to see what the actual operation looks like this week to be sure.

The following are some basic changes to the design of DrumLite we made this week:
1.) We are no longer using rubber rings and are instead using solid circular rubber pads. The reason is as follows: when we detect the drum pads' locations and radii using rings, there are two circles that could potentially be detected, one being the outer edge and one being the inner edge. Since there is no good way to tell the system which one to choose, we decided to switch to solid pads, where only one circle can ever be detected. Additionally, this makes establishing an acceleration threshold much easier, since the surface being hit is now uniform. This system works very well, and the detection output is shown below:

2.) We decided to continually record the predicted drum ring that the drumstick is over throughout the playing session. This way, when an impact occurs, we don't have to run any CV at that moment and can instead just perform the exponential weighting on the recently predicted rings to determine which pad was hit (a sketch of this weighting step is shown below).
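One way to implement that weighting step is to give the most recent frames the largest vote and pick the pad with the highest total weight. The sketch below assumes the buffer holds the last 20 per-frame predictions (0-3 for a pad, -1 for none) in oldest-to-newest order; the decay factor is an illustrative choice, not a tuned value.

def weighted_pad_vote(predictions, decay=0.8):
    # predictions: recent per-frame pad indices, oldest first (-1 = no pad detected).
    scores = {}
    weight = 1.0
    # Walk from newest to oldest so the most recent frames count the most.
    for pad in reversed(predictions):
        if pad != -1:
            scores[pad] = scores.get(pad, 0.0) + weight
        weight *= decay
    return max(scores, key=scores.get) if scores else -1

# Example: mostly pad 2 in the newest frames, so pad 2 wins.
print(weighted_pad_vote([1, 1, -1, 2, 2, 2]))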

We are on schedule and hope to continue at this healthy pace throughout the rest of the semester. Below is an image of the whole setup so far:

Ben Solo’s Status Report for 11/9

This week the rest of the group and I spent basically all of our time integrating each of the components we've been working on into one unified system. Aside from the integration work, I made a few changes to how the controller handles two drumsticks as opposed to one, altered the way we handle detecting the drum pads at the start of the session, and cut out the actual rubber drum pads to their specific diameters for testing. Prior to this week, we had the following separate components:
1.) A BLE system capable of transmitting accelerometer data to the paired laptop
2.) A dedicated CV module for detecting the drum rings at the start of the playing session. This function was triggered by clicking a button on the webapp, which used an API to initiate the detection process.
3.) A CV module responsible for continually tracking the tip of a drumstick and storing the predicted drum pad the tip was on for the 20 most recent frames.
4.) An audio playback module responsible for quickly playing audios on detected impacts.

We split our integration process into two steps; the first was to connect the BLE/accelerometer code to the audio playback module, omitting the object tracking. To do this, Elliot had to change parts of the BLE module so it could be used in our system controller, and I needed to change the way we were previously reading accelerometer data in the system controller. I was under the impression that the accelerometer/ESP32 system would continuously transmit accelerometer data regardless of whether any acceleration was occurring (i.e., transmit zero acceleration when not accelerating). In reality, however, the system only sends data when acceleration is detected. Thus, I changed the system controller to read a globally set acceleration variable from the Bluetooth module on every iteration of the while loop, and then compare this to the predetermined acceleration threshold to decide whether an impact has occurred or not. After Elliot and I completed the necessary changes for integration, we tested the partially integrated system by swinging the accelerometer around to trigger an impact event, then assigning a random index in [1,4] (since we hadn't integrated the object tracking module yet), and playing the corresponding sound. The system functioned very well, with surprisingly low latency.

The second step in the integration process was to combine the partially integrated accelerometer/BLE/playback system with the object tracking code. This again required me to change how the system controller worked. Because Belle's code needs to run continuously and independently to populate our 20-frame buffer of predicted drum pads, we needed a new thread for each drumstick that starts as soon as the session begins. The object tracking code treated drum pad metadata as a length-4 array of tuples in the form (x, y, r), whereas I was storing the drum pad metadata (x, y, r) in a dictionary where each value was associated with a key. Thus, I changed the way we store this information to match Belle's code. At this point, we combined all the logic needed for one drumstick's operation and proceeded to testing. Though it obviously didn't work on the first try, after a few further modifications we succeeded in producing a system that tracks the drumstick's location, transmits accelerometer data to the laptop, and plays the corresponding sound of a drum pad when an impact occurs. This was a huge step in our project's progression, as we now have a basic, working version of what we proposed to do, all while maintaining low latency (measuring exactly what the latency is was difficult since it's sound-based, but just from using it, it's clear that the current latency is far below 100ms).

Outside of this integration process, I also started to think about and work on how we would handle two drumsticks as opposed to the one we already had working. The key realization was that we need two CV threads to continuously and independently track the location of each drumstick. We would also need two BLE threads, one for each drumstick's acceleration transmission. Lastly, we would need two threads running the system controller code, which handles reading in acceleration data, identifying which drum pad the stick was over during an impact, and triggering the audio playback. Though we haven't yet tested the system with two drumsticks, the system controller is now set up so that once we do want to test it, we can easily spawn the corresponding threads for the second drumstick. This involved rewriting the functions to branch on the color of each drumstick's tip. This is primarily needed because the object tracking module has to know which drumstick to track, but it is also used in the BLE code to store acceleration data for each stick independently.
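A rough sketch of what spawning that per-stick thread set could look like is below; the worker functions, tip colors, and BLE addresses are illustrative placeholders rather than our actual module interfaces.

import threading

def spawn_stick_threads(stick_id, tip_color, ble_address):
    # Start the CV, BLE, and controller threads for one drumstick.
    workers = [
        ("cv",   track_tip,       (stick_id, tip_color)),    # hypothetical worker functions
        ("ble",  ble_listener,    (stick_id, ble_address)),
        ("ctrl", controller_loop, (stick_id,)),
    ]
    threads = []
    for name, fn, args in workers:
        t = threading.Thread(target=fn, args=args, name=f"{name}-{stick_id}", daemon=True)
        t.start()
        threads.append(t)
    return threads

# e.g. spawn_stick_threads(0, "red", "AA:BB:CC:DD:EE:01")
#      spawn_stick_threads(1, "green", "AA:BB:CC:DD:EE:02")   # second tip color is a placeholder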

Lastly, I spent some time carefully cutting out the drum pads from the rubber sheets at diameters of 15.23, 17.78, 20.32, and 22.86 cm so we could proceed with testing. Below is an image of the whole setup, including the camera stand, webcam, drum pads, and drumsticks.

We are definitely on schedule and hope to continue progressing at this rate for the next few weeks. Next week, I'd like to do two things: 1.) refine the overall system, making sure we have accurate acceleration thresholds and that the correct sounds are assigned to the correct drum pads from the webapp, and 2.) test the system with two drumsticks at once. The only worry we have is that, since we'll have two ESP32s transmitting concurrently, they could interfere with one another and cause packet loss.

 

Belle’s Status Report for 11/2

This week, I mainly focused on cleaning up the code that I wrote last week.

Essentially, its purpose is to make a location prediction for each frame from the camera/video feed (0-3 if in range of a corresponding drum, and -1 otherwise) and store it in order in a buffer with a fixed capacity of 20. I demoed this portion of the code with the sample moving-red-dot video I made a couple of weeks ago, and it appeared to work fine, with minimal impact on the overall frame-by-frame computer vision calculation latency (it remained at ~1.4ms). Given that the prediction function has worst-case O(1) time (and space) complexity, this was expected.

However, the issue lies with the function that calculates the moving average of the buffer. 

As mentioned in my previous post, the drumstick tip location result for each frame is initially put into the buffer at index bufIndex, which is a global variable updated using the formula bufIndex = (bufIndex + 1) % bufSize, maintaining the circular aspect of the buffer. Then, the aforementioned function calculates the exponentially weighted moving average of the most recent 20 camera/video frames. 

However, during this calculation the buffer is still being modified continuously since it is a global variable, so the most recent frames could very likely change mid-function and potentially skew the result. Therefore, it would be best to protect this buffer somehow: using either mutexes or copying. Though using a lock/mutex is one of the more intuitive options, it would likely not work for our purposes. As previously mentioned, we still need to modify the buffer to keep it updated for other consecutive drum hits/accelerometer spikes, so we would not be able to do this while the moving average calculation function has the lock on the buffer. There is also the option of combining boolean variables and an external buffer such that we read and write to only one (respectively), depending on whether the moving average is being calculated or not. However, I feel as though this needlessly complicates the process, and it would be simpler to instead make a copy of the buffer inside of the function and read from it accordingly.
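A minimal sketch of that copy-based approach is shown below, using the bufIndex update rule described above. The names are illustrative; the moving-average function would then read only from the snapshot while the CV thread keeps writing to the live buffer.

BUF_SIZE = 20
buffer = [-1] * BUF_SIZE   # per-frame pad predictions, -1 = no pad
bufIndex = 0               # next slot to write, updated as described above

def record_prediction(pad):
    # Called by the CV thread once per frame.
    global bufIndex
    buffer[bufIndex] = pad
    bufIndex = (bufIndex + 1) % BUF_SIZE

def snapshot_buffer():
    # Copy the circular buffer into oldest-to-newest order; the exponentially
    # weighted moving average is computed over this copy, not the live buffer.
    start = bufIndex
    return buffer[start:] + buffer[:start]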

Since the computer vision code is somewhat finished, I believe we are on track. Next week, since we just got the camera, I hope to actually begin testing my code with the drumsticks and determine real HSV color ranges for detecting the drumstick tips.

Team Status Report for 11/2

This week we made significant strides toward the completion of our project. Namely, we got the audio playback system to have very low latency and got the BLE transmission to both work and have much lower latency. We think a significant reason why we were measuring so much latency earlier in HH 13xx is that many other project groups were using the same band, causing our throughput to be much lower. Now, when testing at home, the BLE transmission seems nearly instantaneous. Similarly, the audio playback module now operates with very low latency; this required a shift from sounddevice to pyAudio and audio streams. Between these two improvements, our main latency bottleneck will likely be storing frames in our frame buffer and continually doing object detection throughout the playing session.

This brings me to the design change we are now implementing. Previously we had planned to run object detection to locate the tips of the drumsticks only when an impact occurs; we'd read the impact and then trigger the object detection function to determine, from the 20 most recent frames, which drum ring the impact occurred in. We now plan to continuously keep track of the location of the tips as the user plays, storing the (x, y) locations in a sliding-window buffer. Then, when an impact occurs, we will already have the (x, y) locations of the tips for every recent frame, and thus be able to skip the object detection prior to playback and instead simply apply our exponential weighting algorithm to the stored locations.

This, however, brings us to our greatest risk: high latency for continuous object detection. We have not yet tested a system that continuously tracks and stores the location of the drumstick tips, so we can't be certain what the latency will look like for this new design. Additionally, since we haven't tested an integrated system yet, we also don't know whether the entire system will perform well even though the individual components seem to have good latency, given the multiple synchronization points and data processing modules that need to interact.

Thus, a big focus in the coming weeks will be to incrementally test the latencies of partially integrated systems. First, we want to connect the BLE module to the audio playback module so we can assess how much latency there is without the object detection involved. Then, once we optimize that, we'll connect and test the whole system, including the continual tracking of the drumstick tips. Hopefully, by doing this modularly, we can more clearly see which components introduce the most latency and focus on bringing those down prior to testing the integrated system.

As of now, our schedule has not changed and we seem to be moving at a good pace. In the coming week we hope to make significant progress on the object tracking module as well as test a partially integrated system with the BLE code and the audio playback code. This would be pretty exciting since this would actually involve using drumsticks and hitting the surface to trigger a sound, which is fairly close to what the final product will do.

Ben Solo’s Status Report for 11/2

This week I spent my time optimizing the audio playback module. At the start of the week, my module had about 90ms of latency for every sound that needed to be played. In a worst-case situation we could work with this, but since we want an overall system latency below 100ms, it was clearly suboptimal. I went through probably ten iterations before landing on the current implementation, which uses pyAudio as the sound interface and has what feels like instantaneous playback. I'll explain the details of what I changed below and discuss a few of the previous iterations I went through before landing on this final one.
The first step was to create a setup that let me test playing individually triggered sounds via keyboard input without disrupting the logic of the main controller I explained in my last status report. To do this, I implemented a testing mode. When run with testing=True, the controller takes the keyboard inputs w, a, s, d to trigger each of the four sounds, as opposed to the simulated operating scheme where the loop continually generates random simulated accelerometer impacts and returns a number in the range [1,4]. This allows me not only to test the latency of individual impacts, but also to see how the system behaves when multiple impacts occur in rapid succession.
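The testing-mode branch is essentially a key-to-sound mapping; a sketch of the idea is below, where the exact key-to-index assignment and key-capture mechanism are illustrative rather than our actual controller code.

# Map the test keys to the four drum sound indices (assignment shown is arbitrary).
KEY_TO_SOUND = {'w': 1, 'a': 2, 's': 3, 'd': 4}

def handle_key(key):
    # Route a test keypress through the same playback path a real impact would use.
    index = KEY_TO_SOUND.get(key)
    if index is not None:
        playDrumSound(index)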
Having implemented this new testing setup, I now needed to revise the actual playback function responsible for playing a specific sound when triggered. The implementation from last week worked as follows:
1.) at the start of the session, pre-load the sounds so that the data can easily be referenced and played
2.) when an impact occurs, spawn a new thread that handles the playback of that one sound using the sounddevice library.
The code for the actual playback function looked as follows:

import sounddevice as sd

# drumSounds, playLock, and wasapiIndex are globals set up elsewhere in the controller.
def playDrumSound(index):
    if index in drumSounds:
        data, fs = drumSounds[index]
        dataSize = len(data)
        print(f'playing sound {index}')
        # Choose a block size based on the length of the sample.
        if dataSize < 6090:
            blockSize = 4096
        elif dataSize < 10000:
            blockSize = 1024
        else:
            blockSize = 256
        with playLock:
            sd.play(data, samplerate=fs, device=wasapiIndex, blocksize=blockSize)

This system was very latent, despite the use of the WASAPI device native to my laptop. Subsequent iterations of the function included utilizing a queue, where each time an impact was detected it was added to the queue and played whenever the system could first get to it. This was, however, a poor idea, since it introduces unpredictability into when the sound actually plays, which we can't have given that drumming is very rhythm-heavy. Another idea I implemented but eventually discarded after testing was to use streamed audio. In this implementation, I spawned a thread for each detected impact which would then write the contents of the sound file to an output stream and play it. However, for reasons still unknown to me (I think it was due to how I was cutting the sound data and loading it into the stream), this implementation was not only just as latent, but also massively distorted the sounds when played.
A major part of the issue was that, between the duration inherent in playing a sound (simply the amount of time it takes for the sound to play out) and the latency before playback begins, it was nearly impossible to create an actual rhythm like you would when playing a drum set. My final implementation, which uses pyAudio, avoids all these issues by cutting the playback latency so drastically that it feels almost instantaneous. The approach combines elements of several of the other implementations I had tried. This is how it works:
1.) at the start of the session we preload each of the sounds so the data and parameters (number of channels, sampling rate, sample width, etc.) are all easily accessible at run time. Additionally, we initialize an audio stream for each of the four sounds, so each can play independently of the others.
2.) during the session, once an impact is detected (a keypress in my case) and the index of the sound to play has been determined, I simply retrieve the preloaded sound data along with that sound's open audio stream, and then write the audio frames to the stream.
This results in near instantaneous playback. The code for this (both preloading and playback) is shown below:

import wave
import pyaudio

pyaudio_instance = pyaudio.PyAudio()
drumSounds = {}     # index -> (frames, wave params)
soundStreams = {}   # index -> dedicated, always-open output stream
# soundFiles maps each index to its .wav file path (defined elsewhere in the controller).

def preload_sounds():
    for index, path in soundFiles.items():
        with wave.open(path, 'rb') as wf:
            frames = wf.readframes(wf.getnframes())
            params = wf.getparams()
            drumSounds[index] = (frames, params)
            # One open output stream per sound, so playback never waits on stream setup.
            soundStreams[index] = pyaudio_instance.open(
                format=pyaudio_instance.get_format_from_width(params.sampwidth),
                channels=params.nchannels,
                rate=params.framerate,
                output=True,
                frames_per_buffer=256
            )

def playDrumSound(index):
    if index in drumSounds:
        frames, _ = drumSounds[index]
        stream = soundStreams[index]
        # Writing preloaded frames to an already-open stream keeps latency minimal.
        stream.write(frames, exception_on_underflow=False)

Though this took a lot of time to arrive at, I think it was absolutely worth it. We no longer need to worry that audio playback will keep us from meeting our 100ms latency requirement, and can instead focus on the object detection modules and Bluetooth transmission latency. For reference, I attached a sample of how the playback may occur here.

My progress is on schedule this week. In the following week, the main goal will be to integrate Elliot's Bluetooth code, which also reached a good point this week, into the main controller so we can start triggering sounds via real drumstick impacts as opposed to keyboard events. If that gets done, I'd like to test the code I wrote last week for detecting the (x, y, r) of the four rubber rings in real life, now that we have our webcam. This will probably require me to make some adjustments to the parameters of the hough_circles function we are using to identify them.
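For context on the parameters I expect to tune, the detection is built around OpenCV's Hough circle transform; a minimal sketch is below, where every numeric parameter is a placeholder that will need adjusting against real webcam frames.

import cv2
import numpy as np

def detect_drum_pads(frame_bgr):
    # Return a list of (x, y, r) for detected circular pads; all parameters are placeholders.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)  # smooth to reduce spurious edges
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=80,
        param1=100, param2=40, minRadius=40, maxRadius=160
    )
    if circles is None:
        return []
    return [tuple(c) for c in np.round(circles[0]).astype(int)]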

Elliot’s Status Report for 11/2

I spent this week cleaning up the system’s Bluetooth module, determining the one-way latency of our wireless data transmission, and establishing a consistent threshold for the incoming accelerometer values on the host device.

To obtain latency metrics, I chose to implement a round-trip time (RTT) test. The strategy was to take an initial timestamp on the ESP with the system clock, update the server characteristic and notify the client, wait for a response by observing a change in the server entry, and take the time difference. This came with a few minor issues to resolve: first, I observed that the characteristic updates were inconsistent and the test produced significantly different output values across runs. This was due to the client updating the same buffer as the ESP32 during its response, introducing concurrency issues when the devices attempted to update the characteristic simultaneously. I fixed this by separating transmission and reception into two distinct characteristics, allowing continuous processing on both sides. Once this was resolved, I noticed that the resulting delay was still too high, around 100ms. After searching online, I came across this article, which states that the default connection interval for the ESP32 ranges from 7.5ms up to as much as 4s: https://docs.espressif.com/projects/esp-idf/en/release-v5.2/esp32c6/api-guides/ble/get-started/ble-connection.html. Having this much variance was unacceptable for our purposes, so I made use of the esp_gap_ble_api library to manually set the maximum connection interval to 20ms. This change greatly reduced the final delay of the test, but the shorter connection interval means I'll have to be aware of interference as we integrate a second microcontroller on the 2.4GHz band.

The final value of my testing procedure landed our one-way latency at around 40ms, but my belief is that the actual value is even lower; this is because of the inherent overhead introduced across the testing code: the operations of looping in the Arduino firmware, polling for the client response, and unpacking data all contribute nonzero latency to the result. Hence, I tested the implementation qualitatively by manually setting a fixed accelerometer threshold and printing over USB on valid spikes. This test produced favorable results, suggesting that the latency could well be under 40ms. While doing this, I was also able to determine an appropriate threshold value for data processing, which I concluded to be 10 m/s². This value achieved a reasonable hit detection rate, but we may choose to store multiple thresholds corresponding to different surfaces if the user wishes to play with a uniform actuation force across all surface types. Ultimately, these tests were helpful in our planning toward a low-latency solution, and I believe I'm still on track with the team's schedule.
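On the laptop side, the client's role in this RTT test is simply to echo each notification back on the second characteristic so the ESP32 can stop its timer. The sketch below illustrates that responder, assuming the bleak library; the device address and characteristic UUIDs are hypothetical placeholders.

import asyncio
from bleak import BleakClient

ESP_ADDRESS   = "AA:BB:CC:DD:EE:01"                       # hypothetical
NOTIFY_CHAR   = "0000aaa1-0000-1000-8000-00805f9b34fb"    # ESP -> laptop (hypothetical)
RESPONSE_CHAR = "0000aaa2-0000-1000-8000-00805f9b34fb"    # laptop -> ESP (hypothetical)

async def rtt_responder():
    async with BleakClient(ESP_ADDRESS) as client:
        def on_notify(_, data):
            # Echo immediately so the ESP's timestamp difference reflects the round trip.
            asyncio.create_task(client.write_gatt_char(RESPONSE_CHAR, data, response=False))
        await client.start_notify(NOTIFY_CHAR, on_notify)
        await asyncio.sleep(30)   # keep the connection alive while the ESP runs its trials

asyncio.run(rtt_responder())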

In this upcoming week, I plan to move my Bluetooth code into the system controller and assist Ben with audio buffer delay. Specifically, I will:

  1. Create a functional controller to detect accelerometer hits and play specified audio files before introducing CV.
  2. Explore ways to minimize audio output latency as much as possible, such as diving into the PyAudio stack, finding a different library, or considering the MIDI controller route suggested to us by Professor Bain.