Ben Solo’s Status Report for 12/7

Following our final presentation on Monday, I spent this week working on integrations, making minor fixes to the system all around, and working with Elliot to resolve our lag issues when using two drumsticks. I also updated our frontend to reflect the actual way our system is intended to be used. Lastly, I made a modification to the object detection code that prevents swinging motions outside of the drum pads from triggering an audio event. I’ll go over my contributions this week individually below:
1.) I implemented a sorting system that correctly maps the detected drum pads to their corresponding sounds via the detected radius. The webapp allows the user to configure their sounds based on the diameter of the drum pads, so it’s important that when the drum pads are detected at the start of a playing session, they are correctly assigned to the loaded sounds instead of being randomly paired based on the order in which they are encountered. This entailed writing a helper function that scales and orders the drums by diameter, organizing them in ascending order of radius; a rough sketch of this mapping is shown below.
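A minimal sketch of this helper, assuming the detected pads arrive as (x, y, r) tuples and the configured sounds are keyed 1-4 from smallest to largest pad (the names here are illustrative, not the exact code):

def assign_sounds_by_radius(detected_pads, sounds):
    # Order the detected pads by ascending radius so index 1 is always the smallest pad.
    ordered = sorted(detected_pads, key=lambda pad: pad[2])
    mapping = {}
    for index, (x, y, r) in enumerate(ordered, start=1):
        mapping[index] = {"x": x, "y": y, "r": r, "sound": sounds[index]}
    return mapping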

2.) Elliot and I spent a considerable amount of time on Wednesday trying to resolve our two-stick lag issues. He had written some new code to be flashed to the drumsticks that should have eliminated the lag by fixing an error in our initial code where we were at times sending more data than the connection window could handle. For example, we were trying to send readings every millisecond but only had a 20ms window, which resulted in received notifications being queued and processed after some delay. We had some luck with this new implementation, but after testing the system multiple times, we realized that the performance seemed to be deteriorating over time. We are now under the impression that this is a result of diminishing power in our battery cells, and are planning on using fresh batteries to see if that resolves our issue. Furthermore, during these tests I realized that our range for what constitutes an impact was too narrow, which often resulted in a perceived lag because audio wasn’t being triggered following an impact. To resolve this, we tested a few new values, settling on a range of 0.3 m/s^2 to 20 m/s^2 as a valid impact.
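For reference, the widened impact check amounts to something like the following (variable names are illustrative):

MIN_IMPACT = 0.3   # m/s^2, floor below which a reading is treated as noise
MAX_IMPACT = 20.0  # m/s^2, ceiling above which a reading is treated as a spurious spike

def is_valid_impact(accel):
    # Only readings inside the tuned window count as a hit.
    return MIN_IMPACT <= accel <= MAX_IMPACT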

3.) Our frontend had a number of relics from our initial design which either needed to be removed or reworked. The two main issues were that the drum pads were being differentiated by color as opposed to radius, and that we still had a button saying “identify my drum set” on the webapp. The first issue was resolved by changing the radii of the circles representing the drum pads on the website and ordering them in a way that corresponds with how the API endpoint orders sounds and detected drum pads at the start of the session. The second issue, regarding the “identify drum set” button, was easily resolved by removing the button that triggers the endpoint for starting the drum pad detection script. The code housed in this endpoint is still used, but instead of being triggered via the webapp, we set our system up to run the script as soon as a playing session starts. I thought this design made more sense and made the experience of using our system much simpler by eliminating the need to switch back and forth between the controller system and the webapp during a playing session. Below is the current updated frontend design:

4.) As it existed prior to this week, our object tracking system had a major flaw which had previously gone unnoticed: when the script identified that the drum stick tip was not within the bounds of one of the drum rings, it simply continued to the next iteration of the frame processing loop, not updating our green/blue drum stick location prediction variables. This resulted in the following behavior:
a.) impact occurs in drum pad 3.
b.) prediction variable is updated to index 3 and sound for drum pad 3 plays.
c.) impact occurs outside of any drum pad.
d.) prediction variable is not updated with a -1 value and thus plays the last known drum pad’s corresponding sound, i.e. sound 3.
This is an obvious flaw that causes the system to register invalid impacts as valid impacts and play the previously triggered sound. To remedy this, I changed the CV script to update the prediction variables with the -1 (“not in a drum pad”) value, and updated the sound player module to only play a sound if the index provided is in the range [0,3] (i.e. one of the valid drum pad indices). Our system now only plays a sound when the user hits one of the drum pads, playing nothing if the drum stick is swung or hit outside of a drum pad. A minimal sketch of the change is shown below.
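The two pieces of the fix look roughly like this (names are illustrative, not the exact code):

NOT_IN_PAD = -1

def update_prediction(pad_index):
    # The CV loop now always writes a value, using -1 when the tip is outside every
    # drum ring instead of skipping the update entirely.
    return pad_index if pad_index is not None else NOT_IN_PAD

def play_if_valid(index, play_sound):
    # Only indices 0-3 correspond to real drum pads; -1 (or anything else) plays nothing.
    if 0 <= index <= 3:
        play_sound(index)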

I also started working on our poster, which needs to be ready a little earlier than anticipated given our invitation to the Tech Spark Expo. This entails collecting all our graphs/test metrics and figuring out how to present them in a way that conveys meaning to the audience given our use case. For instance, I needed to figure out how to explain that a graph showing 20 accelerometer spikes verifies our BLE reliability within a 3m radius of our receiving laptop.

Overall, I am on schedule and plan on spending the coming weekend/week refining our system and working on the poster, video, and final report. I expect to have a final version of the poster done by Monday so that we can get it printed in time for the Expo on Wednesday.

Ben Solo’s Status Report for 11/30

Over the last week(s), I’ve focused my time on both creating a tool for dynamically selecting/tuning precise HSV values and conducting tests on our existing system in anticipation of the final presentation and the final report. My progress is still on schedule with the Gantt chart we initially set for ourselves. Below I will discuss the HSV tuning tool and the tests I conducted in more detail.
As we were developing our system and testing the integration of our CV module, BLE module, and audio playback module, we quickly realized that the approach of finding uniform and consistent lighting was pretty much unattainable without massively constraining the dimensions to which the drum set could be scaled. In other words, if we used a bunch of fixed lights to try and keep lighting consistent, it would a.) affect how portable the drum set is and b.) limit how big you could make the drum set as a result of a loss in uniform lighting the further the drum pads move from the light source. Because the lighting massively impacts the HSV values used to filter and detect the drum stick tips in each frame, I decided we needed a tool that allows the user to dynamically change the HSV values at the start of the session so that a correct range can be chosen for each unique lighting environment. When our system controller (which starts and monitors the BLE, CV, and audio threads) is started, it initially detects the rings’ locations, scales their radii up by 15mm, and shows the result to the user so they can verify the rings were correctly detected. Thereafter, two tuning scripts start, one for blue and one for green. In each case, two windows pop up on the user’s screen, one with 6 sliding toolbar selectors and another with a live video feed from the webcam with the applied blue or green mask over it. In an ideal world, when the corresponding blue or green drumstick is held under the camera, only the colored tip of the drumstick should be highlighted. However, since lighting changes a lot, the user now has the ability to alter the HSV range in real time and see how this affects the filtering. Once they find a range that accurately detects the drum stick tips and nothing else, the user hits enter to close the tuning window and save those HSV values for the playing session. Below is a gif of the filtered live feed window. It shows how initially the HSV range is not tuned precisely enough to detect just the tip of the drumstick, but how eventually, when the correct range is selected, only the moving tip of the drumstick is highlighted.
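The core of the tuner is a set of OpenCV trackbars driving cv2.inRange on each frame. A condensed sketch, with the window name and key handling as assumptions rather than the exact implementation:

import cv2

def tune_hsv(camera_index=0, window="HSV Tuner"):
    cv2.namedWindow(window)
    # Six sliders: lower/upper bounds for H, S, and V.
    for name, maximum, initial in [("H_lo", 179, 0), ("S_lo", 255, 0), ("V_lo", 255, 0),
                                   ("H_hi", 179, 179), ("S_hi", 255, 255), ("V_hi", 255, 255)]:
        cv2.createTrackbar(name, window, initial, maximum, lambda _: None)

    cap = cv2.VideoCapture(camera_index)
    lower = upper = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        lower = tuple(cv2.getTrackbarPos(n, window) for n in ("H_lo", "S_lo", "V_lo"))
        upper = tuple(cv2.getTrackbarPos(n, window) for n in ("H_hi", "S_hi", "V_hi"))
        mask = cv2.inRange(hsv, lower, upper)
        cv2.imshow(window, cv2.bitwise_and(frame, frame, mask=mask))
        if cv2.waitKey(1) == 13:  # Enter saves the current range and closes the tuner
            break
    cap.release()
    cv2.destroyWindow(window)
    return lower, upper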

After building this tool, I conducted a series of verification and validation tests on parts of the system, which are outlined below:

1.) To test that the correct sound was being triggered 90% of the time, I conducted the following tests. I ran our whole system and played drums 1, 2, 3, 4 in that order repeatedly for 12 impacts at a time, eventually collecting 100 samples. For each impact, I recorded what sound was played. I then found the percentage of impacts for which the drum pad hit corresponded correctly to the sound played and found this value to be 89%.

2.) To verify that the overall system latency was below 100ms, I recorded a video of myself hitting various drum pads repeatedly. I then loaded the video into a video editor and split the audio from the video. I could then identify the time at which the impact occurred by analyzing the video and identify when the audio was played by finding the corresponding spike in the audio. I then recorded the difference between impact time and playback time for each of the impacts and found an average overall system latency of 94ms. While this is below the threshold we set out for, most impacts actually have a far lower latency. The data was skewed by one recording which had ~360ms of latency.

3.) To verify that our CV module was running in less than 60ms per frame, I used matplotlib to graph the processing time for each frame and found the average value to be 33.2ms per frame. The graph is depicted below. 

I conducted several other more trivial tests, such as finding the weight of the drumsticks, the minimum layout dimensions, and verifying that the system can be used reliably within a 3m range of the laptop, all of which yielded expected results as outlined in our use case and design requirements.

In response to the question of what new tools or knowledge I’ve had to learn to progress through our capstone project, I’d say that the two technologies I had to learn about and learn how to implement were CV (via OpenCV and skimage) and audio playback streaming (via pyAudio). I had never worked with either of them before, so it definitely took a lot of learning before I was able to implement any strong, working code. For CV, I’d say I probably learned the most from reading other people’s initial CV code (especially Belle’s). Her code used all the techniques I needed for building my HSV range selection tool as well as the module I wrote for initially detecting the locations and radii of the drum pads. I read through her code as well as various forums such as Stack Overflow whenever I encountered issues and was able to learn all I needed in order to implement both of these modules. In the case of audio playback streaming, I’d say I learned it mostly through trial and error and reading on Stack Overflow. I probably went through 6 iterations of the playback streaming module before I found a solution with low enough playback latency. Because many other applications such as drum machines or electronic synthesizers encounter many of the same issues I was facing when trying to develop an efficient playback module, there was a large amount of information online, whether regarding pyAudio streaming or overall concepts in low-latency audio playback (such as preloading audio frames or initializing audio streams prior to the session).

In the coming week the most pressing goal is to determine why playing with two connected drumsticks at once is resulting in such a drop in performance. Once we figure out what the issue is, I hope to spend my time implementing a system that can reliably handle two drumsticks at once. Additionally, I hope to start working on either the poster or the video to alleviate some stress in the coming two weeks before capstone ends.

Ben Solo’s Status Report for 11/16

This week I worked on determining better threshold values for what qualifies as an impact. Prior to this week we had a functional system where the accelerometer ESP32 system would successfully relay x, y, z acceleration data to the user’s laptop, and trigger a sound to play. However, we would essentially treat any acceleration as an impact and thus trigger a sound. We configured our accelerometer to align its Y-axis parallel to the drumstick, the X-axis to be parallel to the floor and perpendicular to the drumstick (left right motion), and the Z-axis to be perpendicular to the floor and perpendicular to the drumstick (up-down motion). This is shown in the image below:

Thus, we have two possible ways to retrieve relevant accelerometer data: 1.) reading just the Z-axis acceleration, and 2.) reading the average of the X and Z axis acceleration, since these are the two axes with relevant motion. So on top of finding better threshold values for what constitutes an impact, I needed to determine which axes to use when reading and storing the acceleration. To determine both of these factors I ran a series of tests where I mounted the accelerometer/ESP32 system to the drumstick as shown in the image above and ran two test sequences. In the first test I used just the Z-axis acceleration values and in the second I used the average of the X and Z-axis acceleration. For each test sequence, I recorded 25 clear impacts on a rubber pad on my table. Before starting the tests, I performed a few sample impacts so I could see what the readings for an impact resembled. I noted that when an impact occurs (the stick hits the pad), the acceleration reading for that moment is relatively low. So for the purposes of these tests, an impact was identifiable by a long chain of constant accelerometer data (just holding the stick in the air), followed by an increase in acceleration, followed by a low acceleration reading.
Once I established this, I started collecting samples, first for the test using just the Z-Axis. I performed a series of hits, recording the 25 readings where I could clearly discern an impact had occurred from the output data stream. Cases where an impact was not clearly identifiable from the data were discarded and that sample was repeated. For each sample, I stored the acceleration at the impact, the acceleration just prior to the impact, and the difference between the two (i.e. A(t-1) – A(t)). I then determined the mean and standard deviation for the acceleration, prior acceleration, and the difference between the two acceleration readings. Additionally, I calculated what I termed the upper bound (mean + 1 StdDev) and the lower bound (mean – 1 StdDev). The values are displayed below:
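A small numpy sketch of how these per-test statistics can be tabulated (the sample lists are placeholders for the recorded readings, not the actual data):

import numpy as np

def threshold_stats(impact_accels, prior_accels):
    diffs = np.array(prior_accels) - np.array(impact_accels)  # A(t-1) - A(t)
    stats = {}
    for label, values in [("accel", impact_accels), ("prior", prior_accels), ("diff", diffs)]:
        arr = np.asarray(values, dtype=float)
        mean, std = arr.mean(), arr.std()
        stats[label] = {"mean": mean, "std": std,
                        "upper": mean + std,   # upper bound = mean + 1 std dev
                        "lower": mean - std}   # lower bound = mean - 1 std dev
    return stats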

I repeated the same process for the second test, but this time using the average of the X and Z-axis acceleration. The results for this test sequence are shown below:

As you can see, almost across the board, the values calculated from just the Z-axis acceleration are higher than those calculated using the X,Z average. To determine the best threshold values to use, I proceeded with 4 more tests:
1.) I set the system to use just the Z-axis acceleration and detected an impact with the following condition:

if 1.072 < accel < 3.151: play_sound()

Here I was just casing on the acceleration to detect an impact, using the upper and lower bound for Acceleration(t).

2.) I set the system to use just the Z-axis acceleration and detected an impact with the following condition:

if 2.105 < (prior_Accel - Accel) < 5.249: play_sound()

Here, I was casing on the difference of the previous acceleration reading and the current acceleration reading, using the upper and lower bounds for Ax(t-1) – Ax(t)

3.)  I set the system to use the  average of the X and Z-axis accelerations and detected an impact with the following condition:

if 1.031 < accel < 2.401: play_sound()

Here I was casing on just the current acceleration reading, using the upper and lower bounds for Acceleration(t).

4.)  I set the system to use the average of the X and Z-axis accelerations and detected an impact with the following condition:

if 2.105 < (prior_Accel - Accel) < 5.249: play_sound()

Here I cased on the difference of the prior acceleration minus the current acceleration and used the upper and lower bounds for Ax(t-1) – Ax(t).

After testing each configuration out, I determined two things:
1.) The thresholds taken using the average of the X and Z-axis accelerations resulted in a higher rate of correct impact detection than just using the Z-axis acceleration, regardless of whether casing on (prior_Accel – Accel) or just Accel.

2.) Using the difference between the previous acceleration reading and the current one resulted in better detection of impacts.

Thus, the thresholds we are now using are defined by the upper and lower bound of the difference between the prior acceleration and the current acceleration (Ax(t-1) – Ax(t)), so the condition listed in test 4.) above.

While this system is now far better at not playing sounds when the user is just moving the drumstick about in the air, it still needs to be further tuned to detect impacts. From my experience testing the system out, it seems that about 75% of the instances when I hit the drum pad correctly register as impacts, while ~25% do not. We will need to conduct further tests as well as some trial and error in order to find better thresholds that more accurately detect impacts.

My progress is on schedule this week. In the coming week, I want to focus on getting better threshold values for the accelerometer so impacts are detected more accurately. Additionally, I want to work on refining the integration of all of our systems. As I said last week, we now have a functional system for one drumstick. However, we recently constructed the second drumstick and need to make sure that the additional three threads that need to run concurrently (one controller thread, one BLE thread, and one CV thread) work correctly and do not interfere with one another. Making sure this process goes smoothly will be a top priority in the coming weeks.

Ben Solo’s Status Report for 11/9

This week, the rest of the group and I spent basically all of our time integrating each of the components we’ve been working on into one unified system. Aside from the integration work, I made a few changes to how the controller handles two drumsticks as opposed to one, altered the way we handle detecting the drum pads at the start of the session, and cut out the actual rubber drum pads to their specific diameters for testing. Prior to this week, we had the following separate components:
1.) A BLE system capable of transmitting accelerometer data to the paired laptop
2.) A dedicated CV module for detecting the drum rings at the start of the playing session. This function was triggered by clicking a button on the webapp, which used an API to initiate the detection process.
3.) A CV module responsible for continually tracking the tip of a drumstick and storing the predicted drum pad the tip was on for the 20 most recent frames.
4.) An audio playback module responsible for quickly playing audios on detected impacts.

We split our integration process into two steps; the first was to connect the BLE/accelerometer code to the audio playback module, omitting the object tracking. To do this, Elliot had to change some of the BLE module so it could successfully be used in our system controller, and I needed to change the way we were previously reading in accelerometer data in the system controller. I was under the impression that the accelerometer/ESP32 system would continuously transmit the accelerometer data, regardless of whether any acceleration was occurring (i.e. transmit 0 acceleration if not accelerating). However, in reality, the system only sends data when acceleration is detected. Thus, I changed the system controller to read a globally set acceleration variable from the Bluetooth module on every iteration of the while loop, and then compare this to the predetermined acceleration threshold to decide whether an impact has occurred or not. After Elliot and I completed the necessary changes for integration, we tested the partially integrated system by swinging the accelerometer around to trigger an impact event, then assigning a random index [1,4] (since we hadn’t integrated the object tracking module yet), and playing the corresponding sound. The system functioned very well with surprisingly low latency.
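The polling change in the controller loop looks roughly like this (a sketch with illustrative names; the real code reads the module-level variable written by the BLE thread and uses the tuned threshold):

import random
import time

IMPACT_THRESHOLD = 2.0  # placeholder, not the tuned value

def controller_loop(read_accel, play_sound, running):
    # read_accel: callable returning the latest acceleration set by the BLE thread.
    # play_sound: callable taking a drum index. running: callable returning False to stop.
    while running():
        accel = read_accel()
        if accel is not None and accel > IMPACT_THRESHOLD:
            # Before CV integration we simply picked a random pad index in [1, 4].
            play_sound(random.randint(1, 4))
        time.sleep(0.001)  # avoid busy-spinning while waiting for new readings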

The second step in the integration process was to combine the partially integrated accelerometer/BLE/playback system with the object tracking code. This again required me to change how the system controller worked. Because Belle’s code needs to run continuously and independently to populate our 20-frame buffer of predicted drum pads, we needed a new thread for each drum stick that starts as soon as the session begins. The object tracking code treated drum pad metadata as an array of length 4 of tuples in the form (x, y, r), whereas I was storing drum pad metadata (x, y, r) in a dictionary where each value was associated with a key. Thus, I changed the way we store this information to coincide with Belle’s code. At this point, we combined all the logic needed for one drumstick’s operation and proceeded to testing. Though it obviously didn’t work on the first try, after a few further modifications and changes, we were successful in producing a system that tracks the drumstick’s location, transmits accelerometer data to the laptop, and plays the corresponding sound of a drum pad when an impact occurs. This was a huge step in our project’s progression, as we have a basic, working version of what we proposed to do, all while maintaining a low latency (measuring exactly what the latency is was difficult since it’s sound-based, but just from using it, it’s clear that the current latency is far below 100ms).

Outside of this integration process, I also started to think about and work on how we would handle two drumsticks as opposed to the one we already had working. The key realization was that we need two CV threads to continuously and independently track the location of each drum stick. We would also need two BLE threads, one for each drum stick’s acceleration transmission. Lastly, we would need two threads running the system controller code, which handles reading in acceleration data, identifying what drum pad the stick was in during an impact, and triggering the audio playback. Though we haven’t yet tested the system with two drum sticks, the system controller is now set up so that once we do want to test it, we can easily spawn corresponding threads for the second drum stick. This involved re-writing the functions to case on the color of each drum stick’s tip. This is primarily needed because the object tracking module needs to know which drum stick to track, but it is also used in the BLE code to store acceleration data for each stick independently.
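The per-stick thread layout amounts to something like the following sketch (the loop functions and color values are placeholders for the actual CV, BLE, and controller code):

import threading

def start_stick_threads(color, cv_loop, ble_loop, controller_loop):
    # Spawn the three threads one drumstick needs, keyed by its tip color.
    threads = [
        threading.Thread(target=cv_loop, args=(color,), daemon=True),
        threading.Thread(target=ble_loop, args=(color,), daemon=True),
        threading.Thread(target=controller_loop, args=(color,), daemon=True),
    ]
    for t in threads:
        t.start()
    return threads

# Two sticks means six threads total, e.g.:
# start_stick_threads("green", cv_loop, ble_loop, controller_loop)
# start_stick_threads("red", cv_loop, ble_loop, controller_loop)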

Lastly, I spent some time carefully cutting out the drum pads from the rubber sheets at the diameters 15.23, 17.78, 20.32, 22.86 (cm) so we could proceed with testing. Below is an image of the whole setup including the camera stand, webcam, drum pads, and drum sticks.

We are definitely on schedule and hope to continue progressing at this rate for the next few weeks. Next week, I’d like to do two things: 1.) refine the overall system, making sure we have accurate acceleration thresholds and assigning the correct sounds to the correct drum pads from the webapp, and 2.) test the system with two drum sticks at once. The only worry we have is that since we’ll have two ESP32s transmitting concurrently, they could interfere with one another and cause packet loss.

 

Ben Solo’s Status Report for 11/2

This week I spent my time working on optimizing the audio playback module. At the start of the week my module had about 90ms of latency for every given sound that needed to be played. In a worst-case situation we could work with this, but since we want an overall system latency below 100ms, it was clearly suboptimal. I went through probably 10 iterations before I landed on the current implementation, which utilizes pyAudio as the sound interface and has what feels like instantaneous playback. I’ll explain the details of what I changed/implemented below and discuss a few of the previous iterations I went through before landing on this final one.
The first step was to create a system that allowed me to test playing individually triggered sounds via keyboard input while not disrupting the logic of the main controller I explained in my last status report. To do this, I implemented a testing mode. When run with testing=True, the controller takes keyboard inputs w, a, s, d to trigger each of the 4 sounds, as opposed to the simulated operating scheme where the loop continually generates random simulated accelerometer impacts and subsequently returns a number in the range [1,4]. This allows me not only to test the latency for individual impacts, but also to see how the system behaves when multiple impacts occur in rapid succession.
Having implemented this new testing setup, I now needed to revise the actual playback function responsible for playing a specific sound when triggered. The implementation from last week worked as follows:
1.) at the start of the session, pre-load the sounds so that the data can easily be referenced and played
2.) when an impact occurs, spawn a new thread that handles the playback of that one sound using the sound device library.
The code for the actual playback function looked as follows:

import sounddevice as sd

# drumSounds, playLock, and wasapiIndex are module-level globals set up elsewhere at session start.
def playDrumSound(index):
    if index in drumSounds:
        data, fs = drumSounds[index]
        dataSize = len(data)
        print(f'playing sound {index}')
        # Pick a block size based on how long the sample is
        if dataSize < 6090:
            blockSize = 4096
        elif dataSize < 10000:
            blockSize = 1024
        else:
            blockSize = 256
        with playLock:
            sd.play(data, samplerate=fs, device=wasapiIndex, blocksize=blockSize)

This system was very latent, despite the use of the WASAPI device native to my laptop. Subsequent iterations of the function included utilizing a queue, where each time an impact was detected, it was added to the queue and played whenever the system could first get to it. This was, however, a poor idea since it introduces unpredictability into when the sound actually plays, which we can’t have given that drumming is very rhythm-heavy. Another idea I implemented but eventually discarded after testing was to use streamed audio. In this implementation, I spawned a thread for each detected impact which would then write the contents of the sound file to an output stream and play it. However, for reasons still unknown to me (I think it was due to how I was cutting the sound data and loading it into the stream), this implementation was not only just as latent, but also massively distorted the sounds when played.
A major part of the issue was that between the delay inherent in playing a sound (simply the amount of time it takes for the sound to play) and the latency associated with triggering playback, it was nearly impossible to create an actual rhythm like you would see when playing a drum set. My final implementation, which uses pyAudio, avoids all these issues by cutting down the playback latency so massively that it feels almost instantaneous. The trick here was a combination of many of the other implementations I had tried out. This is how it works:
1.) at the start of the session we preload each of the sounds so the data and parameters (number of channels, sampling rate, sample width, etc.) are all easily accessible at run time. Additionally, we initialize an audio stream for each of the 4 sounds, so they can each play independently of the other sounds.
2.) during the session, once an impact is detected (a keypress in my case) and the index of the sound to play has been determined, I simply retrieve the sound from our preloaded sounds as well as the associated sound’s open audio stream. I then write the frames of the audio to the stream.
This results in near instantaneous playback. The code for this (both preloading and playback) is shown below:

import wave
import pyaudio

# soundFiles maps each drum index (1-4) to its .wav path; drumSounds and soundStreams
# are filled in by preload_sounds() and read by playDrumSound() below.
pyaudio_instance = pyaudio.PyAudio()
drumSounds = {}
soundStreams = {}

def preload_sounds():
    for index, path in soundFiles.items():
        with wave.open(path, 'rb') as wf:
            frames = wf.readframes(wf.getnframes())
            params = wf.getparams()
            drumSounds[index] = (frames, params)
            soundStreams[index] = pyaudio_instance.open(
                format=pyaudio_instance.get_format_from_width(params.sampwidth),
                channels=params.nchannels,
                rate=params.framerate,
                output=True,
                frames_per_buffer=256
            )

def playDrumSound(index):
    if index in drumSounds:
        frames, _ = drumSounds[index]
        stream = soundStreams[index]
        stream.write(frames, exception_on_underflow=False)

Though this took a lot of time to come to, I think it was absolutely worth it. We now no longer need to worry that the playback of audio will constrain us from meeting our 100ms latency requirement, and can instead focus on the object detection modules and Bluetooth transmission latency. For reference, I attached a sample of how the playback may occur here.

My progress is on schedule this week. In the following week the main goal will be to integrate Elliot’s Bluetooth code, which also reached a good point this week, into the main controller so we can actually start triggering sounds via real drum stick impacts as opposed to keyboard events. If that gets done, I’d like to test the code I wrote last week for detecting the (x, y, r) of the 4 rubber rings in real life, now that we have our webcam. This will probably require me to make some adjustments to the parameters of the hough_circles function we are using to identify them.

Ben Solo’s Status Report for 10/26

This week I spent the majority of my time working on the function “locate_drum_rings”, which is triggered via the webapp and initiates the process of finding the (x, y) location of the drum rings as well as their radii. This involved developing test images/videos (.mp4), implementing the actual function itself, choosing to use scikit-image over cv2’s hough_circles, tuning the parameters to ensure the correct circles are selected, and testing the function in tandem with the webapp. In addition to implementing and testing this function, I made a few more minor improvements to the other endpoint on our local server, “receive_drum_config”, which had an error in its logic regarding saving received sound files to the ‘sounds’ directory. Finally, I changed how the central controller I described in my last status report works a bit to accommodate 2 drumsticks in independent threads. I’ll explain each of these topics in more detail below:

Implementing the “locate_drum_rings” function.
This function is used at the start of every session, or when the user wants to change the layout of their drum set, in order to detect, scale, and store the (x, y) locations and radii of each of the 4 drum rings. It is triggered by the “locate_drum_rings” endpoint on the local server when it receives a signal from the webapp, as follows:

from cv_module import locate_drum_rings

@app.route('/locate-drum-rings', methods=['POST'])
def handle_locate_drum_rings():  # named differently from the import so it doesn't shadow it
    print("Trigger received. Starting location detection process.")
    locate_drum_rings()          # kick off the ring detection process
    return jsonify({'message': 'Trigger received.'}), 200

When locate_drum_rings() is called here, it starts the process of finding the centers and radii of each of the 4 rings in the first frame of the video feed. For testing purposes I generated a sample video with 4 rings as follows:

1.) In MATLAB, I drew 4 rings with the radii of the actual rings we plan on using (8.89, 7.62, 10.16, 11.43 cm) at 4 different, non-overlapping locations.

2.) I then took this image and created a 6 second mp4 video clip of the image to simulate what the camera feed would look like in practice.

Then, during testing, where I pass testing=True to the function, the code references the video as opposed to the default webcam. One pretty significant change, however, was that I decided not to use cv2’s Hough circles algorithm and instead use scikit-image’s Hough circles algorithm, predominantly because it is much easier to narrow down the number of detected rings to 4, whereas with cv2’s it became very difficult to do so accurately and with varying radii (which will be encountered due to varying camera heights). The function itself opens the video and selects the first frame, as this is all that is needed to determine the locations of the drum rings. It then masks the frame and identifies all circles it sees as present (it typically detects circles that aren’t actually there too, hence the need for tuning). Then I use the “hough_circle_peaks” function to specify that it should only retrieve the 4 circles that had the strongest results. Additionally, I specify a minimum distance between 2 detected circles in the filtering process, which serves 2 purposes:

1.) to ensure that duplicate circles aren’t detected.

2.) To prevent the algorithm from detecting 2 circles per ring: 1 for the inner and 1 for the outer radius of the rubber rings.

Once these 4 circles are found, I then scale them based on the ratios outlined in the design report to add the equivalent of a 30mm margin around each ring. For testing purposes I then draw the detected rings and scaled rings back on the frame and display the image. The results are shown below:

The process involved tuning the values for minimum/maximum radii, the minimum distances between detected circles, and the sigma value for the edge detection sensitivity. The results of the function are stored in a shared variable “detectedRings” which is of the form:

{1: {'x': 980.0, 'y': 687.0, 'r': 166.8}, 2: {'x': 658.0, 'y': 848.0, 'r': 194.18}, 3: {'x': 819.0, 'y': 365.0, 'r': 214.14}, 4: {'x': 1220.0, 'y': 287.0, 'r': 231.84}}

Where the index represents what drum the values correspond to.
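Condensed, the detection flow looks roughly like the sketch below. The parameter values (radius range, peak spacing, sigma) are placeholders rather than the tuned ones, and the margin is shown as a fixed pixel value instead of the ratio-based 30mm scaling described above:

import cv2
import numpy as np
from skimage.feature import canny
from skimage.transform import hough_circle, hough_circle_peaks

def detect_drum_rings(frame_bgr, min_r=100, max_r=300, margin_px=30):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = canny(gray, sigma=3)                 # edge map; sigma needs tuning per setup
    radii = np.arange(min_r, max_r, 2)
    accumulator = hough_circle(edges, radii)
    # Keep only the 4 strongest circles, spaced apart to avoid duplicates and to avoid
    # picking both the inner and outer edge of one rubber ring.
    _, cx, cy, r = hough_circle_peaks(accumulator, radii, total_num_peaks=4,
                                      min_xdistance=150, min_ydistance=150)
    detected = {}
    for i, (x, y, radius) in enumerate(zip(cx, cy, r), start=1):
        detected[i] = {'x': float(x), 'y': float(y), 'r': float(radius) + margin_px}
    return detected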

Fixing the file storage in the “receive_drum_config” endpoint:
When we receive a drum set configuration, we always store the sound files in the ‘sounds’ directory under the names “drum_{i}.wav”, where i is an index 1-4 (corresponding to the drums). The issue, however, was that when we received a new drum configuration, we were just adding 4 more files with the same name to the directory, which is incorrect because a.) there should only ever be 4 sounds in the local directory at any given time, and b.) this would cause confusion when trying to reference a given sound as a result of duplicate names. To resolve this, whenever we receive a new configuration I first clear all files from the sounds directory before adding the new sound files in. This was a relatively simple, but crucial, fix for the functionality of the endpoint.
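The fix boils down to something like this (a sketch assuming Flask file uploads; paths and names are illustrative):

from pathlib import Path

SOUNDS_DIR = Path("sounds")

def save_drum_config(files):
    # files: dict mapping drum index (1-4) to the uploaded sound file object.
    SOUNDS_DIR.mkdir(exist_ok=True)
    for old in SOUNDS_DIR.iterdir():            # wipe the previous configuration first
        old.unlink()
    for i, f in files.items():
        f.save(SOUNDS_DIR / f"drum_{i}.wav")    # werkzeug FileStorage.save()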

Updates to the central controller:
Given that the controller needs to monitor accelerometer data for 2 drum sticks independently, we need to run 2 concurrent instances of the controller module. I changed the controller.py file to do exactly this: spawn 2 threads running the controller code, each with a different color parameter of either red or green. These colors represent the colors of the tips of the drumsticks in our project and will be used to apply a mask during the object tracking/detection process in a playing session. Additionally, for testing purposes I added a variation without the threading implementation so we can run tests on one independent drumstick.

Overall, this week was successful and I stayed on track with the schedule. In the coming week I plan on helping Elliot integrate his BLE code into the controller so that we can start testing with an integrated system. I also plan on working on optimizing the latency of the audio playback module more, since while it’s not horrible, it could definitely be a bit better. I think utilizing some sort of mixing library may be the solution here, since part of the delay we’re facing now is due to the duration of a sound limiting how fast we can play subsequent sounds.

Ben Solo’s Status Report for 10/19

Over the last week I’ve split my time between two main topics: finalizing the webapp/local server interface, and implementing an audio playback module. I spent a considerable amount of time on both of these tasks and was able to get myself back up to speed on the schedule after falling slightly behind the week before.

The webapp itself was already very developed and close to being done. There was essentially just one additional feature that needed to be written, namely the button that triggers the user’s local system to identify the locations and radii of the drum rings at the start of a playing session. Doing this implies sending a trigger message from the webapp to the local server that initiates the ring detection process. To do this, I send a POST request to the local server running on port 8000 with a message saying “locate rings”. The local server needed a corresponding “locate-drum-rings” endpoint to receive this message, which also needed to be CORS-enabled. This means I needed both a preflight (OPTIONS) handler and the POST endpoint itself, with response headers set to allow incoming POST requests from external servers. This is done as follows (only the preflight handler is shown):

@app.route('/locate-drum-rings', methods=['OPTIONS'])
def handle_cors_preflight_locate():
    # Respond to the browser's CORS preflight with permissive headers
    response = app.make_default_options_response()
    headers = response.headers
    headers['Access-Control-Allow-Origin'] = '*'
    headers['Access-Control-Allow-Methods'] = 'POST, OPTIONS'
    headers['Access-Control-Allow-Headers'] = 'Content-Type'
    return response

Though the CV module for detecting the locations/radii of the rings isn’t fully implemented yet, once it is, it will be as easy as importing the module and calling it in the endpoint. This is one of the tasks I plan on getting to in the coming week. Both of the endpoints on the local server, “locate-drum-rings” and “receive-drum-config” (which receives 4 sound files and stores them locally in a sounds directory on the user’s computer), work as intended and have been tested.

The more involved part of my work this week was implementing a rudimentary audio playback module with a few of the latency optimizations I had read about. However, before I explain the details of the audio playback functions, I want to explain another crucial aspect of the project I implemented: the system controller. During a play session, there needs to be one central program that manages all other processes, i.e. receiving and processing accelerometer data, monitoring for spikes in acceleration, and spawning individual threads for object detection and audio playback after any given detected impact. Though we are still in the process of implementing both the accelerometer processing modules and the object detection modules, I wrote controller.py in a way that simulates how the system will actually operate. The idea is that when we eventually get these subcomponents done, it will be very easy to integrate them given a thought-out and well-structured framework. For instance, there will be a function dedicated to reading in streamed accelerometer data called “read_accelerometer_data”. In my simulated version, the function repeatedly returns an integer between 1 and 10. This value is then passed off to the “detect_impact” function, which determines whether a reading surpasses a threshold value. In my simulated controller this value is set to 5, so about half of the readings trigger impacts. If an impact is detected, we want to spawn a new thread to handle the object detection and audio playback for that specific impact. This is exactly what the controller does; it generates and starts a new thread that first calls the “perform_object_detection” function (still to be implemented), and then calls the “playDrumSound” function with the drum index returned by the “perform_object_detection” function call. Currently, since the “perform_object_detection” function isn’t implemented, it returns a random integer between 1 and 4, representing one of the four drum rings.
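A minimal sketch of that simulated loop (the function names mirror the ones above; the bodies are placeholders):

import random
import threading
import time

IMPACT_THRESHOLD = 5

def read_accelerometer_data():
    return random.randint(1, 10)        # simulated accelerometer reading

def detect_impact(reading):
    return reading > IMPACT_THRESHOLD   # about half of the simulated readings pass

def perform_object_detection():
    return random.randint(1, 4)         # stand-in for the CV module

def handle_impact(play_drum_sound):
    play_drum_sound(perform_object_detection())

def run_simulation(play_drum_sound, iterations=100):
    for _ in range(iterations):
        if detect_impact(read_accelerometer_data()):
            # One thread per detected impact, as described above
            threading.Thread(target=handle_impact, args=(play_drum_sound,)).start()
        time.sleep(0.05)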

Now, having outlined the controller I designed, I will explain the audio playback module I developed and some of the optimizations I implemented in doing so. We are using the sounddevice library inside the sound_player.py file. This file is our audio playback module. When the controller first starts up, it calls two functions from the audio playback module: 1.) “normalize_sounds”, 2.) “preloadSounds”. The first call ensures that each of the 4 sounds in the sounds directory has a consistent sampling rate, sampling width, and number of channels (1 in our case). This helps with latency issues related to needing to adjust sampling rates. The second function call reads each of the 4 sounds and extracts the sampling frequency and data, storing both in a global dictionary. This cuts latency down significantly by avoiding having to read the sound file at play time, and instead being able to quickly reference and play a given sound. Both of these functions execute before the controller even starts monitoring for accelerometer spikes.

Once an impact has been detected, the “playDrumSound” function is called. This function takes an index (1-4) as a parameter and plays the sound corresponding to that index. Sounds are stored locally with a formatted name of the form “drum_{x}.wav”, where x is necessarily an integer between 1 and 4. To play the sound, we pull the data and sampling frequency from the global dictionary. We dynamically change the buffer size based on the length of the data, ranging from a minimum of 256 samples to a maximum of 4096. These values will most likely change as we further test our system and are able to reliably narrow the range to something in the neighborhood of 256-1024 samples. We then use the “sounddevice.play()” function to actually play the sound, specifying the sampling frequency, buffer size, data, and most importantly the device to play from. A standard audio playback library like pyGame goes through the basic audio pipeline, which introduces latency in a plethora of ways. However, by interfacing directly with WASAPI (the Windows Audio Session API), we can circumvent a lot of the playback stack and reduce latency significantly. To do this I implemented a function that identifies whatever WASAPI speakers are listed on the user’s device. This device is then specified as an argument to the “sounddevice.play()” function at execution time.
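Finding such a device with the sounddevice library looks roughly like this (a sketch; the exact selection logic in our module may differ):

import sounddevice as sd

def find_wasapi_output():
    for api in sd.query_hostapis():
        if "WASAPI" in api["name"]:
            for device_index in api["devices"]:
                if sd.query_devices(device_index)["max_output_channels"] > 0:
                    return device_index   # first WASAPI device that can output audio
    return None                           # caller falls back to the default device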

The result of the joint controller and sound player is a simulator that continuously initiates and plays one of the 4 sounds at random. The system is set up so that we can easily fill in the simulated parts with the actual modules to be used in the final product.

As I stated earlier, I caught back up with my work this week and feel on schedule. In the coming week I plan to develop a module to initially detect the locations and radii of the 4 drum rings when the “locate-drum-rings” endpoint receives the trigger from the webapp. Additionally, I’d like to bring the audio playback latency down further and develop more rigorous tests to determine what the latency actually is, since doing so is quite difficult. We need to find a way to measure the time at which the sound is actually heard, which I am currently unsure of how to do.

Ben Solo’s Status Report for 10/5

In the days after the design presentation I fell slightly behind schedule on implementing the infrastructure of our audio playback module, which I had outlined as a task for myself last week. The primary reason for this was an unexpected amount of work in my three other technical classes, all of which have exams coming up this week. I was unable to actually create a useful interface to trigger a sequence of sounds and delays, but I did spend a significant amount of time researching which libraries to use and investigating latency related to audio playback. I now know how to approach our playback module and can anticipate what problems we will likely encounter. We previously planned on using PyGame as the main way to initiate, control, and sequence audio playback. However, my investigation showed that latency with PyGame can easily reach 20ms and even up to 30ms for a given sound. Since we want our overall system latency to remain below 100ms (ideally closer to 60ms), this will not do. This was an oversight, as we assumed audio playback would be one of the more trivial aspects of the project. After further research it seems that PyAudio would be a far superior library for audio playback, as it offers a far greater level of control when it comes to specifying sampling rate and buffer size (in samples). The issue with PyGame is that it uses a buffer size of 4096 samples. Since playback latency is highly dependent upon buffer latency, a large buffer size like this introduces latency we can’t handle. Buffer latency is calculated as follows:

Buffer latency (seconds) = buffer size (samples) / sampling rate (samples per second)

So at the standard audio sampling rate of 44.1kHz, this results in 92.87ms of buffer latency alone. This would basically consume all the latency we can afford. However, by using a lower buffer size of 128 samples and the same sampling rate (since changing the sampling rate could introduce latency in the form of sample rate conversion {SRC} latency), we could achieve just 2.9ms of buffer latency. Reducing the buffer means that fewer audio frames are stored prior to playback. While in standard audio playback scenarios this could introduce gaps in sound when processing a set of frames takes longer than the rest, in our case, since sound files for drums are inherently very short, a small buffer size shouldn’t have much of a negative effect. The other major source of audio playback latency is OS and driver latency. This will be harder to mitigate, but through the use of a low-latency driver like ASIO (for Windows) we may be able to bring this down too. It would allow us to bypass the default Windows audio stack and interact directly with the sound card. All in all, it still seems achievable to maintain sub-10ms audio playback latency, but it will require more work than anticipated.

Outside of my research into audio playback, I worked on figuring out how we would apply the 30mm margin around each of the detected rings. To do so, we plan on storing each of the actual radii of the drum rings in mm; then, when we run our circle detection algorithm (either cv2’s HoughCircles or contours), which returns pixel radii, we compute a scaling ratio = (r_px)/(r_mm). We can then apply the 30mm margin by adding it to the mm-unit radius of a given drum and multiplying the result by the scaling factor to get its pixel value.
i.e. adjusted radius (px) = (r_mm + 30mm) * (scaling factor)

This allows us to dynamically add a pixel equivalent of 30mm to each drum’s radius regardless of the camera’s height and perceived ring diameters.
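As a quick worked example with hypothetical numbers: a ring with a real radius of 101.6 mm detected at 200 px gives a scale of about 1.97 px/mm, so the padded radius becomes roughly (101.6 + 30) * 1.97 ≈ 259 px.

def scaled_radius_px(r_px_detected, r_mm_actual, margin_mm=30):
    scale = r_px_detected / r_mm_actual          # pixels per millimetre at this camera height
    return (r_mm_actual + margin_mm) * scale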

I also spent some time figuring out what our max drum set area would be given camera height and lens angle, and came up with a graphic to demonstrate this. Using a 90 degree lens, we have a maximum achievable layout of 36,864 cm^2, which is 35% greater than that of a standard drum set. Since our project is meant to make drumming portable and versatile, it is important that it can both be shrunk down to a very small footprint and expanded to at least the size of a real drum set.

(link here)

In the coming week I plan on looking further into actually implementing our audio playback module using PyAudio. As aforementioned, this will be significantly more involved than previously thought. Ideally, by the end of the week I’ll have developed a module capable of audio playback with under 10ms of latency, which will involve installing and figuring out exactly how to use both PyAudio and most likely ASIO to moderate driver latency. As I also mentioned at the start of my report, this coming week is very packed for me and I will need to dedicate a lot of time to preparing for my three exams. However, I am already planning on spending a considerable amount of time over Fall break to both catch up on my schedule and make some headway on the audio playback front, as well as on integrating it with our existing object detection code such that when a red dot is detected in a certain ring, the corresponding sound plays.

Ben Solo’s Status Report for 9/28

This week I focused predominantly on setting up a working (locally hosted) webapp with integrated backend and database, in addition to a local server that is capable of receiving and storing drum set configurations. I will explain the functionality of both below.

The webapp (image below):
The webapp allows the user to completely customize their drum set. First, they can upload any sound files they want (.mp3, .mp4, or .wav). These sound files are stored both in the MySQL database and in a server-local directory. The database holds metadata about the sound files, such as which user they correspond to, the file name, the name given to the file by the user, and the static URL used to fetch the actual data of the sound file from the upload directory. The user can then construct a custom drum set by either searching for sound files they’ve uploaded and dragging them to the drum ring they wish to play that sound, or by searching for a saved drum set. Once they have a drum set they like, they can save the drum set so that they can quickly switch to sets they previously built and liked, or click the “Use this drum set” button, which triggers the process of sending the current drum set configuration to the locally running user server. The webapp allows for quick searching of sounds and saved drum sets and auto-populates the drum set display when the user chooses a specific saved drum set. The app runs on localhost:5000 currently but will eventually be deployed and hosted remotely.

The local server:
Though currently very basic, the local server is configured to run on the user’s port 8000. This is important as it defines where the webapp should send the drum set configurations. The endpoint here is CORS-enabled to ensure that the data can be sent across servers safely. Once a drum set configuration is received, the endpoint saves each sound in the configuration with a new file name of the form “drum_x.{file extension}”, where x represents the index of the drum that the sound corresponds to, and {file extension} is either .mp3, .mp4, or .wav. These files are then downloaded to a local directory called sounds, which is created if it doesn’t already exist. This allows for very simple playback using a library like pyGame.

In addition to these two components, I worked on the design presentation slides and helped develop some new use case requirements which we based our design requirements on. Namely, I came up with the requirement for the machined drumsticks to be below 200g, because as a drummer myself, increasing the weight much above the standard weight of a drumstick (~113g) would make playing DrumLite feel awkward and heavy. Furthermore, we developed the use case requirement of ensuring that 95% of the time a drum ring is hit, the correct sound plays. This has to do with being confident that we detected the correct drum from the video footage, which can be difficult. To do this, we came up with the design requirement of using an exponential weighting over all the predicted outcomes for the drumstick’s location. By applying a higher weight to the more recent frames of video, we think we can come up with a higher level of confidence on what drum was actually hit, and subsequently play the correct sound. This is a standard practice for many such problems where multiple frames need to be accurately analyzed in a short amount of time.
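As an illustration of the idea (the decay factor and buffer length here are assumptions, not final design values), the weighted vote over the last N frame predictions could look like:

from collections import Counter

def weighted_drum_vote(predictions, decay=0.8):
    # predictions: list of drum indices, oldest first, e.g. the last 20 frames.
    scores = Counter()
    weight = 1.0
    for drum in reversed(predictions):   # the most recent frame gets the largest weight
        scores[drum] += weight
        weight *= decay
    return scores.most_common(1)[0][0]   # drum index with the highest weighted vote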

Lastly, I worked on a new diagram (also shown below) for how the basic workflow would look. It serves more as a graphic for the design presentation, but it does convey a good amount of information about how the flow of data looks while using DrumLite. My contributions to the project are coming along well and are on schedule. The plan was to get the webapp and local server working quickly so that we’d have a code base we can integrate with once we get our parts delivered and can actually start building out the code necessary to do image and accelerometer data processing.

In the next week my main focus will be creating a way to trigger a sequence of locally stored sounds within the local code base. I want to build a preliminary interface where a user inputs a sequence of drum IDs and delays and the corresponding sounds are sequentially played. This interface will be useful because once the accelerometer data processing and computer vision modules are in a working state, we’d extract drum IDs from the video feed and pass these into the sound playing interface in the same way as is done in the above-described simulation. The times at which to play the sounds (represented by delays in the simulation) would come from the accelerometer readings. Additionally, while it may be a reach task, I’d like to come up with a way to take the object detection script Belle wrote and use it to trigger the sound playing module I mentioned before.

(link to image here)


(link to flow chart here)

Ben Solo’s Status Report for 9/21

My time this week was split between preparing for the proposal presentation I delivered on Monday, determining design/implementation requirements, and figuring out exactly what components we needed to order, making sure to identify how components would interact with one another.
The proposal was early on in the week, so most of the preparation for it was done last week and over the last weekend. Sunday night and early on Monday, I spent the majority of my time preparing to deliver the presentation and making sure to drive our use case requirements and their justifications home.

The majority of the work I did this week was related to outlining project implementation specifics and ordering the parts we needed. Elliot and I compiled a list of parts, the quantities needed, and fallback options in case our project eventually needs to be implemented in a way other than how we are currently planning. Namely, we are in the process of acquiring a depth-sensing camera (from the ECE inventory) in case either we can’t rely on real-time data being transmitted via the accelerometer/ESP32 system and synchronized with the video feed, or we can’t accurately determine the locations of the rings using the standard webcam. The ordering process required us to figure out basic I/O for all the components, especially the accelerometer and ESP32 [link], so we could make sure to order the correct connectors.

Having a better understanding of what our components and I/O would look like, I created an initial block diagram with a higher level of detail than the vague one we presented in our proposal. While we still need to add some specifics to the diagram, such as exactly what technologies we want to use for the computer vision modules, it represents a solid base for conceptually understanding the interconnectivity of the whole project and how each component will interact with the others. It is displayed below.

Aside from this design/implementation work, I’ve recently started looking into how I would build out the REST API that needs to run locally on the user’s computer. Basically, the endpoint will be running on a local Flask server, much like the webapp, which will run on a remotely hosted Flask server. The user will then specify the IP address of their locally running server in the webapp so it can dynamically send the drum set configurations/sounds to the correct server. Using a combination of origin checking (to ensure the POST request is coming from the correct webserver) and CORS (to handle the webapp and API running on different ports), the API will continuously run locally until it receives a valid configuration request. Upon receiving one, the drum set model will be locally updated so the custom sound files are easily referenceable during DrumLite’s use.
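A rough sketch of what that origin check could look like (assuming Flask; the route name and allowed origin are placeholders, not the final values):

from flask import Flask, request, jsonify, abort

app = Flask(__name__)
ALLOWED_ORIGIN = "http://localhost:5000"   # placeholder for the hosted webapp's origin

@app.route("/receive-drum-config", methods=["POST"])
def receive_drum_config():
    if request.headers.get("Origin") != ALLOWED_ORIGIN:
        abort(403)                         # reject configurations from unknown origins
    # ... store the received sound files locally ...
    return jsonify({"message": "Configuration received."}), 200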

Our project seems to be moving at a steady pace and I haven’t noticed any reason we currently need to modify our timeline. In the coming week, I plan to update our webapp to reflect the fact that we are now using rings to create the layout of the drum set as opposed to specifying the layout in the webapp, as well as to get a very basic version of the API endpoint running locally. Essentially, I want a proof of concept that we can send requests from the webapp and receive them locally. I’ve already shared the repo we created for the project with Elliot and Belle, but we need to decide how we plan to organize/separate code for the webapp vs. the locally running code. I plan on looking into what the norm for such an application is and subsequently deciding whether another repo should be created to house our computer vision, accelerometer data processing, and audio playback modules.