Ben Solo’s Status Report for 10/19

Over the last week I split my time between two main tasks: finalizing the webapp/local server interface and implementing an audio playback module. I spent a considerable amount of time on both and was able to get back on schedule after falling slightly behind the week before.

The webapp itself was already well developed and close to done. Essentially one additional feature needed to be written: the button that triggers the user’s local system to identify the locations and radii of the drum rings at the start of a playing session. This requires sending a trigger message from the webapp to the local server to initiate the ring detection process. To do this, the webapp sends a POST request with a “locate rings” message to the local server running on port 8000. The local server needed a corresponding “locate-drum-rings” endpoint to receive this message, and that endpoint also needed to be CORS-enabled. This means I needed both a preflight (OPTIONS) handler and a POST handler that set the response headers to allow incoming POST requests from external origins. This is done as follows (only the preflight handler is shown):

@app.route('/locate-drum-rings', methods=['OPTIONS'])
def handle_cors_preflight_locate():
    # Respond to the CORS preflight request so the browser will allow
    # the subsequent cross-origin POST from the webapp.
    response = app.make_default_options_response()
    headers = response.headers
    headers['Access-Control-Allow-Origin'] = '*'
    headers['Access-Control-Allow-Methods'] = 'POST, OPTIONS'
    headers['Access-Control-Allow-Headers'] = 'Content-Type'
    return response

Though the CV module for detecting the locations/radii of the rings isn’t fully implemented yet, once it is, integrating it will be as simple as importing the module and calling it in the endpoint. This is one of the tasks I plan to get to in the coming week. Both of the endpoints on the local server, “locate-drum-rings” and “receive-drum-config” (which receives 4 sound files and stores them locally in a sounds directory on the user’s computer), work as intended and have been tested.
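For reference, here is a hedged sketch of what the corresponding POST handler could look like once the CV module exists. The detect_rings function is a hypothetical stand-in for that module (not the final implementation), and the sketch assumes the same Flask app object as above:

from flask import jsonify

def detect_rings():
    # Hypothetical stand-in for the CV ring detection module; the real version
    # will return the detected (x, y, radius) values for each ring in pixels.
    return []

@app.route('/locate-drum-rings', methods=['POST'])
def locate_drum_rings():
    rings = detect_rings()
    response = jsonify({'rings': rings})
    # Mirror the CORS header on the actual POST response as well.
    response.headers['Access-Control-Allow-Origin'] = '*'
    return response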

The more involved part of my work this week was implementing a rudimentary audio playback module with a few of the latency optimizations I had read about. However, before I explain the details of the audio playback functions, I want to explain another crucial aspect of the project I implemented: the system controller. During a play session, there needs to be one central program that manages all other processes, i.e. receiving and processing accelerometer data, monitoring for spikes in acceleration, and spawning individual threads for object detection and audio playback after any given detected impact. Though we are still in the process of implementing both the accelerometer processing and object detection modules, I wrote controller.py in a way that simulates how the system will actually operate. The idea is that when we eventually finish these subcomponents, integration will be easy given a thought-out, well-structured framework.

For instance, there will be a function dedicated to reading in streamed accelerometer data called “read_accelerometer_data”. In my simulated version, the function repeatedly returns an integer between 1 and 10. This value is then passed to the “detect_impact” function, which determines whether a reading surpasses a threshold value. In my simulated controller this threshold is set to 5, so half of the readings trigger impacts. If an impact is detected, we want to spawn a new thread to handle the object detection and audio playback for that specific impact. This is exactly what the controller does: it creates and starts a new thread that first calls the “perform_object_detection” function (still to be implemented), and then calls the “playDrumSound” function with the drum index returned by “perform_object_detection”. Currently, since “perform_object_detection” isn’t implemented, it returns a random integer between 1 and 4, representing one of the four drum rings.
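As a reference for how the simulated controller is structured, here is a minimal sketch along the lines described above (the exact function bodies in controller.py may differ; the threshold, value ranges, and names follow the description in this report):

import random
import threading
import time

from sound_player import playDrumSound  # audio playback module described below

IMPACT_THRESHOLD = 5  # simulated threshold; half of the readings exceed it

def read_accelerometer_data():
    # Simulated stand-in for streamed accelerometer data.
    return random.randint(1, 10)

def detect_impact(reading):
    return reading > IMPACT_THRESHOLD

def perform_object_detection():
    # Simulated stand-in for the CV module: pick one of the 4 drum rings.
    return random.randint(1, 4)

def handle_impact():
    drum_index = perform_object_detection()
    playDrumSound(drum_index)

if __name__ == '__main__':
    while True:
        reading = read_accelerometer_data()
        if detect_impact(reading):
            # One thread per detected impact so playback never blocks monitoring.
            threading.Thread(target=handle_impact, daemon=True).start()
        time.sleep(0.05)  # pacing for the simulation only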

Now having outlined the controller I designed, I will explain the audio playback module I developed and some of the optimizations I implemented in doing so. We are using the sounddevice library inside the sound_player.py file; this file is our audio playback module. When the controller first starts up, it calls two functions from the audio playback module: 1.) “normalize_sounds” and 2.) “preloadSounds”. The first ensures that each of the 4 sounds in the sounds directory has a consistent sampling rate, sample width, and number of channels (1 in our case). This helps with latency issues related to needing to adjust sampling rates. The second reads each of the 4 sounds and extracts the sampling frequency and data, storing both in a global dictionary. This cuts latency down significantly by avoiding having to read the sound file at play time; instead we can quickly reference and play a given sound. Both of these functions execute before the controller even starts monitoring for accelerometer spikes.
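A minimal sketch of what the preloading step could look like (the use of the soundfile package to read the .wav data is an assumption for illustration; any wav-reading library would do, and the actual preloadSounds implementation may differ):

import soundfile as sf

PRELOADED_SOUNDS = {}  # drum index -> (data, sampling frequency)

def preloadSounds():
    # Read each of the 4 normalized sound files once, up front, so that
    # playback only has to look the data up in memory.
    for i in range(1, 5):
        data, fs = sf.read(f'sounds/drum_{i}.wav', dtype='float32')
        PRELOADED_SOUNDS[i] = (data, fs)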

Once an impact has been detected, the “playDrumSound” function is called. This function takes an index (1-4) as a parameter and plays the sound corresponding to that index. Sounds are stored locally with a formatted name of the form “drum_{x}.wav”, where x is necessarily an integer between 1 and 4. To play the sound, we pull the data and sampling frequency from the global dictionary. We dynamically change the buffer size based on the length of the data, ranging from a minimum of 256 samples to a maximum of 4096. These values will most likely change as we further test our system and are able to reliably narrow the range to something in the neighborhood of 256-1024 samples. We then use the “sounddevice.play()” function to actually play the sound, specifying the sampling frequency, buffer size, data, and most importantly the device to play from. A standard audio playback library like pygame goes through the default audio pipeline, which introduces latency in a plethora of ways. However, by interfacing directly with WASAPI (the Windows Audio Session API) we can bypass much of the playback stack and reduce latency significantly. To do this I implemented a function that identifies whichever WASAPI speakers are listed on the user’s device. This device is then specified as an argument to the “sounddevice.play()” function at execution time.
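To illustrate the device selection and playback call, here is a hedged sketch continuing from the preload sketch above (the buffer-size heuristic shown here is a simplification of the actual module’s dynamic sizing):

import sounddevice as sd

def find_wasapi_output_device():
    # Look up the host API named "Windows WASAPI" and return its default
    # output device index, falling back to the system default otherwise.
    for api in sd.query_hostapis():
        if 'WASAPI' in api['name']:
            return api['default_output_device']
    return None

def playDrumSound(index):
    data, fs = PRELOADED_SOUNDS[index]
    # Simplified stand-in for the dynamic buffer sizing: clamp to [256, 4096].
    blocksize = max(256, min(4096, len(data) // 8))
    sd.play(data, fs, blocksize=blocksize, device=find_wasapi_output_device())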

The result of the joint controller and sound player is a simulator that continuously initiates and plays one of the 4 sounds at random. The system is set up so that we can easily fill in the simulated parts with the actual modules to be used in the final product.

As I stated earlier, I caught back up with my work this week and feel on schedule. In the coming week I plan to develop a module to initially detect the locations and radii of the 4 drum rings when the “locate-drum-rings” endpoint receives the trigger from the webapp. Additionally, I’d like to bring the audio playback latency down further and develop more rigorous tests to determine what the latency actually is, since doing so is quite difficult. We need to find a way to measure the time at which the sound is actually heard, which I am currently unsure of how to do.

Team Status Report for 10/19

In the week prior to fall break and throughout the week of fall break, our team continued to make good progress on our project. Having received the majority of the hardware components we ordered, we were able to start the preliminary work of flashing the ESP32s with the Arduino code necessary to relay the accelerometer data to our laptop. Additionally, we made progress in our understanding and implementation of the audio playback module, and implemented the final feature needed for the webapp: a trigger to start the ring detection protocol locally.

Currently, the issues that pose the greatest risk to our team’s success are as follows:
1.) Difficulty in implementing the BLE data transmission from the drumsticks to the laptop. We know that writing robust code, flashing it onto the ESP32s, and processing the data in real time could pose numerous issues for us. First, implementing the code and flashing the ESP32 is a non-trivial task. Elliot has some experience in the area, but having seen other groups attempt similar things and struggle, we know this to be difficult. Second, since the transmission delay may vary from packet to packet, issues could easily arise if one packet takes far longer to transmit than the others. Currently our mitigation strategy involves determining an average latency by testing many times over various transmission distances. Once we have this average, it should encompass the vast majority of expected delay times. If a transmission falls outside of this range, we plan on simply discarding the late packets and continuing as usual.

2.) Drumstick tip detection issues. While it seems that using the cv2 contours function alongside a color mask will suffice to identify the locations of the drumstick tips, there is a fair amount of variability in detection accuracy depending on the available lighting. While we currently think applying a strong color mask will be enough to compensate for lighting variability, in the case that it isn’t we plan on adding a lighting fixture mounted alongside the camera on the camera stand to provide consistent lighting in every scenario. (A sketch of the color-mask approach is shown after this list.)

3.) Audio playback latency. As mentioned in the previous report, audio playback can surprisingly introduce significant amounts of latency to the system (easily 50ms) when using standard libraries such as pygame. We are now using the sounddevice library instead, which seems to have brought latency down a bit. However, the issue is not as simple as reducing the sample buffer size: we have noticed through experimentation that certain sounds, even when their durations don’t vary, require higher buffer sizes than others. This is a result of both the sampling frequency used and the overall length of the data in the sound file. Using sounddevice and interacting directly with WASAPI (the Windows Audio Session API), we believe we can cut latency down significantly, but if we can’t, we plan on using an external MIDI controller, which facilitates almost instantaneous sound I/O. These controllers are designed for exactly these types of applications and help bypass the audio playback pipeline inherent in computers.
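For reference, here is a minimal sketch of the color-mask plus contours approach for locating a drumstick tip mentioned in item 2 (the HSV bounds shown are illustrative placeholders, not our tuned values):

import cv2
import numpy as np

def find_tip(frame_bgr):
    # Mask out everything except the (red) drumstick tip, then take the
    # center of the largest remaining contour as the tip location.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower, upper = np.array([0, 120, 70]), np.array([10, 255, 255])  # placeholder bounds
    mask = cv2.inRange(hsv, lower, upper)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (x, y), _ = cv2.minEnclosingCircle(largest)
    return int(x), int(y)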

The design of our project has not changed, aside from the fact that we are now using (and testing with) the sounddevice library as opposed to PyAudio. However, if sounddevice proves insufficient, we will revert and try employing PyAudio with ASIO. We are still on track with our schedule.

Below are the answers to the thought questions for this week.
A was written by Ben Solo, B was written by Elliot Clark, and C was written by Belle Connaught

A.) One of DrumLite’s main appeals is that it is a cost-effective alternative to standard electronic drum sets. As was mentioned in the introduction of our design report, a low-end drum set can easily cost between $300 and $700, while a better one can go up to $2000. Globally, the cost of drum sets, whether acoustic or electronic, hinders people from taking up the drums. DrumLite’s low cost (~$150) enables many more people to play the drums without having to worry that the cost isn’t justifiable for an entertainment product.
Furthermore, DrumLite makes sharing drum sets far easier. Previously, sharing a drum set between countries was virtually impossible, as you’d have to ship it back and forth or buy identical drum sets in order to have the same experience. But with DrumLite, since you can upload any .wav files to the webapp and use these as your sounds, sharing a drum set is trivial. You can just send an email with four .wav attachments and the recipient can reconstruct the exact same drum set you had in minutes. DrumLite not only brings the cost of entry down, but encourages collaboration and the sharing of music on both a local and global scale.

B.) The design of this project integrates several cultural factors that enhance its accessibility, relevance, and impact across user groups. Music is a universal form of expression found in many cultures, making this project inherently inclusive by providing a platform for users to experience drumming without the need for expensive equipment. Given its highly configurable nature, the system can be programmed to replicate sounds from a variety of drums spanning various cultures, therefore enabling cross-cultural appreciation and learning. This project also holds educational potential, particularly in schools or music programs, where it could be used to teach students about different drumming traditions, encouraging cultural awareness and social interaction through drumming practices seen in other cultures. These considerations collectively make the DrumLite set not only a technical convenience but also a culturally aware and inclusive platform.

C.) DrumLite addresses a need for sustainable, space-efficient, and low-impact musical instruments by leveraging technology to minimize material use and environmental footprint. Traditional drum sets require numerous physical components such as drum shells, cymbals, and hardware, which involve the extraction of natural resources, energy-intensive manufacturing processes, and significant shipping costs due to their size and weight. By contrast, our project replaces bulky equipment with lightweight, compact components—two drumsticks with embedded sensors, a laptop, and four small rubber pads—significantly reducing the raw materials required for production. This not only saves on manufacturing resources but also reduces transportation energy and packaging waste, making DrumLite more environmentally-friendly.
In terms of power consumption, the system is designed to operate efficiently with the use of low-power ESP32 microcontrollers and small sensors like the MPU-6050 accelerometers. These components require minimal energy compared to traditional electric drum sets or amplification equipment, reducing the device’s carbon footprint over its lifetime.
DrumLite contributes to a sustainable musical experience by reducing waste and energy consumption, all while maintaining the functionality and satisfaction of playing a traditional drum set in a portable, tech-enhanced format.

Belle’s Status Report for 10/19

This past week, I mainly wanted to start drafting a bit more Computer Vision code that could potentially be used in our final implementation. At the time, we had not yet received our accelerometers and microcontrollers in the mail, so I wanted to figure out how to use PyAudio (one of the methods we are considering for playing the drum sound) by creating simple spacebar-triggered sounds from a short input .wav music file. I figured that this process would also help us get an idea of how much latency could be introduced by this process.

I ended up adding code to the existing red dot detection code that first opens the .wav sound file (using the wave module) and a corresponding audio stream (using PyAudio). Then, upon pressing the spacebar, a thread is spawned to create a new wave file object and play the opened sound using the previously mentioned modules/libraries, as well as threading.
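A condensed sketch of that addition is shown below, assuming the spacebar press is detected through the OpenCV window the dot-detection code already displays (the file name and chunk size here are placeholders, not the exact values used):

import threading
import wave
import pyaudio
import cv2

pa = pyaudio.PyAudio()

def play_wav(path, chunk=512):
    # Open a fresh wave reader per press and stream it out in small chunks.
    wf = wave.open(path, 'rb')
    stream = pa.open(format=pa.get_format_from_width(wf.getsampwidth()),
                     channels=wf.getnchannels(),
                     rate=wf.getframerate(),
                     output=True,
                     frames_per_buffer=chunk)
    data = wf.readframes(chunk)
    while data:
        stream.write(data)
        data = wf.readframes(chunk)
    stream.stop_stream()
    stream.close()
    wf.close()

# Inside the dot-detection display loop:
# if cv2.waitKey(1) & 0xFF == ord(' '):
#     threading.Thread(target=play_wav, args=('sample.wav',), daemon=True).start()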

Though simple, writing this code was a good exercise to get a decent feel for how PyAudio works, as well as what buffer size we should use when the audio stream is being written (i.e. when the input sound is being played). During our recent meeting with Professor Bain and Tjun Jet, we discussed possibly wanting a small buffer size of 128 bytes or so instead of the usual ~4096 to reduce latency. However, I found that the 256-512 range is more of a sweet spot for us, as having a small buffer also means that it gets filled more often (which could introduce latency accordingly, especially with larger audio files). I also found that when the spacebar was mashed quickly (simulating multiple quick, consecutive drum hits), the audio lagged a bit towards the end, despite a new thread being spawned for every spacebar press. I suspect this was likely due to the aforementioned small buffer, as increasing the buffer size seemed to remedy the issue.

Our Gantt chart indicates that we should still be working on CV code this week, so I believe I am on schedule. This week, I hope to work with the ESP32 and accelerometers to get a feel for how they output data in order to determine how to process it. Setting up the ESPs is not necessarily a straightforward task, however, so I plan to work with Elliot to get them working (which would be confirmed by at least having some sort of communication between them and the laptop over Bluetooth). From there, if we are able to transmit accelerometer data, I would like to graph it and determine a range for the ‘spike’ generated by a ‘hit’ motion with the drumsticks.

Belle’s Status Report for 10/5

This week, most of my time was taken up by a major deadline in another class, so my main task included working on the Design Trade Studies and Testing/Verification sections of our draft Design Report. I spent a couple of hours looking into key components of our project (such as the microcontroller, proposed Computer Vision implementation, and camera type), comparing and contrasting them against other potential candidates to ensure that our choices were optimal, and putting these differences into words.

I was also able to modify the CV dot detection code from last week to determine how long it takes to process one frame of the sample video input. This yielded a consistent processing time of ~1.3-1.5ms per frame, which allows us to determine how many frames can be processed once an accelerometer spike is read (while staying below our CV latency limit of 60ms).
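The measurement itself can be as simple as the sketch below; the process_frame function is a hypothetical stand-in for the actual dot-detection step, and the video path is a placeholder:

import time
import cv2
import numpy as np

def process_frame(frame):
    # Stand-in for the dot-detection step: mask the red dot and find contours.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
    cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

cap = cv2.VideoCapture('sample_video.mp4')  # placeholder path
times = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    start = time.perf_counter()
    process_frame(frame)
    times.append((time.perf_counter() - start) * 1000)  # ms per frame
cap.release()
print(f'average: {sum(times) / len(times):.2f} ms per frame')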

Since our Gantt chart still has us working on CV code this week, I believe the rest of the team and I are still on schedule. This coming week, I plan to finalize my parts of the design report and start trying to feed real-time camera input into the current CV code, if a camera is delivered this week. If not, I would like to feed in accelerometer data to determine the minimum threshold of a “hit,” and start thinking about how to incorporate that into the code.

Elliot’s Status Report for 10/5

Following the design presentation this past week, I worked on the implementation details for our final design report writeup, where I outline our libraries, equations, and general strategies for how we’ll communicate between modules. I spoke with Ben and Belle about how we’d carry out the 30mm buffer zone in our use case requirement, how many frames we would have to process from our sliding window on the event of any given hit, and how the resolution and field of view of our chosen camera would impact the detection of rings on the table. Given that we were not able to successfully place an order for a web camera off Amazon, I had the opportunity to search for a suitable camera, with our new priorities being the ability to process 20 relevant frames from our frame window while avoiding the optical distortion that results from a high field of view. I found a 1080p, 60FPS camera with an 84 degree field of view in the Elgato MK.2, which we’ll be considering alongside other options within the next few days. The most crucial requirements were the framerate, where I decided that a web camera running at 60 frames per second should allow us to gather 20 frames of relevant imaging (up to 0.33 seconds pre-timestamp), and the field of view, where the team concluded that anything higher than 90 degrees could distort our pixel-based sizing calculations.

Apart from exploring our hardware needs, I finalized the connectivity of our BLE, since the online simulation I used only operated up to the advertising stage and wasn’t able to emulate over-the-air pairing. The host device code should be simple: we’ll run two threads pairing with the microcontrollers’ separate MAC addresses and subscribing to the UUIDs of the accelerometer characteristics (a sketch of this host-side subscription is included after the list below), although I’m waiting for parts to arrive for testing. Overall, the team and I are on schedule, and we should be well prepared for bug-fixing and unit testing post-break. This week, I personally plan to:

  1. Work hands-on with the ESP32 and MPU-6050. I plan to solder the jumper cables between the two for the I2C serial communication and flash firmware code to start advertising over Bluetooth.
  2. Finalize our design report. I’m looking to complete my Bluetooth testing by Thursday, at which point I’ll incorporate it into the repository and describe the technical details within the implementation section of our writeup.
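As a reference for the host-side plan described above, here is a minimal sketch of subscribing to both ESP32s’ accelerometer characteristics. This assumes a Python BLE client library such as bleak (not finalized in our design), uses asyncio tasks in place of the two threads mentioned above, and the MAC addresses and UUID are placeholders:

import asyncio
from bleak import BleakClient

ACCEL_CHAR_UUID = '0000xxxx-0000-1000-8000-00805f9b34fb'  # placeholder UUID
STICK_ADDRESSES = ['AA:BB:CC:DD:EE:01', 'AA:BB:CC:DD:EE:02']  # placeholder MACs

def handle_notification(sender, data):
    # `data` would carry the accelerometer reading and timestamp packed by the firmware.
    print(sender, data)

async def subscribe(address):
    async with BleakClient(address) as client:
        await client.start_notify(ACCEL_CHAR_UUID, handle_notification)
        await asyncio.sleep(30)  # stay subscribed for the duration of a session

async def main():
    # One task per drumstick, pairing with each ESP32 concurrently.
    await asyncio.gather(*(subscribe(addr) for addr in STICK_ADDRESSES))

asyncio.run(main())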

Ben Solo’s Status Report for 10/5

In the days after the design presentation I fell slightly behind schedule on implementing the infrastructure of our audio playback module, which I had outlined as a task for myself last week. The primary reason for this was an unexpected amount of work in my three other technical classes, all of which have exams coming up this week. I was unable to actually create a useful interface to trigger a sequence of sounds and delays, but did spend a significant amount of time researching which libraries to use and investigating latency related to audio playback. I now know how to approach our playback module and can anticipate what problems we will likely encounter. We previously planned on using pygame as the main way to initiate, control, and sequence audio playback. However, my investigation showed that latency with pygame can easily reach 20ms and even up to 30ms for a given sound. Since we want our overall system latency to remain below 100ms (ideally closer to 60ms), this will not do. This was an oversight, as we assumed audio playback to be one of the more trivial aspects of the project. After further research it seems that PyAudio would be a far superior library to use for audio playback, as it offers a far greater level of control when it comes to specifying the sampling rate and buffer size (in samples). The issue with pygame was that it used a buffer size of 4096 samples. Since playback latency is highly dependent on buffer latency, a large buffer size like this introduces latency we can’t handle. Buffer latency is calculated as follows:

Buffer Latency (seconds) = Buffer Size (samples) / Sampling Rate (samples per second)

So at the standard audio sampling rate of 44.1kHz, a 4096-sample buffer results in about 92.9ms of buffer latency alone. This would basically consume all the latency we can afford. However, by using a lower buffer size of 128 samples at the same sampling rate (since changing the sampling rate could introduce latency in the form of sample rate conversion (SRC) latency), we could achieve just 2.9ms of buffer latency. Reducing the buffer means that fewer audio frames are stored prior to playback. While in standard audio playback scenarios this could introduce gaps in sound when processing one set of frames takes longer than the rest, in our case, since sound files for drums are inherently very short, a small buffer size shouldn’t have much of a negative effect. The other major source of audio playback latency is OS and driver latency. This will be harder to mitigate, but through the use of a low-latency driver like ASIO (for Windows) we may be able to bring it down too, since it would allow us to bypass the default Windows audio stack and interact directly with the sound card. All in all, it still seems achievable to maintain sub-10ms audio playback latency, but it will require more work than anticipated.
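As a quick sanity check of those numbers:

# Buffer latency = buffer size / sampling rate
fs = 44100                      # samples per second
for buffer_size in (4096, 128):
    print(f'{buffer_size} samples -> {buffer_size / fs * 1000:.1f} ms')
# prints: 4096 samples -> 92.9 ms
#         128 samples -> 2.9 ms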

Outside of my research into audio playback, I worked on figuring out how we would apply the 30mm margin around each of the detected rings. To do so, we plan on storing each of the actual radii of the drum rings in mm; then, when we run our circle detection algorithm (either cv2’s HoughCircles or contours), which returns pixel radii, we compute a scaling ratio = (r_px)/(r_mm). We can then apply the 30mm margin by adding it to the mm-unit radius of a given drum and multiplying the result by the scaling factor to get its pixel value.
i.e. adjusted radius (px) = (r_mm + 30mm) * (scaling factor)

This allows us to dynamically add a pixel equivalent of 30mm to each drum’s radius regardless of the camera’s height and perceived ring diameters.
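A small sketch of that conversion (the known radii listed here are placeholder values, not our actual ring sizes):

KNOWN_RADII_MM = {1: 100, 2: 125, 3: 150, 4: 175}  # placeholder physical radii
MARGIN_MM = 30

def adjusted_radius_px(drum_index, detected_radius_px):
    # Scaling factor maps millimeters to pixels for the current camera height.
    scale_px_per_mm = detected_radius_px / KNOWN_RADII_MM[drum_index]
    return (KNOWN_RADII_MM[drum_index] + MARGIN_MM) * scale_px_per_mm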

I also spent some time figuring out what our maximum drum set area would be given the camera height and lens angle, and came up with a graphic to demonstrate this. Using a 90 degree lens, we have a maximum achievable layout of 36,864 cm^2, which is 35% greater than that of a standard drum set. Since our project is meant to make drumming portable and versatile, it is important that it can both be shrunk down to a very small footprint and expanded to at least the size of a real drum set.

(link here)

In the coming week I plan on looking further into actually implementing our audio playback module using PyAudio. As mentioned, this will be significantly more involved than previously thought. Ideally, by the end of the week I’ll have developed a module capable of audio playback with under 10ms of latency, which will involve installing and figuring out exactly how to use both PyAudio and most likely ASIO to moderate driver latency. As I also mentioned at the start of my report, this coming week is very packed for me and I will need to dedicate a lot of time to preparing for my three exams. However, I am already planning on spending a considerable amount of time over fall break to both catch up on my schedule and make some headway on the audio playback front, integrating it with our existing object detection code so that when a red dot is detected in a certain ring, the corresponding sound plays.

Team Status Report for 10/5

For this week, our team worked mainly on the writeup for our design report to fully plan out our final product. We took the time to tackle a few edge cases from our initial blueprint, specifically focusing on the more nuanced details of our design requirements and implementation strategies so that we can better explain our architecture to any reader of the design report. Our schedule remains the same, with Ben developing the web application, Elliot handling the Bluetooth data processing, and Belle covering the computer vision computation onboard the host; we chose, however, to split this week’s stage of our design process differently, with each member focusing on a specific section of the report. We delegated the introduction and requirements to Ben, the architecture and implementation to Elliot, and the testing and tradeoffs to Belle. We decided that this would result in a more well-rounded final product by giving each team member an opportunity to view the project from a holistic perspective before we begin to integrate our modules together. Having each team member dive into other components of the block diagram brought up a few potential concerns we hadn’t considered prior, each of which we then created a mitigation plan for. Some details we worked out this week included the following:

1.) 30mm scalability requirement: As outlined in our proposal and design presentations, one of our use case requirements is to provide the user a 30mm error zone to account for the rubber drumheads deviating from their original position upon impact from the drumsticks. The design requirement we mapped to it for traceability involved deriving a fixed scaling factor to apply to the gathered radii upon detection with the HoughCircles function. We realized, however, that a single scaling factor across all four drums would not achieve a constant 30mm margin for each drum (as they differ in size), and that the relative diameters in pixels between the drumheads would not be sufficient to determine a scaling factor (an absolute metric is required if our solution is to be applicable for varying camera heights). Hence, our new implementation is to store the absolute sizes of each ring in an internal array and scale based on these known sizes. We can then detect the rings based on their relative sizes, map them to their stored dimensions, and apply a separate, simple scaling factor to each radius accordingly. This should prove to be a less error-prone approach than a purely relative solution, where we may have encountered issues if the user did not place all rings in view of the camera, or if the camera was too far from the table to detect small variances in the diameters.

2.) Reliability of BLE packet transmission: Another one of our use case requirements was to ensure a reliable connection within 3m of the laptop, for which we decided to aim for a packet loss of under 2%. Given our original research on the Bluetooth stack and the specifications for the ESP32’s performance, we figured that 2% would be a very reasonable goal. With the second microcontroller also transmitting accelerometer data, however, we run the risk of interference and packet loss, for which we had not developed a mitigation plan. This week, Ben searched for options to lower the packet loss in the event that we do not meet this requirement, eventually landing on the solution of raising the connection interval. Elliot then explored the firmware libraries available in Arduino and confirmed our ability to increase the connection interval with the host device at the cost of over-the-air latency.

3.) Audio output delay: One element we completely overlooked was the main thread’s method of playing audio files, for which we chose to use the pygame mixer. This week, however, our team discovered that this library introduces an unacceptable amount of output latency, so we decided to pivot to PyAudio, which can be configured with smaller audio buffers to achieve a much lower processing delay.

4.) Camera specifications: This week, while exploring strategies to most efficiently deploy our computer vision model, we evaluated the effect that a 120 degree field of view camera would have on our CV calculations. We found that wide angle cameras could potentially introduce a form of optical distortion, resulting in stretched pixels and slightly elliptical drumheads, and therefore less precise detection altogether under our framework. We also came to a decision regarding our sliding window, where we chose to now take 0.33 seconds worth of frames before the relevant accelerometer timestamp, since anything higher could lead to potentially false readings. Given these new requirements, we set out to find a high framerate, approximately 90 degree FOV camera, for which we plan to make an order early next week. Below is a diagram we created to help us map out how we’ll use this new field of view:

Next week we plan to stay on schedule and begin working with the physical components we ordered. By Friday, we intend to have a complete report describing our requirements, strategies, and conscious design decisions in creating our CV-based drum set.

Ben Solo’s Status Report for 9/28

This week I focused predominantly on setting up a working (locally hosted) webapp with integrated backend and database, in addition to a local server that is capable of receiving and storing drum set configurations. I will explain the functionality of both below.

The webapp (image below):
The webapp allows the user to completely customize their drum set. First they can choose to upload any sound files they want (.mp3, .mp4, or .wav). These sound files are stored both in the MySQL database and in a local directory on the server. The database holds metadata about the sound files, such as which user they correspond to, the file name, the name given to the file by the user, and the static URL used to fetch the actual data of the sound file from the upload directory. The user can then construct a custom drum set by either searching for sound files they’ve uploaded and dragging them to the drum ring they wish to play that sound, or by searching for a saved drum set. Once they have a drum set they like, they can save it so that they can quickly switch to sets they previously built and liked, or click the “Use this drum set” button, which triggers the process of sending the current drum set configuration to the locally running user server. The webapp allows for quick searching of sounds and saved drum sets and auto-populates the drum set display when the user chooses a specific saved drum set. The app currently runs on localhost:5000 but will eventually be deployed and hosted remotely.

The local server:
Though currently very basic, the local server is configured to run on the user’s port 8000. This is important as it defines where the webapp should send the drum set configurations. The endpoint here is CORS-enabled to ensure that the data can be sent across servers safely. Once a drum set configuration is received, the endpoint saves each sound in the configuration with a new file name of the form “drum_x.{file extension}”, where x represents the index of the drum that the sound corresponds to, and {file extension} is either .mp3, .mp4, or .wav. These files are downloaded to a local directory called sounds, which is created if it doesn’t already exist. This allows for very simple playback using a library like pygame.

In addition to these two components, I worked on the design presentation slides and helped develop some new use case requirements, which we based our design requirements on. Namely, I came up with the requirement for the machined drumsticks to be below 200g because, as a drummer myself, I know that increasing the weight much above the standard weight of a drumstick (~113g) would make playing DrumLite feel awkward and heavy. Furthermore, we developed the use case requirement of ensuring that 95% of the time a drum ring is hit, the correct sound plays. This has to do with being confident that we detected the correct drum from the video footage, which can be difficult. To do this, we came up with the design requirement of using an exponential weighting over all the predicted outcomes for the drumstick’s location. By applying a higher weight to the more recent frames of video, we think we can reach a higher level of confidence about which drum was actually hit, and subsequently play the correct sound. This is a standard practice for many such problems where multiple frames need to be accurately analyzed in a short amount of time.
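To illustrate the weighting idea (the α = 0.8 value comes from our design requirements below; the voting scheme itself is only a sketch, not a finalized algorithm):

from collections import defaultdict

def weighted_drum_vote(per_frame_predictions, alpha=0.8):
    # per_frame_predictions: drum indices predicted for each processed frame,
    # ordered oldest to newest. More recent frames get exponentially more weight.
    scores = defaultdict(float)
    n = len(per_frame_predictions)
    for i, drum in enumerate(per_frame_predictions):
        scores[drum] += alpha ** (n - 1 - i)
    return max(scores, key=scores.get)

# Example: the newest frames say drum 2, so drum 2 wins despite early disagreement.
print(weighted_drum_vote([1, 1, 2, 2, 2]))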

Lastly, I worked on a new diagram (also shown below) for how the basic workflow would look. It serves mainly as a graphic for the design presentation, but does convey a good amount of information about how data flows while using DrumLite. My contributions to the project are coming along well and are on schedule. The plan was to get the webapp and local server working quickly so that we’d have a code base we can integrate with once we get our parts delivered and can actually start building out the code necessary to do image and accelerometer data processing.

In the next week my main focus will be creating a way to trigger a sequence of locally stored sounds within the local code base. I want to build a preliminary interface where a user inputs a sequence of drum IDs and delays and the corresponding sounds are sequentially played (see the sketch below). This interface will be useful because, once the accelerometer data processing and computer vision modules are in a working state, we’d extract drum IDs from the video feed and pass them into the sound-playing interface in the same way as is done in the simulation described above. The times at which to play the sounds (represented by delays in the simulation) would come from the accelerometer readings. Additionally, while it may be a reach task, I’d like to come up with a way to take the object detection script Belle wrote and use it to trigger the sound-playing module I mentioned before.
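A minimal sketch of the kind of interface I have in mind (the play_drum_sound argument is a placeholder for whatever playback function the module ends up exposing):

import time

def play_sequence(steps, play_drum_sound):
    # steps: list of (drum_id, delay_seconds) pairs, e.g. [(1, 0.5), (3, 0.25)].
    # Waits out each delay, then plays the sound for that drum ID.
    for drum_id, delay in steps:
        time.sleep(delay)
        play_drum_sound(drum_id)

# Example usage with a stand-in playback function:
play_sequence([(1, 0.5), (3, 0.25), (2, 0.25)], lambda d: print(f'play drum {d}'))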

(link to image here)


(link to flow chart here)

Team Status Report for 9/28

This week our team focused heavily on preparing for the design presentation. This meant not only continuing to build up our ideas and concepts for DrumLite, but also starting to develop some base code to be used for initial experimentation and proof of concept, as well as for testing purposes down the line. We initially struggled with coming up with design requirements based on our use case requirements. We couldn’t really understand the difference between the two at first, but came to the conclusion that while use case requirements are somewhat black-boxed and focus more on how the product needs to behave/function, design requirements should focus on what needs to be met implementation-wise in order to achieve the use case requirements.

We then developed our design requirements and directly related them to our use case requirements. In doing so, we also added 3 new use case requirements, which are indicated with * below. They are as follows:
1.) (use case) Drum ring detection accuracy within ≤30 mm of actual placement.

(design)Dynamically scale the detected pixel radii to match the actual ring diameters + 30mm of margin

2.) (use case) Audio response within ≤100ms of drumstick impact.

(design)BLE <30 ms, accelerometer @1000Hz, CV operating under 60ms (~15fps)

3.) (use case) Minimum layout area of 1295 cm^2 (37cm x 35cm).

(design) For any given frame, the OpenCV algorithm will provide a proximity check to ensure the correct choice across a set of adjacent rings

4.) (use case*) Play the sound of the correct drum >= 95% of the time.

(design) Average stick location across all processed frames.
Exponential weighting 𝞪 = 0.8 on processed frames.

5.) (use case*) BLE transmission reliability within 10ft from laptop

(design) <=2% BLE packet loss within 10 ft radius of the laptop.

6.) (use case*) Machined drum sticks under 200g in weight.

(design) ESP32: 31g, MPU-6050: 2g, drumstick: 113.4g -> ensure connective components are below 53.6g.

Individually, we each focused on our own subdivisions of the project in order to either establish some groundwork or better understand the technologies we’re working with.

Ben: Worked on creating a functioning, locally hosted webapp with a UI that allows for drum set configuration/customization, sound file and drum set configuration storage (MySQL), and an endpoint to interact with a user’s locally running server. He also worked on a functioning local server (also using Flask) to receive and locally store sound files for quick playback. Both parts are fully operational and successfully communicate with one another. The next tasks will be to integrate the stored sound files with the other code for image and accelerometer processing that also runs locally. Finally, the webapp will need to be deployed and hosted. Risks for this component of the project center around how deploying the webapp will affect its ability to communicate with the local server, as currently both run on localhost.

Belle: Focused on developing a testing setup for our object detection system. In order to determine the average amount of time required for OpenCV object detection to identify the location of a drumstick tip, Belle created a MATLAB script in which she drew 4 rings of varying size and moved a red dot (simulating the tip of the drumstick) around the 4 rings. We will use a screen capture of this animation in order to determine a.) the amount of time it takes to process a frame, and b.) subsequently the number of frames we can afford to pass to the object detection module after detecting a hit. Currently, she has a Python script using OpenCV that is able to identify the location of the dot as it moves around within the 4 rings. The next step is to come up with timing metrics for object detection per frame.

Elliot: Elliot spent much of his time preparing for the design presentation, fleshing out our ideas fully and figuring out how best to explain them. In addition, he worked out how we will communicate with the MCU and accelerometer contained in the drumsticks via Python. He also looked into BluetoothSerial for interfacing with the ESP32, and confirmed that we can use BLE for sending the accelerometer data and USB for flashing the microcontroller. Finally, he identified a BLE simulator which we plan on using both for testing purposes and for preliminary research. This simulator accurately simulates how the ESP32 will work, including latency, packet loss, and so forth.

In regards to the questions pertaining to the safety, social, and economic factors of our project, these were our responses (A was written by Belle, B was written by Elliot, C was written by Ben):

Part A: Our project takes both public health and safety into consideration, particularly with regard to the cost and hazards of traditional drum sets. By focusing on mobility, the design enables users to play anywhere as long as they have a surface for the drum rings – effectively removing the space limitations often encountered with standard drum sets, and offering a more affordable alternative that lowers financial barriers to musical engagement. This flexibility empowers individuals to engage with their music in diverse environments, fostering a sense of freedom and creativity without having to worry about transporting heavy equipment or space constraints. Additionally, the lightweight, compact nature of the rings ensures that users can play without concerns of drums falling, causing injury, or damaging surrounding objects. This design significantly enhances user safety and well-being, and promotes an experience where physical well-being and ease of use are key.

Part B: Our CV-based drum set makes music creation more accessible and inclusive. This project caters to individuals who may not have access to physical drum sets due to space constraints, enabling them to engage in music activities without needing traditional instruments. The solution promotes the importance of music as a means of social connection, self-expression, and well-being. The project also helps foster inclusivity by being adaptable to different sound sensitivities, as you can adjust the sound played from each drum. By reducing the barrier to entry for using drum equipment, we aim to introduce music creation to new audiences.

Part C: As was stated in our use case, drum sets, whether electronic or acoustic, are very expensive (easily upwards of $400). This greatly limits the number of people who are able to engage in playing the drums, simply because there is a high cost barrier. We have calculated the net cost of our product, which sits right around $150, nearly a quarter of the price of what an already cheap drum set would cost. The reason for our low cost is that the components of a physical drum set are much more expensive. Between a large metal frame, actual drums/drum pads, custom speakers, and a brain to control volume and customization, the cost of an electronic drum set skyrockets. Our project leverages the fact that we don’t need drum pads, sensors, a frame, or a brain to work; it just needs our machined sticks, a webcam, and access to the webapp. This provides access to drum sets to many individuals who would otherwise not be able to play the drums.

We are currently on schedule and hope to receive our ordered parts soon so we can start testing/experimenting with the actual project components as opposed to the various simulators we’ve been using thus far. Below are a few images showing some of the progress we’ve made this week:

(The webapp UI –>  link to view image here)

(The dot simulation)

Elliot’s Status Report for 9/28

This week, I was tasked with establishing a basis for communicating with the ESP32 boards and their corresponding MPU-6050 accelerometers. I spent some time looking into the BLE stack to determine the complexity needed for our GATT services and characteristics; the available options were the native ESP-IDF (Espressif IoT Development Framework), the Arduino IDE, and a MicroPython-based firmware. I concluded that while the ESP-IDF would give us the most control over the pipeline we implement, since our main purpose is simply to transmit the accelerometer data and its timestamp, the service complexity does not call for any fine-tuning. Between the Arduino framework and MicroPython, it would be best to use a compiled language rather than an interpreted one for the purpose of lower latency. Therefore, I started developing some of the C++ code we’ll eventually be flashing to our microcontrollers; to test functionality, I worked with an ESP simulator on Wokwi to set up a Bluetooth connection and send accelerometer data to notified clients. Some libraries necessary for the Arduino framework include Wire.h for I2C, BLEUtils for initializing the advertising and notifications, and the MPU6050 device driver.

I also practiced my presentation approach for Monday, where I’ll be talking about how our Bluetooth, computer vision, and web server modules interact. I spent time identifying why we chose the components we chose, as well as developing concrete requirements that link back to our product use case requirements.

For next week, my plan is to:

  1. Hopefully receive our hardware and begin testing the accelerometer thresholds. I’ll set up communication over Bluetooth to relay data, and based on our drum hits, we’ll then look at what signal spikes would indicate an adequate hit.
  2. Test Bluetooth latency and packet loss. Once we have the ESP32s wired up, I can insert packet misses to determine adequate rates, as well as measure the latency of our transmissions with timestamps. This is especially important to us, given that we’ll be using two transmission devices in a low-latency environment.