Ben Solo’s Status Report for 10/5

In the days after the design presentation I fell slightly behind schedule on implementing the infrastructure of our audio playback module, which I had outlined as a task for myself last week. The primary reason was an unexpected amount of work in my three other technical classes, all of which have exams coming up this week. I was unable to create a useful interface to trigger a sequence of sounds and delays, but I did spend a significant amount of time researching which libraries to use and investigating latency related to audio playback. I now know how to approach our playback module and can anticipate what problems we will likely encounter.

We previously planned on using PyGame as the main way to initiate, control, and sequence audio playback. However, my investigation showed that latency with PyGame can easily reach 20ms and even up to 30ms for a given sound. Since we want our overall system latency to remain below 100ms (ideally closer to 60ms), this will not do. This was an oversight, as we assumed audio playback to be one of the more trivial aspects of the project. After further research it seems that PyAudio would be a far better library for audio playback, as it offers much more control over the sampling rate and buffer size (in samples). The issue with PyGame was that it used a buffer size of 4096 samples. Since playback latency is highly dependent on buffer latency, a large buffer size like this introduces latency we can’t handle. Buffer latency is calculated as follows:

Buffer Latency (seconds) = (Buffer size {samples}) / (sampling rate {samples per second})

So at the standard audio sampling rate of 44.1kHz, a 4096-sample buffer results in roughly 92.9ms of buffer latency alone. This would take up basically all the latency we can afford. However, by using a smaller buffer of 128 samples at the same sampling rate (since changing the sampling rate could introduce latency in the form of sample rate conversion (SRC) latency), we could achieve just 2.9ms of buffer latency. Reducing the buffer means that fewer audio frames are stored prior to playback. In standard audio playback scenarios this could introduce gaps in sound when processing a set of frames takes longer than the rest, but in our case, since sound files for drums are inherently very short, a small buffer size shouldn’t have much of a negative effect. The other major source of audio playback latency is OS and driver latency. This will be harder to mitigate, but through the use of a low-latency driver like ASIO (for Windows) we may be able to bring it down too. ASIO would allow us to bypass the default Windows audio stack and interact directly with the sound card. All in all, it still seems achievable to maintain sub-10ms audio playback latency, but it will require more work than anticipated.
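The buffer latency comparison above can be reproduced with a few lines of Python. This is only a sketch of the formula; the function name `buffer_latency_ms` is made up for illustration and is not part of PyGame or PyAudio.

```python
def buffer_latency_ms(buffer_size_samples: int, sample_rate_hz: int = 44_100) -> float:
    """Return buffer latency in milliseconds.

    Implements: latency (s) = buffer size (samples) / sampling rate (samples/s).
    The function name and default rate are illustrative assumptions.
    """
    if buffer_size_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("buffer size and sample rate must be positive")
    return 1000.0 * buffer_size_samples / sample_rate_hz


if __name__ == "__main__":
    # Compare the large buffer with the smaller one proposed above.
    for size in (4096, 128):
        print(f"{size} samples @ 44.1 kHz -> {buffer_latency_ms(size):.2f} ms")
```

This only accounts for the buffer itself; OS and driver latency would come on top of these numbers.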

Outside of my research into audio playback, I worked on figuring out how we would apply the 30mm margin around each of the detected rings. To do so, we plan on storing the actual radius of each drum ring in mm. When we run our circle detection algorithm (either cv2’s HoughCircles or contours), we get each ring’s radius in pixels and compute a scaling ratio = (r_px)/(r_mm). We can then apply the 30mm margin by adding it to the mm radius of a given drum and multiplying the result by the scaling factor to get its pixel value.
i.e. adjusted radius (px) = (r_mm + 30mm) * (scaling factor)

This allows us to dynamically add a pixel equivalent of 30mm to each drum’s radius regardless of the camera’s height and perceived ring diameters.
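As a rough sketch of the margin calculation described above (the function name `adjusted_radius_px` and the example numbers are hypothetical, not taken from our codebase):

```python
def adjusted_radius_px(r_px: float, r_mm: float, margin_mm: float = 30.0) -> float:
    """Return the detected ring radius in pixels, grown by a physical margin.

    r_px      -- radius reported by circle detection, in pixels
    r_mm      -- known physical radius of that ring, in mm
    margin_mm -- physical margin to add (30 mm per our requirement)
    """
    if r_px <= 0 or r_mm <= 0:
        raise ValueError("radii must be positive")
    scaling_factor = r_px / r_mm  # pixels per millimetre at this camera height
    return (r_mm + margin_mm) * scaling_factor


if __name__ == "__main__":
    # Hypothetical 150 mm ring seen at two camera heights.
    print(adjusted_radius_px(r_px=300.0, r_mm=150.0))  # camera close: 2 px/mm
    print(adjusted_radius_px(r_px=150.0, r_mm=150.0))  # camera far: 1 px/mm
```

Because the scaling factor is recomputed from each detection, the added margin stays at 30mm in physical terms whether the ring appears large or small in the frame.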

I also spent some time figuring out what our max drum set area would be given camera height and lens angle, and came up with a graphic to demonstrate this. Using a 90 degree lens, we have a maximum achievable layout of 36,864 cm^2, which is 35% greater than that of a standard drum set. Since our project is meant to make drumming portable and versatile, it is important that it can both be shrunk down to a very small footprint and expanded to at least the size of a real drum set.

(link here)
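For a rough feel of how camera height and lens angle drive the coverage area, here is a minimal sketch. It assumes the camera points straight down, has the same field of view in both axes (so the footprint is square), and has no lens distortion; none of these assumptions come from the graphic itself. The 96 cm height in the example is a hypothetical value, chosen only because it yields a 192 cm by 192 cm footprint under these assumptions.

```python
import math


def footprint_side_cm(camera_height_cm: float, fov_deg: float = 90.0) -> float:
    """Side length of the visible square on the table, in cm.

    Simple pinhole geometry: side = 2 * height * tan(FOV / 2).
    Assumes a downward-facing camera with equal FOV on both axes.
    """
    if camera_height_cm <= 0 or not (0 < fov_deg < 180):
        raise ValueError("height must be positive and FOV between 0 and 180 degrees")
    return 2.0 * camera_height_cm * math.tan(math.radians(fov_deg) / 2.0)


def footprint_area_cm2(camera_height_cm: float, fov_deg: float = 90.0) -> float:
    """Area of the visible square on the table, in cm^2."""
    side = footprint_side_cm(camera_height_cm, fov_deg)
    return side * side


if __name__ == "__main__":
    height = 96.0  # hypothetical camera height in cm
    print(f"side: {footprint_side_cm(height):.1f} cm")
    print(f"area: {footprint_area_cm2(height):.0f} cm^2")
```

With a 90 degree lens the side length is simply twice the camera height, so lowering the camera shrinks the playable area proportionally.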

In the coming week I plan on looking further into actually implementing our audio playback module using PyAudio. As mentioned above, this will be significantly more involved than previously thought. Ideally, by the end of the week I’ll have developed a module capable of audio playback with under 10ms of latency, which will involve installing and figuring out exactly how to use both PyAudio and most likely ASIO to reduce driver latency. As I also mentioned at the start of my report, this coming week is very packed for me and I will need to dedicate a lot of time to preparing for my three exams. However, I am already planning on spending a considerable amount of time over Fall break to catch up on my schedule and make headway both on the audio playback front and on integrating it with our existing object detection code, so that when a red dot is detected in a certain ring, the corresponding sound plays.

Team Status Report for 10/5

For this week, our team worked mainly on the writeup for our design report to fully plan out our final product. We took the time to tackle a few edge cases from our initial blueprint, specifically focusing on the more nuanced details of our design requirements and implementation strategies so that we can better explain our architecture to any reader of the design report. Our schedule remains the same, with Ben developing the web application, Elliot handling the Bluetooth data processing, and Belle covering the computer vision computation onboard the host; we chose, however, to split this week’s stage of our design process differently, with each member focusing on a specific section of the report. We delegated the introduction and requirements to Ben, the architecture and implementation to Elliot, and the testing and tradeoffs to Belle. We decided that this would result in a more well-rounded final product by giving each team member an opportunity to view the project from a holistic perspective before we begin to integrate our modules together. Having each team member dive into other components of the block diagram brought up a few potential concerns we hadn’t considered prior, each of which we then created a mitigation plan for. Some details we worked out this week included the following:

1.) 30mm scalability requirement: As outlined in our proposal and design presentations, one of our use case requirements is to provide the user a 30mm error zone to account for the rubber drumheads deviating from their original position upon impact from the drumsticks. The design requirement we mapped to it for traceability involved deriving a fixed scaling factor to apply to the radii gathered upon detection with OpenCV’s HoughCircles function. We realized, however, that a single scaling factor across all four drums would not achieve a constant 30mm margin for each drum (as they differ in size), and that the relative diameters in pixels between the drumheads would not be sufficient to determine a scaling factor (an absolute metric is required if our solution is to be applicable for varying camera heights). Hence, our new implementation is to store the absolute size of each ring in an internal array and scale based on these known sizes. We can then detect the rings based on their relative sizes, map them to their stored dimensions, and apply a separate scaling factor to each radius accordingly. This should be less error-prone than a purely relative solution, where we may have encountered issues if the user did not place all rings in view of the camera, or if the camera was too far from the table to detect small variances in the diameters.

2.) Reliability of BLE packet transmission: Another one of our use case requirements was to ensure a reliable connection within 3m of the laptop, for which we decided to aim for a packet loss of under 2%. Given our original research on the Bluetooth stack and the specifications for the ESP32’s performance, we figured that 2% would be a very reasonable goal. With the second microcontroller also transmitting accelerometer data, however, we run the risk of interference and packet loss, for which we had not developed a mitigation plan. This week, Ben searched for options to lower the packet loss in the event that we do not meet this requirement, eventually landing on the solution of raising the connection interval. Elliot then explored the firmware libraries available in Arduino and confirmed our ability to increase the connection interval with the host device at the cost of over-the-air latency.

3.) Audio output delay: One element we completely overlooked was the main thread’s method of playing audio files, for which we had chosen the pygame mixer. This week, however, our team discovered that this library introduces an unacceptable amount of output latency, so we decided to pivot to PyAudio, which lets us configure smaller audio buffers to achieve a much lower processing delay.

4.) Camera specifications: This week, while exploring strategies to most efficiently deploy our computer vision model, we evaluated the effect that a 120 degree field of view camera would have on our CV calculations. We found that wide angle cameras could potentially introduce a form of optical distortion, resulting in stretched pixels and slightly elliptical drumheads, and therefore less precise detection altogether under our framework. We also came to a decision regarding our sliding window, where we chose to now take 0.33 seconds worth of frames before the relevant accelerometer timestamp, since anything higher could lead to potentially false readings. Given these new requirements, we set out to find a high framerate, approximately 90 degree FOV camera, for which we plan to make an order early next week. Below is a diagram we created to help us map out how we’ll use this new field of view:

Next week we plan to stay on schedule and begin working with the physical components we ordered. By Friday, we intend to have a complete report describing our requirements, strategies, and conscious design decisions in creating our CV-based drumset.