What I did last week:
- GROUND-TRUTH FORM: I started the week by making a Google form and corresponding spreadsheet to collect/organize ground truth data from Prof Dueck and her students.
- https://docs.google.com/forms/d/e/1FAIpQLSdZ1LsQ5ckyfkgTnUzCFKm7-o78PcxrrPIhwr4NxlNbCd_XEQ/viewform?usp=sf_link
- For each clip included, we asked Professor Dueck and her students to fill out a tension identification rubric (sent by Prof Dueck), indicate whether they thought the clip was tense in general, and rate their confidence in their response.
- I used the video clips Shaye created by dividing the recorded footage of Prof Dueck’s students into 10-second snippets. I had to upload each of these clips to YouTube in order to link them to the form.
- The form includes 23 of the 50 clips we have. I only used about half of them because the form was getting quite long, and some of the clips were unusable since the pianist’s head covered the hands.
- CLEAN LIVE-FEEDBACK CODE: At the beginning of the week, I also cleaned up the code from the Hackster article since it included a lot of optional functionality that we don’t use. A more concise version will also be easier to debug in the future. This can be found in the “cleaned up code” commit on the jessie_branch.
- DETERMINE HANDEDNESS: I then tried to figure out how to edit the tension algorithm so it would work for two hands (instead of just one). However, I was unable to extract a handedness metric (identifying right versus left hand) from the model. The precompiled model defines a variable for it, but its output appeared identical for left and right hands. Shaye said they will write an algorithm to determine handedness since we’re not sure the model provides it.
- VIDEO RECORDING: I also worked to integrate video recording into the live-feedback code, based on the code from the Hackster article. I wanted the recording to include the model’s landmark placements so that users could see whether the landmarks were incorrect, which could cause inconsistencies in our tension detection algorithm. The Hackster code already draws the landmarks onto each processed frame before sending it to the display, so I collected these annotated frames in an array and then wrote them out as a video (a rough sketch of this approach follows this item). The changes can be found in the “can record video and buzzes every second instead of all the time” commit on the jessie_branch.
- A minor concern I have right now is that our captured frame rate fluctuates, while the video writer requires a fixed frame rate. As a result, the playback timing of the written video may not match the timing at which the frames were actually captured. I don’t think this will affect our functionality, but it could be disorienting for users.
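- Below is a minimal sketch of the frame-collection approach described above, using OpenCV’s VideoWriter. The names here (annotated_frames, OUTPUT_FPS, the output path) are placeholders for illustration, not the identifiers used in our actual code.
```python
import cv2

OUTPUT_FPS = 20          # fixed frame rate required by the video writer (placeholder value)
annotated_frames = []    # frames that already have the model's landmarks drawn on them

def collect_frame(frame_with_landmarks):
    """Store each annotated frame so it can be written out after the session."""
    annotated_frames.append(frame_with_landmarks)

def write_video(path="session_recording.mp4"):
    """Write the collected frames to a video file at a fixed frame rate."""
    if not annotated_frames:
        return
    height, width = annotated_frames[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(path, fourcc, OUTPUT_FPS, (width, height))
    for frame in annotated_frames:
        writer.write(frame)
    writer.release()
```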
- ADJUST BUZZER FEATURE: Lastly, I adjusted the buzzer feature so that instead of buzzing constantly while tension is detected, it buzzes once when tension is first detected and then continues to buzz at a user-specified interval for as long as tension persists. I attempted to achieve this with a tension flag and a recursive timer thread: each time tension is newly detected, a buzzing thread is started that uses a timer to wait the user-specified time before buzzing again, and if tension is no longer detected (signaled by the flag), the thread exits. A sketch of this first version follows this item. These changes can also be found in the “can record video and buzzes every second instead of all the time” commit on the jessie_branch.
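- Here is a rough sketch of this first, flag-based version, using threading.Timer; the names are hypothetical and buzz_once is a placeholder for the actual buzzer call. As described under DEBUG BUZZER below, this version can end up with multiple buzzing threads alive at once.
```python
import threading

BUZZ_INTERVAL = 1.0        # seconds between buzzes, supplied by the user (placeholder)
tension_detected = False   # flag updated by the tension-detection loop

def buzz_once():
    """Placeholder for the actual buzzer call."""
    pass

def buzz_loop():
    # Exit once the detection loop has cleared the flag.
    if not tension_detected:
        return
    buzz_once()
    # Re-arm a timer that calls this function again after the interval.
    threading.Timer(BUZZ_INTERVAL, buzz_loop).start()

def on_tension_newly_detected():
    """Called when the state transitions from not tense to tense."""
    buzz_once()
    threading.Timer(BUZZ_INTERVAL, buzz_loop).start()
```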
What I did this week:
- INTERFACE LED AND ESP32: At the beginning of this week, I learned how to work with the RGB LED and the ESP32 board. I first had to determine whether our RGB LEDs were common cathode or common anode; I didn’t have a multimeter on hand, so I tried different wirings of the LED to the RPi and noted when it lit up. Once that was determined, I wired the LED to the ESP32 board. Using the Arduino IDE, and with the help of ChatGPT, I downloaded the necessary libraries and got the basic LED functionality working. We want the LED to be wireless so that the user can place it somewhere in their line of sight without being restricted by the RPi wiring, so we opted to use HTTP requests to send commands from the RPi to the ESP32. Code for the basic functionality can be found here.
- INTEGRATE LED TO RPI: I then integrated the LED into the live-feedback code. The LED is turned off at both the start and the end of the program, so if the LED is on, it signals that recording is in progress. Additionally, I was able to determine how many hands the model recognizes by checking the length of the landmarks array; I used this to make the LED green when two or more hands are in frame and red otherwise (only one or zero hands recognized).
- I reverted this code to debug the integration of user-specified values with the web app, but the final code on the RPi can be found in this commit. I also made a copy of the code that runs on the ESP32, which can be found here.
- When I initially set the LED color on every processed frame, the frame rate dropped significantly (to around 7 fps). I think this is because waiting for the ESP32 to respond to each GET request from the RPi blocks the program. To combat this, I limit the number of requests by tracking the color the LED is currently set to and only sending a request when the color needs to change. I also send each request on its own thread.
- Another problem I ran into was that the connection between the RPi and the ESP32 would occasionally fail, dropping the request to change the LED color. To handle this, I catch the connection error exception and recursively call the change-color function until it succeeds. A sketch of this request pattern follows this list.
- A concern that I still have is that the ESP32 sometimes seems to get overwhelmed: it gets a bit warm and many requests fail. Further testing will need to be done to determine how robust our system is and whether this could be a problem.
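- Here is a rough sketch of the LED request pattern described above, assuming the ESP32 exposes a simple HTTP endpoint that accepts a color parameter. The URL, parameter name, and function names are placeholders, not our actual code.
```python
import threading
import requests

ESP32_URL = "http://192.168.1.50/set_color"   # placeholder address and endpoint
current_color = None                          # color the LED is currently set to

def _send_color(color):
    """Send the GET request, retrying if the connection to the ESP32 drops."""
    try:
        requests.get(ESP32_URL, params={"color": color})
    except requests.exceptions.ConnectionError:
        _send_color(color)   # retry until the request succeeds

def set_led_color(color):
    """Only issue a request when the color actually changes, on its own thread."""
    global current_color
    if color == current_color:
        return
    current_color = color
    threading.Thread(target=_send_color, args=(color,), daemon=True).start()

def update_led(landmarks):
    """Per-frame check: green when at least two hands are recognized, red otherwise."""
    set_led_color("green" if len(landmarks) >= 2 else "red")
```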
- DEBUG BUZZER: I returned to editing the buzzer feature because after more iterations it became apparent there was a severe multi-threading problem. The buzzing often started out fairly evenly spaced, but as the ‘practice’ session progressed the buzzing became more erratic.
- I think the uneven buzzing was due to multiple buzzing threads existing at the same time. Multiple threads can exist if, while one thread’s timer is waiting, the state goes from tense to not tense and back to tense: the original thread never sees the not-tense state and keeps buzzing while a new buzzing thread is created. To remedy this, I used a Python threading Event to block/unblock threads, which lets me detect a change in state and interrupt the thread.
- At first, I used this event in conjunction with the flag I previously used; however, I was still having more subtle threading issues. Print statements at thread creation and completion confirmed that multiple threads still existed at the same time, though less frequently than before. I believe this was due to race conditions, since setting the flag and setting the event was not atomic. To fix this, I removed the flag and checked whether the event was set instead; a sketch of this event-based approach follows this list. Print statements then confirmed that only one buzzing thread exists at a time.
- There are still instances where buzzes are not evenly spaced (two buzzes close together); however, my print statements show this is caused by a fast change in state from tense to not tense and back to tense, since the system buzzes immediately each time tension is newly detected. If this proves to be a big issue in practice, we can increase the window size in the tension algorithm so that it is more robust to small changes in angle deviation.
- The finalized code can be found at this git commit.
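- For reference, here is a minimal sketch of one way to implement the event-based interruption described above. The names are placeholders and buzz_once stands in for the actual buzzer call; the code in the commit may be structured differently. The key mechanism is Event.wait(timeout), which sleeps for the buzz interval but returns immediately if the event is set, so a state change interrupts the wait instead of leaving a stale thread running.
```python
import threading

BUZZ_INTERVAL = 1.0                    # seconds between buzzes (user-specified, placeholder)
not_tense_event = threading.Event()    # set when tension is no longer detected

def buzz_once():
    """Placeholder for the actual buzzer call."""
    pass

def buzz_thread():
    """Buzz repeatedly until the not-tense event is set."""
    while True:
        buzz_once()
        # wait() returns True as soon as the event is set, ending the thread early.
        if not_tense_event.wait(timeout=BUZZ_INTERVAL):
            return

def on_state_change(is_tense):
    """Called by the detection loop only when the tense/not-tense state flips."""
    if is_tense:
        not_tense_event.clear()
        threading.Thread(target=buzz_thread, daemon=True).start()
    else:
        not_tense_event.set()
```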
- TEST LATENCY: Once the buzzer feature was working as intended, I tested the system’s live feedback latency (the time between tension and live audio feedback).
- To do this, I deviated a bit from what I had previously planned. Originally, I planned to measure the time between tension detection and audio feedback; however, when I added a print statement at the point of detection, it printed after the audio feedback, implying that the time between detection and output is extremely small. That measurement would not capture the real latency, since there is a noticeable gap between when I become tense (no hand movement) and when I hear audio feedback, which implies that most of the latency comes from the model processing/tension detection algorithm.
- To better test the system, I measured the time between when I became tense and when I heard a buzz. It is difficult to determine exactly when a piano player becomes tense; however, our current algorithm detects tension when there is no horizontal hand movement (waving) and no tension when there is. So I ran iterations of waving my hand and then abruptly stopping, which let me precisely document the start time of tension.
- Danny helped me record a video on his phone (our system doesn’t have a microphone) of me interacting with the system by waving my hand. I then played this video at 0.1x speed to get finer time granularity when documenting the times of tension onset and audio feedback.
- I used the recorded times of tension onset and audio feedback to compute the latency, and from there derived some metrics to present in our final presentation.
- Here is a link to the video Danny recorded that I used to measure latency time: https://drive.google.com/drive/u/0/folders/1W-ZQKiA9caTTIdh_Ydgcm5rLkRDS_jbL
- Here is a link to a Google sheet with my results: https://docs.google.com/spreadsheets/d/1GLNH3a0HwWe3TAwUPFOv4lbOCp4f87zj3cBo2VtS-zI/edit?gid=0#gid=0
- It is worth noting that I was only able to test our system with one hand in the frame because the handedness feature has yet to be completed (so our system only works on one hand). This is important because our system has a much lower frame rate once two hands are in frame (22 fps versus 30 fps), so the latency will likely be higher than what I recorded once the tension detection algorithm is finalized.
- INTERFACE USER INPUTS: I then worked with Danny to interface the RPi with the web app: starting/stopping recording, passing in the buzzer volume, the time between consecutive buzzes, and whether the user wants the footage displayed, and sending back the video recorded by the live-feedback code.
- I created functions that call and quit the live-feedback program. These can be found in the file call_ported_live_blaze_handpose.py.
- I adjusted the live-feedback code to take the buzzer volume, the time between consecutive buzzes, and the display toggle as arguments passed in at launch. A sketch of this setup follows this list. The finalized code can be found at this git commit.
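- Here is a minimal sketch of how the caller and the live-feedback script could pass these values, assuming a subprocess-based launcher and command-line arguments. The argument names and the script name passed to Popen are assumptions for illustration, not necessarily what our code uses.
```python
import argparse
import signal
import subprocess

process = None

def start_live_feedback(volume, buzz_interval, show_display):
    """Launch the live-feedback program with the user-specified values."""
    global process
    args = ["python3", "ported_live_blaze_handpose.py",   # assumed script name
            "--volume", str(volume),
            "--buzz-interval", str(buzz_interval)]
    if show_display:
        args.append("--display")
    process = subprocess.Popen(args)

def stop_live_feedback():
    """Ask the live-feedback program to exit (it can clean up on SIGINT)."""
    if process is not None:
        process.send_signal(signal.SIGINT)
        process.wait()

# On the live-feedback side, the same values can be read with argparse:
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--volume", type=int, default=50)
    parser.add_argument("--buzz-interval", type=float, default=1.0)
    parser.add_argument("--display", action="store_true")
    return parser.parse_args()
```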
- INTERFACE TENSION GRAPHS: I also started working with Danny to interface the tension graphs, synced with the recorded video, from the RPi to the web app. To do this, I created an array of tense/not-tense values that is written to a CSV file with the same name as the corresponding recorded video. A sketch of this output step follows this item.
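- A minimal sketch of writing the per-frame tense/not-tense values to a CSV named after the video; the column layout here is an assumption for illustration, not necessarily the format Danny and I settled on.
```python
import csv
import os

def write_tension_csv(video_path, tension_values):
    """Write one row per frame: frame index and 1 (tense) or 0 (not tense)."""
    csv_path = os.path.splitext(video_path)[0] + ".csv"   # same name as the video
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "tense"])
        for i, tense in enumerate(tension_values):
            writer.writerow([i, int(tense)])

# Example: write_tension_csv("session_recording.mp4", [0, 0, 1, 1, 0])
```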
- FRAME RATE AND POST-PROCESSING VERIFICATION: Lastly, I started to work on the frame rate and post-processing time tests.
- The given code finds the frame rate by using a ticker to keep track of time: every 10 processed frames, it divides 10 by the elapsed time to get the current frame rate. I append these values to an array and average them at the end of the program. A sketch of these measurements follows this list.
- To find the post-processing time, I take a timestamp before the frames start being written to the video and another after the write has finished.
- The script can be found here:
- Next week I plan to convert all the videos to a smaller size and investigate their frame rates.
- First, I had to figure out how to input videos (instead of the webcam feed) into our system. The code from the Hackster article already had this functionality, but I discovered I was only able to input .mp4 videos (not .mov).
- I created a set of test videos of various lengths. I used videos we recorded of Prof Dueck’s students, which varied in piece type but were all around 1 minute long. To stress-test the post-processing time, I also downloaded a couple of longer videos from YouTube. I ensured all of these videos were .mp4.
- I then edited my code to record the captured fps values and post-processing time.
- Then, with the help of ChatGPT, I wrote a script to run our system on the downloaded videos and gather the video length, average frame rate, and post-processing time into a CSV file.
- I tried testing this script on just one video; my code seemed to work for a while but was killed prematurely. After looking at the video’s metadata, I believe this happened because the video was too large and consumed too many resources. Once I scaled the video down (reducing its height/width), the code completed successfully. However, the video played on the display much more slowly than the webcam input, which made my script take a while. I still have to investigate and fix this, but I suspect it is due to the input video’s high frame rate.
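- For reference, here is a rough sketch of the frame-rate and post-processing measurements described above. It is a standalone illustration with placeholder names, not the actual test script.
```python
import csv
import time

fps_samples = []

def make_fps_tracker():
    """Return a per-frame callback that records an fps sample every 10 frames."""
    state = {"count": 0, "last": time.time()}
    def on_frame():
        state["count"] += 1
        if state["count"] % 10 == 0:
            now = time.time()
            fps_samples.append(10 / (now - state["last"]))
            state["last"] = now
    return on_frame

def timed_post_processing(write_video_fn):
    """Time how long writing the recorded frames out to a video takes."""
    start = time.time()
    write_video_fn()
    return time.time() - start

def append_result(csv_path, video_name, video_length_s, post_time_s):
    """Append one row of results for a test video to the results CSV."""
    avg_fps = sum(fps_samples) / len(fps_samples) if fps_samples else 0
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([video_name, video_length_s, avg_fps, post_time_s])
```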
Schedule:
I am still roughly on schedule, as system integration is largely complete and I am almost done with testing. The only two big parts of system integration left are (1) syncing the tension graphs with the video on the web app and (2) integrating the finished tension algorithm. For (1), I have already outputted an array of tense/not-tense information for Danny to turn into a graph; more sophisticated graphs are waiting on Shaye finalizing the tension detection algorithm, which (2) also depends on.
Next week’s deliverables:
- Finish testing frame rate and post-processing times.
- Redo latency testing on the finished tension detection algorithm (with 2 hands in frame).
Learning Reflections:
I had to learn a lot for this project.
- I had little experience using Raspberry Pis, so even getting SSH and wifi set up was difficult.
- Interfacing with all the hardware was also a learning curve: installing the necessary RPi accelerator materials, writing code for the SPI display, learning to use active/passive buzzers, and learning to program the ESP32 chip with the Arduino IDE.
- I also had to use many Python libraries I was unfamiliar with.
My learning involved many strategies. Our approach to accelerating the model was to look up similar applications online; for the accelerator materials, I followed the Hackster article. For other components I followed the provided documentation/materials; for example, for the SPI display setup I followed this article https://www.waveshare.com/wiki/2inch_LCD_Module#Python and for the ESP32 wiring I followed the pinout from the original Amazon listing. Shaye was also a big help, as they were more familiar with many of the hardware components I was working with, and I would often go to them first for guidance on what I should generally do. Lastly, ChatGPT was a big help in adjusting to new Python libraries: it could generate basic code for me to build on and give me ideas for debugging problems related to the unfamiliar libraries (specifically cv2 and threading).