Akul’s Status Report for 4/12

This week I worked on developing multiple comparison algorithms, with the goal of iterating on our existing comparison algorithm, trying new features, and being intentional about the design decisions behind it. For the interim demo, we already had an algorithm that uses dynamic time warping, analyzes 30 frames every 0.1 seconds, and computes similarity for each joint individually. This week I focused on building five distinct comparison algorithms, which lets me measure how different parameters and features affect the effectiveness of our dance coach. The following are the five variations I created; I will compare them against our original to help find an optimal solution:

  1. Frame-to-frame comparisons: does not use dynamic time warping; simply builds a normalization matrix between the reference video and the user webcam input and compares the coordinates on each frame.
  2. Dynamic time warping with weighted similarity calculations: builds on our interim-demo algorithm by weighting certain joints more heavily than others when computing similarity.
  3. Dynamic time warping with a larger analysis window/frame buffer: builds on our interim-demo algorithm by increasing the analysis window and frame buffer to obtain a more accurate DTW alignment.
  4. Velocity-based comparisons: similar to the frame-to-frame comparisons, but computes the velocity of each joint over time and compares those velocities against the reference video, capturing not exactly where the joints are but how they move over time.
  5. Velocity-based comparisons combined with frame-to-frame comparisons: builds on the velocity comparisons by using both joint velocities and frame-to-frame joint positions, to see whether the combination gives a more accurate measure of similarity between the reference and user input videos (a rough sketch of the frame-to-frame and velocity ideas follows this list).
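
To make these concrete, here is a minimal sketch of what variations 1, 4, and 5 boil down to, assuming each pose is a NumPy array of normalized (x, y) joint coordinates in a fixed MediaPipe joint order and a 0.1 s frame interval; the function names and blend weights are placeholders, not our actual code.

```python
import numpy as np

def frame_similarity(user_pose, ref_pose):
    """Variation 1: mean Euclidean distance between corresponding joints on one frame."""
    return np.linalg.norm(user_pose - ref_pose, axis=1).mean()

def joint_velocities(poses, dt=0.1):
    """Per-joint velocity vectors between consecutive frames."""
    return np.diff(np.asarray(poses), axis=0) / dt

def velocity_similarity(user_poses, ref_poses, dt=0.1):
    """Variation 4: compare how the joints move over time rather than exactly where they are."""
    user_v = joint_velocities(user_poses, dt)
    ref_v = joint_velocities(ref_poses, dt)
    n = min(len(user_v), len(ref_v))
    return np.linalg.norm(user_v[:n] - ref_v[:n], axis=2).mean()

def combined_similarity(user_poses, ref_poses, w_pos=0.5, w_vel=0.5):
    """Variation 5: weighted blend of position and velocity distances (weights are placeholders)."""
    n = min(len(user_poses), len(ref_poses))
    pos = np.mean([frame_similarity(u, r) for u, r in zip(user_poses[:n], ref_poses[:n])])
    return w_pos * pos + w_vel * velocity_similarity(user_poses, ref_poses)
```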

I have implemented and debugged the algorithms above; starting tomorrow and continuing throughout the week, I will run quantitative and qualitative comparisons between them to see which is best for our use case and to find further points of improvement. Additionally, I will communicate with Rex and Danny to make it as easy as possible to integrate the comparison algorithm with the Unity portion of the game. Overall, our progress seems to be on schedule; if I can finalize the comparison algorithm within the next week while we begin integration in the meantime, we will be on a good track to finish by the final demo and deadline.

There are two main parts that I will need to test and verify for the comparison algorithm. First, I aim to test the real-time processing performance of each of these algorithms. For example, the DTW algorithm with the extended analysis window may require too much computation to allow for real-time comparisons. On the other hand, the velocity and frame-to-frame comparison algorithms may have room for added complexity that improves accuracy without causing processing problems.

Second, I aim to test the accuracy of each of these comparison algorithms. For each algorithm described above, I will run it on a complex video (such as a TikTok dance), a simpler video (such as a video of myself dancing to a slower song), and a still video (such as me holding a T-pose in front of the camera). I will record the output while actively performing the dance, allowing me to watch the video back and see how each algorithm does. Afterwards, I will build a table of quantitative and qualitative notes on each algorithm, recording which parts are lacking and which are performing well. This will give me all the data I need in front of me when deciding how to continue iterating on the algorithm.
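
As a hypothetical sketch of the results table I plan to fill in (the column names and the example values in the comment are placeholders, not real measurements):

```python
import csv

COLUMNS = ["algorithm", "test_video", "mean_similarity", "qualitative_notes"]

def start_results_table(path):
    """Create the results file with a header row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(COLUMNS)

def log_result(path, algorithm, video, similarity, notes):
    """Append one quantitative + qualitative row for an algorithm/video pair."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([algorithm, video, similarity, notes])

# e.g. log_result("results.csv", "dtw_weighted", "tiktok_dance", 0.84, "noisy on hips")
```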

With these two strategies, I believe that we will be on a good track to verify the effectiveness of our dance coach and create the best possible comparison algorithm we can to help our users.

Danny’s Status Report for 4/12

This week I focused on integrating our comparison algorithm with the Unity interface, collaborating closely with Rex and Akul. We established a robust UDP communication protocol between the Python-based analysis module and Unity. We encountered initial synchronization issues where the avatars would occasionally freeze or jump, which we traced to packet loss during high CPU utilization. We implemented a heartbeat mechanism and frame sequence numbering that improved stability significantly.
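
For reference, a minimal sketch of what the Python-side sender looks like with sequence numbering and a heartbeat; the host/port, message fields, and one-second heartbeat interval here are assumptions for illustration, not our exact protocol.

```python
import json
import socket
import time

UNITY_ADDR = ("127.0.0.1", 5052)   # placeholder host/port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

seq = 0
last_heartbeat = 0.0

def send_frame(pose_data):
    """Send one pose frame tagged with a sequence number so the receiver can
    drop stale or out-of-order packets instead of freezing/jumping the avatar."""
    global seq, last_heartbeat
    seq += 1
    msg = {"type": "pose", "seq": seq, "t": time.time(), "pose": pose_data}
    sock.sendto(json.dumps(msg).encode("utf-8"), UNITY_ADDR)

    # Heartbeat roughly once per second so the receiver can detect a stalled sender.
    now = time.time()
    if now - last_heartbeat > 1.0:
        hb = {"type": "heartbeat", "seq": seq, "t": now}
        sock.sendto(json.dumps(hb).encode("utf-8"), UNITY_ADDR)
        last_heartbeat = now
```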

We then collaborated on mapping the comparison results to the Unity visualization. We developed a color gradient system that highlights body segments based on deviation severity. During our testing, we identified that hip and shoulder rotations were producing too many false positives in the error detection. We then tuned the algorithm’s weighting factors to prioritize key movement characteristics based on dance style, which improved the relevance of the feedback.
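
One simple way to compute such a gradient on the Python side before sending it to Unity is sketched below; the thresholds and the green-to-red blend are illustrative assumptions, not our tuned values.

```python
def deviation_to_rgb(deviation, ok=0.05, bad=0.25):
    """Map a normalized per-segment deviation to an (r, g, b) tuple in [0, 1]:
    green within `ok`, red at or beyond `bad`, blended linearly in between."""
    if deviation <= ok:
        return (0.0, 1.0, 0.0)
    if deviation >= bad:
        return (1.0, 0.0, 0.0)
    t = (deviation - ok) / (bad - ok)   # 0..1 across the gradient
    return (t, 1.0 - t, 0.0)
```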

As for the verification and validation portion, I am in charge of the CV subsystem of our project. For this subsystem specifically, my plans are as follows:

Pose Detection Accuracy Testing

  • Completed Tests: We’ve conducted initial verification testing of our MediaPipe implementation by comparing detected landmarks against ground truth positions marked by professional dancers in controlled environments.
  • Planned Tests: We’ll perform additional testing across varied lighting conditions and distances (1.5-3.5m) to verify consistent performance across typical home environments.
  • Analysis Method: Statistical comparison of detected vs. ground truth landmark positions, with calculation of average deviation in centimeters (a rough sketch of this calculation follows).
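
A rough sketch of that deviation calculation, assuming the detected and ground-truth landmarks are arrays of (x, y) positions already converted to centimeters (the names are placeholders):

```python
import numpy as np

def mean_landmark_deviation_cm(detected_cm, ground_truth_cm):
    """Average Euclidean deviation (in cm) between detected and annotated landmarks."""
    detected_cm = np.asarray(detected_cm, dtype=float)
    ground_truth_cm = np.asarray(ground_truth_cm, dtype=float)
    per_landmark = np.linalg.norm(detected_cm - ground_truth_cm, axis=-1)
    return per_landmark.mean()
```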

Real-Time Processing Performance

  • Completed Tests: We’ve measured frame processing rates on typical hardware configurations (a mid-range laptop).
  • Planned Tests: Extended duration testing (20+ minute sessions) to verify performance stability and resource utilization over time.
  • Analysis Method: Performance profiling of CPU/RAM usage during extended sessions to verify long-term system stability (a sketch of the profiling harness follows).
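
A possible harness for the extended-session profiling, assuming the psutil package; the sampling interval, duration, and log file are placeholders.

```python
import csv
import time
import psutil

def profile_session(duration_s=20 * 60, interval_s=5, out_path="profile_log.csv"):
    """Sample this process's CPU and RAM usage at a fixed interval for duration_s seconds."""
    proc = psutil.Process()
    proc.cpu_percent(None)  # prime the CPU counter
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "cpu_percent", "rss_mb"])
        start = time.time()
        while time.time() - start < duration_s:
            time.sleep(interval_s)
            writer.writerow([
                round(time.time() - start, 1),
                proc.cpu_percent(None),
                proc.memory_info().rss / (1024 * 1024),
            ])
```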

Team Status Report for 4/12


Risk Management:

Risk: Comparison algorithm slowing down Unity feedback

Mitigation Strategy/Contingency plan: We plan to reduce the amount of computation required by having the DTW algorithm run on a larger buffer. If this does not work, we will fall back to a simpler algorithm selected from the few we are testing now.

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Verification and Validation:

Verification Testing

Pose Detection Accuracy Testing

  • Completed Tests: We’ve conducted initial verification testing of our MediaPipe implementation by comparing detected landmarks against ground truth positions marked by professional dancers in controlled environments.
  • Planned Tests: We’ll perform additional testing across varied lighting conditions and distances (1.5-3.5m) to verify consistent performance across typical home environments.
  • Analysis Method: Statistical comparison of detected vs. ground truth landmark positions, with calculation of average deviation in centimeters.

Real-Time Processing Performance

  • Completed Tests: We’ve measured frame processing rates on typical hardware configurations (a mid-range laptop).
  • Planned Tests: Extended duration testing (20+ minute sessions) to verify performance stability and resource utilization over time.
  • Analysis Method: Performance profiling of CPU/RAM usage during extended sessions to verify long-term system stability.

DTW Algorithm Accuracy

  • Completed Tests: Initial testing of our DTW implementation with annotated reference sequences.
  • Planned Tests: Expanded testing with deliberately introduced temporal variations to verify robustness to timing differences.
  • Analysis Method: Comparison of algorithm-identified errors against reference videos, with a focus on false positive/negative rates (see the sketch below).
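
A sketch of how those rates could be computed, assuming each frame carries both a hand-annotated label and an algorithm-reported label for whether an error is present (names are placeholders):

```python
def error_detection_rates(annotated, detected):
    """Return (false_positive_rate, false_negative_rate) over paired per-frame labels,
    where each element is True if an error is present on that frame."""
    fp = sum(1 for a, d in zip(annotated, detected) if d and not a)
    fn = sum(1 for a, d in zip(annotated, detected) if a and not d)
    negatives = sum(1 for a in annotated if not a)
    positives = sum(1 for a in annotated if a)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```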

Unity Visualization Latency

  • Completed Tests: End-to-end latency measurements from webcam capture to avatar movement display.
  • Planned Tests: Additional testing to verify UDP packet delivery rates.
  • Analysis Method: High-speed video capture of user movements compared with screen recordings of avatar responses, analyzed frame-by-frame.

Validation Testing

Setup and Usability Testing

  • Planned Tests: Expanded testing with 30 additional participants representing our target demographic.
  • Analysis Method: Observation and timing of first-time setup process, followed by survey assessment of perceived difficulty.

Feedback Comprehension Validation

  • Planned Tests: Structured interviews with users after receiving system feedback, assessing their understanding of recommended improvements.
  • Analysis Method: Scoring of users’ ability to correctly identify and implement suggested corrections, with target of 90% comprehension rate.

Rex’s Status Report for 4/12

This week, I began by implementing more key features and refactoring critical components as part of the integration phase of our project. I modified our pose-receiving code to properly handle CombinedData, which now includes both raw poseData and real-time feedback from the dynamic time warping (DTW) algorithm. This integration required careful coordination with the updated pose_sender.py script, where I also addressed performance issues related to laggy webcam input. Specifically, I optimized the DTW algorithm by offloading its computation to a separate thread, reducing webcam lag and improving responsiveness. Additionally, I implemented a new character skin feature compatible with Danny’s pose_sender, allowing for a more customized and engaging user experience.
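
The idea behind the thread offload is roughly the following sketch; the actual pose_sender.py code differs, and compare_with_dtw here is a placeholder stub rather than our real comparison call.

```python
import queue
import threading

pose_queue = queue.Queue(maxsize=2)      # drop work rather than lag the webcam
latest_feedback = {"similarity": None}

def compare_with_dtw(window):
    """Placeholder for the real DTW comparison; returns a similarity score."""
    return 0.0

def dtw_worker():
    while True:
        window = pose_queue.get()        # blocks until a frame window arrives
        latest_feedback["similarity"] = compare_with_dtw(window)

threading.Thread(target=dtw_worker, daemon=True).start()

def on_new_frame(window):
    """Called from the capture loop: never blocks on DTW."""
    try:
        pose_queue.put_nowait(window)
    except queue.Full:
        pass  # skip this window; the webcam loop stays responsive
```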

Progress is mostly on schedule for the integration part. I plan to spend additional hours refining the feedback visualization and testing latency under different system loads. In the coming week, my goal is to complete the UX feature that highlights which body parts are incorrectly matched in real-time during a dance session. This will significantly enhance usability and user learning by making corrections more intuitive and immediate for the final demo as well.

Now that core modules are functioning, I’ve begun transitioning into the verification and validation phase. Planned tests include unit testing each communication component (pose sender and receivers), integration testing across the DTW thread optimization, and using several short dances to test the accuracy of the real-time feedback. To verify design effectiveness, I will analyze frame-by-frame comparisons of live poses against reference poses as well as the DTW algorithm’s window. This will allow me to check timing accuracy, body-part correlation, and response latency using Python timers in the code, confirming that they adhere to the timing metrics outlined in our use-case requirements. I also plan to evaluate user interaction with the feedback system via usability testing in order to gauge how viable the final demo will be.
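
A tiny sketch of the Python-timer check, where the 100 ms target is a placeholder standing in for whatever our use-case requirement specifies:

```python
import time

LATENCY_TARGET_S = 0.100   # placeholder threshold

def timed_feedback_step(compare_fn, live_window, reference_segment):
    """Time one comparison/feedback step and flag it if it misses the target."""
    start = time.perf_counter()
    result = compare_fn(live_window, reference_segment)
    elapsed = time.perf_counter() - start
    if elapsed > LATENCY_TARGET_S:
        print(f"feedback step took {elapsed * 1000:.1f} ms (over target)")
    return result, elapsed
```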

Team Status Report for 3/29

Risk Management:

Risk: Comparison algorithm not being able to handle depth data

Mitigation Strategy/Contingency plan: We plan to normalize the test and reference videos so that they both represent absolute coordinates, allowing us to use Euclidean distance for our comparison algorithms. If this does not work, we can fall back to neglecting the relative and unreliable depth data from the CV and rely purely on the x/y coordinates, which should still provide good-quality feedback for casual dancers.
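
One normalization scheme that would make Euclidean distance meaningful is sketched below (center each pose on the hips and scale by torso length); the MediaPipe Pose landmark indices are standard, but the scheme itself is an illustration rather than our finalized normalization.

```python
import numpy as np

LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP, RIGHT_HIP = 11, 12, 23, 24  # MediaPipe Pose indices

def normalize_pose(landmarks_xy):
    """landmarks_xy: (33, 2) array of x, y coordinates for one frame."""
    pts = np.asarray(landmarks_xy, dtype=float)
    hip_center = (pts[LEFT_HIP] + pts[RIGHT_HIP]) / 2.0
    shoulder_center = (pts[LEFT_SHOULDER] + pts[RIGHT_SHOULDER]) / 2.0
    scale = np.linalg.norm(shoulder_center - hip_center)
    if scale == 0:
        scale = 1.0
    return (pts - hip_center) / scale

def pose_distance(user_xy, ref_xy):
    """Mean Euclidean distance between normalized user and reference poses."""
    return np.linalg.norm(normalize_pose(user_xy) - normalize_pose(ref_xy), axis=1).mean()
```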

Risk: Comparison algorithm not matching up frame by frame – continued risk

Mitigation Strategy/Contingency plan: We will attempt to implement a change to our algorithm that accounts for a constant delay between the user and the reference video. If correctly implemented, this will allow the user to receive accurate feedback even if they do not start precisely at the same time as the reference video. If we are unable to implement this solution, we will incorporate warnings and counters to make sure users know when to start dancing so that their footage is matched up with the reference video.

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Danny’s Status Report for 3/29

This week I focused on addressing the issues brought up at our most recent update meeting, namely the sophistication of our comparison algorithm. We ultimately decided that we would explore multiple ways to do time-series comparisons in real time, and that I would explore a fastDTW implementation in particular.

Implementing the actual algorithm and adapting it to real time proved difficult at first, since DTW was originally designed for analyzing complete sequences. However, after some research and experimentation, I realized that we could adopt a sliding-window approach: store a certain number of real-time frames in a buffer and try to map that buffer, as a sequence, onto the reference video.
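
A simplified sketch of that sliding-window idea, assuming the fastdtw package and flattened pose vectors as frames; the buffer size and distance function are placeholders rather than our tuned values.

```python
from collections import deque

import numpy as np
from fastdtw import fastdtw

BUFFER_SIZE = 30                         # number of recent live frames kept
live_buffer = deque(maxlen=BUFFER_SIZE)  # append each new flattened live pose here

def flat_pose_distance(a, b):
    """Distance between two flattened pose vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def compare_window(reference_segment):
    """Map the buffered live frames onto a segment of the reference sequence."""
    if len(live_buffer) < BUFFER_SIZE:
        return None                      # not enough frames buffered yet
    cost, path = fastdtw(list(live_buffer), reference_segment, dist=flat_pose_distance)
    return cost / len(path)              # rough per-step alignment cost
```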

Then, since our feedback system in Unity has not been fully implemented yet, I chose to overlay some feedback metrics on the computer vision frames, which allows us to easily digest the results from the algorithm and continue optimizing it.

Example of feedback overlaid on a CV frame:

Akul’s Status Report for 3/29

This week, I worked on getting us set up for the interim demo. After meeting with the professor on Monday, we explored how to improve our comparison algorithm. Previously, we mostly had a frame-to-frame comparison with only okay accuracy. With that, we explored how to use DTW not just for post-processing but also for real-time feedback. I started by doing more research into how others have used DTW for video processing; I read a few papers on DTW-based feedback and gained a better understanding of how the algorithm works and why it is suitable for our application.

We incorporated DTW by comparing shorter segments of the input video. The biggest pivot from our original plan was using DTW for real-time feedback: rather than running DTW over the entire video, we compare specific segments at a time. We did this because of DTW’s quadratic time complexity; the longer the segment (our original plan was to make the segment the whole video), the longer it takes. Segmenting the video into smaller chunks therefore lets us use DTW for real-time feedback.

Additionally, I worked on gathering test data and planning how our interim demo will go. I considered the use-case application of our system, looking at actual dances we would want to replicate. One thing I personally enjoyed was learning Fortnite dances, which are short and simple but can still be difficult to master. We also experimented with uploading these videos to our pipelined system, allowing us to test with other inputs.

Our progress is on schedule. We have two main components: the Unity side, which displays human figures for both the reference video and the user video to show in real time how the user is dancing, and the comparison algorithm, which determines which parts of the user’s moves correspond to the reference video and indicates whether they are dancing well. Next steps include integrating these two aspects for our final demo in the next few weeks.

Rex’s Status Report for 3/29

This week, I focused on optimizing the two avatars for the UI in our Unity interface. Specifically, I implemented a parallel processing approach where one avatar receives pose information (in JSON form) from a pre-recorded reference video, while the other avatar receives pose data from live capture, also parsed from JSON files. Ensuring that these two avatars run smoothly and simultaneously allows us to effectively compare live performances against reference poses in real time, so the user can see what moves they should be doing. The UI also now shows what the CV is actually trying to capture. Additionally, I collaborated with my group members to test various comparison algorithms for evaluating the similarity between the reference and live poses. After thorough experimentation, we made a lot of progress with Dynamic Time Warping (DTW) due to its ability to handle temporal variations effectively, and we believe this resolves the frame-by-frame misalignment problem in our comparison. We integrated DTW into our existing ML pipeline, ensuring compatibility with the data structures we are working with. Screenshots of the UI with the two avatars running in parallel and the CV output are shown below.

Left avatar is reference video avatar, right avatar is live input.

Progress on the project is generally on schedule. While optimizing the avatar processing in parallel took slightly longer than anticipated due to synchronization challenges, the integration of the DTW algorithm proceeded decently once we established the data pipeline. If necessary, I will allocate additional hours next week to refine the comparison algorithm and improve the UI feedback for the player.

Next week, I plan to improve the feedback for the player. This will involve enhancing the UI to provide more intuitive feedback when the user’s pose deviates from the desired reference pose. Additionally, I aim to fine-tune the DTW implementation to improve accuracy and responsiveness. By the end of the week, the goal is to have a fully functional feedback system that clearly indicates which body parts need adjustment.

Akul’s Status Report for 3/22

This week I focused on improving our comparison algorithm logic and exploring the dynamic time warping post-processing algorithm. Regarding the frame-by-frame comparison algorithm: last week I made an algorithm that takes in two videos and outputs whether the dance moves were similar or not, but the comparison was giving too many false positives. I worked on debugging this with Danny, and I found that some of the thresholds in the comparison logic were too high. After tweaking these and testing against other videos, the comparisons got better, but they still aren’t 100% accurate.

With that, I decided to begin working on the dynamic time warping algorithm to get a sense of what we could do to improve our overall performance and feedback to the user. I spent some time thinking about how we would implement DTW and how we would use it to provide useful feedback for the user. I broke the problem down so that we measure overall similarity but also highlight specific areas for improvement, such as timing, posture, or limb positioning, using specific points in the MediaPipe landmark set. I began implementation but am currently running into some bugs that I will fix next week.

I also worked with Rex to begin incorporating the comparison logic into the Unity game. We met to catch each other up on our progress and plan how we will integrate our parts. There were some things we needed to modify, such as the JSON formatting, to make sure everything would be compatible. For next week, one clear goal is to integrate our codebases more fully so we can have a successful interim demo the week after.

Danny’s Status Report for 3/22

This week I was deeply involved in collaborative efforts with Rex and Akul to enhance and streamline our real-time rendering and feedback system. Our primary goal was to integrate various components smoothly, but we encountered several significant challenges along the way.

As we attempted to incorporate Akul’s comparison algorithm, which uses Procrustes analysis, into Rex’s real-time pipeline, we discovered multiple compatibility issues. The most pressing problem was inconsistent JSON formatting across our different modules, which prevented seamless data exchange and processing. These inconsistencies were causing failures at critical integration points and slowing down our development progress.

To address these issues, I developed a comprehensive Python reader class that standardizes how we access and interpret 3D landmark data. This new utility provides a consistent interface for extracting, parsing, and manipulating the spatial data that flows through our various subsystems. The reader class abstracts away the underlying format complexities, offering simple, intuitive methods that all team members can use regardless of which module they’re working on.
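
A hypothetical sketch of the kind of interface such a reader provides; the JSON layout ("frames" containing lists of {"x", "y", "z"} landmarks) and the class and method names are assumptions for illustration, not the actual implementation.

```python
import json

class LandmarkReader:
    """Uniform access to 3D landmark data regardless of which module wrote the file."""

    def __init__(self, path):
        with open(path) as f:
            self._data = json.load(f)

    def num_frames(self):
        return len(self._data["frames"])

    def landmark(self, frame_idx, joint_idx):
        """Return (x, y, z) for one joint on one frame."""
        lm = self._data["frames"][frame_idx]["landmarks"][joint_idx]
        return (lm["x"], lm["y"], lm["z"])

    def frame(self, frame_idx):
        """Return all joints for one frame as a list of (x, y, z) tuples."""
        return [(lm["x"], lm["y"], lm["z"])
                for lm in self._data["frames"][frame_idx]["landmarks"]]
```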

This standardization effort has significantly improved our cross-module compatibility, making it much easier for our individual components to communicate effectively. The shared data access pattern has eliminated many of the integration errors we were experiencing and reduced the time spent debugging format-related issues.

Additionally, I worked closely with Akul to troubleshoot various problems he encountered while trying to adapt his comparison algorithm for real-time operation. This involved identifying bottlenecks in the video processing pipeline, diagnosing frame synchronization issues, and helping optimize certain computational steps to maintain acceptable performance under real-time constraints.

By the end of the week, we made substantial progress toward a more unified system architecture with better interoperability between our specialized components. The standardized data access approach has set us up for more efficient collaboration and faster integration of future features.