Team Status Report for 4/12

Risk Management:

Risk: Comparison algorithm slowing down Unity feedback

Mitigation Strategy/Contingency plan: We plan to reduce the amount of computation required by having the DTW algorithm run over a larger buffer of frames, so that comparisons are invoked less frequently. If this does not work, we will fall back to a simpler comparison algorithm selected from the few we are testing now.

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Verification and Validation:

Verification Testing

Pose Detection Accuracy Testing

  • Completed Tests: We’ve conducted initial verification testing of our MediaPipe implementation by comparing detected landmarks against ground truth positions marked by professional dancers in controlled environments.
  • Planned Tests: We’ll perform additional testing across varied lighting conditions and distances (1.5-3.5m) to verify consistent performance across typical home environments.
  • Analysis Method: Statistical comparison of detected vs. ground truth landmark positions, with calculation of average deviation in centimeters (a sketch of this computation follows below).
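
A minimal sketch of that computation, assuming detected and ground-truth landmarks are exported as NumPy arrays of (x, y, z) world coordinates in meters (the array shapes and unit conversion are assumptions for illustration):

```python
import numpy as np

def average_deviation_cm(detected: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean Euclidean distance between detected and ground-truth landmarks, in cm.

    Both arrays are expected to have shape (num_frames, num_landmarks, 3),
    with coordinates in meters.
    """
    assert detected.shape == ground_truth.shape
    per_landmark = np.linalg.norm(detected - ground_truth, axis=-1)  # meters per landmark
    return float(per_landmark.mean() * 100.0)  # convert meters to centimeters
```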

Real-Time Processing Performance

  • Completed Tests: We’ve measured frame processing rates on typical hardware configurations (a mid-range laptop).
  • Planned Tests: Extended duration testing (20+ minute sessions) to verify performance stability and resource utilization over time.
  • Analysis Method: Performance profiling of CPU/RAM usage during extended sessions to confirm long-term system stability (a rough sketch follows below).
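
A rough sketch of that profiling loop using psutil (the session length and sampling interval are placeholders):

```python
import time
import psutil

def profile_session(duration_s: int = 1500, interval_s: float = 5.0):
    """Sample CPU and RAM usage of the current process over a long session."""
    proc = psutil.Process()
    samples = []
    start = time.time()
    while time.time() - start < duration_s:
        cpu = proc.cpu_percent(interval=interval_s)    # process CPU % over the interval
        rss_mb = proc.memory_info().rss / (1024 ** 2)  # resident memory in MB
        samples.append((time.time() - start, cpu, rss_mb))
    return samples  # inspect for upward memory drift or sustained CPU spikes
```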

DTW Algorithm Accuracy

  • Completed Tests: Initial testing of our DTW implementation with annotated reference sequences.
  • Planned Tests: Expanded testing with deliberately introduced temporal variations to verify robustness to timing differences.
  • Analysis Method: Comparison of algorithm-identified errors against reference videos, with focus on false positive/negative rates (see the sketch below).
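
A minimal sketch of the rate calculation, assuming each frame carries a boolean "error flagged" label from both the algorithm and the annotated reference (names are illustrative):

```python
def error_rates(predicted_errors, annotated_errors):
    """False positive/negative rates of algorithm-flagged frames vs. annotations."""
    assert len(predicted_errors) == len(annotated_errors)
    fp = sum(p and not a for p, a in zip(predicted_errors, annotated_errors))
    fn = sum(a and not p for p, a in zip(predicted_errors, annotated_errors))
    positives = sum(1 for a in annotated_errors if a)
    negatives = len(annotated_errors) - positives
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }
```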

Unity Visualization Latency

  • Completed Tests: End-to-end latency measurements from webcam capture to avatar movement display.
  • Planned Tests: Additional testing to verify UDP packet delivery rates (a rough sketch of this check appears after this list).
  • Analysis Method: High-speed video capture of user movements compared with screen recordings of avatar responses, analyzed frame-by-frame.
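
For the UDP delivery-rate test, one simple approach is to have the sender tag each packet with a sequence number and have a test receiver count what arrives; a hedged sketch (the port and the JSON "seq" field are assumptions, not our actual packet format):

```python
import json
import socket

def measure_udp_delivery(port: int = 5056, expected_packets: int = 1000) -> float:
    """Count distinct sequence numbers received and report the delivery rate."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(5.0)
    seen = set()
    try:
        while len(seen) < expected_packets:
            data, _addr = sock.recvfrom(65535)
            seen.add(json.loads(data.decode())["seq"])  # sender attaches a "seq" counter
    except socket.timeout:
        pass  # stop once packets stop arriving
    return len(seen) / expected_packets
```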

Validation Testing

Setup and Usability Testing

  • Planned Tests: Expanded testing with 30 additional participants representing our target demographic.
  • Analysis Method: Observation and timing of first-time setup process, followed by survey assessment of perceived difficulty.

Feedback Comprehension Validation

  • Planned Tests: Structured interviews with users after receiving system feedback, assessing their understanding of recommended improvements.
  • Analysis Method: Scoring of users’ ability to correctly identify and implement suggested corrections, with target of 90% comprehension rate.

Rex’s Status Report for 4/12

This week, I began by implementing more key features and refactoring critical components as part of the integration phase of our project. I modified our pose-receiving component to properly handle CombinedData, which now includes both raw poseData and real-time feedback from the dynamic time warping (DTW) algorithm. This integration required careful coordination with the updated pose_sender.py script, where I also addressed performance issues that were causing laggy webcam input. Specifically, I optimized the DTW algorithm by offloading its computations to a separate thread, reducing webcam lag and improving responsiveness. Additionally, I implemented a new character skin feature compatible with Danny’s pose_sender, allowing for a more customized and engaging user experience.
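
As a rough illustration of the threading change (the queue, names, and placeholder comparison below are simplified stand-ins for what pose_sender.py actually does), the capture loop only enqueues pose windows while a worker thread runs DTW:

```python
import queue
import threading

pose_queue = queue.Queue(maxsize=5)  # buffered pose windows awaiting comparison
latest_feedback = None               # most recent DTW result, read by the sender loop
feedback_lock = threading.Lock()

def run_dtw(window):
    # placeholder for the real DTW comparison against the reference sequence
    return {"score": 0.0}

def dtw_worker():
    """Run the expensive DTW comparison off the webcam capture thread."""
    global latest_feedback
    while True:
        window = pose_queue.get()  # blocks until the capture loop hands off a buffer
        result = run_dtw(window)
        with feedback_lock:
            latest_feedback = result

threading.Thread(target=dtw_worker, daemon=True).start()
# In the capture loop, windows are enqueued with pose_queue.put_nowait(window),
# dropping the window if the queue is full, so webcam capture never stalls on DTW.
```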

Progress is mostly on schedule for the integration phase. I plan to spend additional hours refining the feedback visualization and testing latency under different system loads. In the coming week, my goal is to complete the UX feature that highlights which body parts are incorrectly matched in real time during a dance session. This will significantly enhance usability and user learning by making corrections more intuitive and immediate, and it will also strengthen the final demo.

Now that core modules are functioning, I’ve begun transitioning into the verification and validation phase. Planned tests include unit testing each communication component (pose sender and receivers), integration testing across the DTW thread optimization, and using several short dances to test the accuracy of the real-time feedback. To verify design effectiveness, I will analyze frame-by-frame comparisons of live poses against reference poses as well as the DTW algorithm’s window. This will allow me to check timing accuracy, body part correlation, and response latency using Python timers in the code, confirming that they adhere to the timing metrics outlined in our use-case requirements. I also plan to evaluate user interaction with the feedback system via usability testing to see how viable the final demo will be.
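
A small sketch of the kind of timer I plan to use (the 200 ms budget here is illustrative; the real thresholds come from the use-case requirements):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str, budget_ms: float = 200.0):
    """Time a pipeline stage and flag it if it exceeds the latency budget."""
    start = time.perf_counter()
    yield
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    status = "OK" if elapsed_ms <= budget_ms else "OVER BUDGET"
    print(f"{label}: {elapsed_ms:.1f} ms ({status})")

# Example usage around one stage of the pipeline:
# with timed("DTW window comparison"):
#     feedback = compare_window(live_window, reference_window)
```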

Team Status Report for 3/29

Risk Management:

Risk: Comparison algorithm not being able to handle depth data

Mitigation Strategy/Contingency plan: We plan to normalize the test and reference videos so that they both represent absolute coordinates, allowing us to use Euclidean distance for our comparison algorithms. If this does not work, we can fall back to neglecting the unreliable relative depth data from the CV and rely purely on the xy coordinates, which should still provide good quality feedback for casual dancers.
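
A minimal sketch of the normalization idea, assuming each pose arrives as an array of (x, y, z) landmarks: re-center on the hip midpoint and scale by torso length before taking Euclidean distances (the landmark indices follow MediaPipe’s pose model; the rest is illustrative):

```python
import numpy as np

LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP, RIGHT_HIP = 11, 12, 23, 24

def normalize_pose(landmarks: np.ndarray) -> np.ndarray:
    """Translate the pose to the hip midpoint and scale by torso length."""
    hips = (landmarks[LEFT_HIP] + landmarks[RIGHT_HIP]) / 2.0
    shoulders = (landmarks[LEFT_SHOULDER] + landmarks[RIGHT_SHOULDER]) / 2.0
    torso = np.linalg.norm(shoulders - hips) or 1.0
    return (landmarks - hips) / torso

def pose_distance(user: np.ndarray, reference: np.ndarray, use_depth: bool = True) -> float:
    """Mean Euclidean distance between normalized poses; depth can be dropped if unreliable."""
    u, r = normalize_pose(user), normalize_pose(reference)
    if not use_depth:
        u, r = u[:, :2], r[:, :2]  # fall back to xy coordinates only
    return float(np.linalg.norm(u - r, axis=1).mean())
```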

Risk: Comparison algorithm not matching up frame by frame – continued risk

Mitigation Strategy/Contingency plan: We will attempt to modify our algorithm so that it accounts for a constant delay between the user and the reference video. If correctly implemented, this will allow the user to receive accurate feedback even if they do not start precisely at the same time as the reference video. If we are unable to implement this solution, we will incorporate warnings and countdowns to make sure users know when to start dancing so that their footage is matched up with the reference video.
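
One way to handle a constant delay, sketched below under the assumption that we already have a per-frame distance function (such as the pose_distance sketch above): slide the user sequence across a small range of offsets and keep whichever offset minimizes the average distance.

```python
def estimate_offset(user_frames, reference_frames, frame_distance, max_offset: int = 60) -> int:
    """Estimate how many frames late the user started relative to the reference."""
    best_offset, best_cost = 0, float("inf")
    for offset in range(max_offset + 1):
        costs = [frame_distance(u, r)
                 for u, r in zip(user_frames[offset:], reference_frames)]
        if costs:
            avg = sum(costs) / len(costs)
            if avg < best_cost:
                best_offset, best_cost = offset, avg
    return best_offset  # drop this many initial user frames before comparing
```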

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Danny’s Status Report for 3/29

This week I focused on addressing the issues brought up at our most recent update meeting, namely the sophistication of our comparison algorithm. We ultimately decided that we would explore multiple ways to do time-series comparisons in real time, and that I would explore a fastDTW implementation in particular.

Implementing the actual algorithm and adapting it to real time proved difficult at first, since DTW was originally designed for analysis of complete sequences. However, after some research and experimentation, I realized that we could adapt a sliding-window approach to implementing DTW. This means storing a certain number of real-time frames in a buffer and mapping that buffer as a sequence onto the reference video.
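
A simplified sketch of that sliding-window approach using the fastdtw package (the buffer length, distance function, and flattened pose vectors are illustrative choices, not the final implementation):

```python
from collections import deque

import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

WINDOW = 30                          # roughly one second of frames at 30 fps
live_buffer = deque(maxlen=WINDOW)   # most recent live pose vectors

def score_window(reference_window):
    """Map the buffered live frames onto a window of the reference sequence with DTW."""
    if len(live_buffer) < WINDOW:
        return float("inf")          # not enough live frames buffered yet
    distance, _path = fastdtw(list(live_buffer), reference_window, dist=euclidean)
    return distance / WINDOW         # normalize by window length

# For each new frame: live_buffer.append(flattened_pose_vector), then call
# score_window(...) against the corresponding slice of the reference video.
```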

Then, since our feedback system in Unity has not been fully implemented yet, I chose to overlay some feedback metrics on the computer vision frames, which allows us to easily digest the results from the algorithm and optimize it further.

Example of feedback overlaid on a CV frame:

Akul’s Status Report for 3/29

This week, I worked on getting us set up for the Interim Demo. After meeting with the professor on Monday, we explored how to improve our comparison algorithm. Before, we mostly just had a frame-to-frame comparison which had okay accuracy. With that, we explored how to use DTW, not just for post-processing, but also for real-time feedback. I first started by doing some more research into how other people have used DTW for video processing. I read a few papers on how others used DTW for feedback, and I was able to gain a better understanding of how the algorithm works and why it is suitable for our application.

We incorporated DTW by comparing shorter segments of the input video rather than the entire video at once. The biggest pivot from our original plan was using DTW for the real-time feedback itself, not just for post-processing. We made this change because of the time complexity of DTW: the longer the segment we choose (our original plan was to make the segment the whole video), the longer it takes, since DTW has quadratic time complexity. By segmenting the video into smaller chunks, we were able to use DTW for real-time feedback.
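
To make the quadratic cost concrete: DTW on two length-n sequences fills on the order of n² matrix cells, so a full two-minute routine at 30 fps (about 3,600 frames) would cost roughly 3,600² ≈ 13 million cell updates in one pass, while 120 chunks of 30 frames cost about 120 × 30² = 108,000. A back-of-the-envelope sketch (the numbers are illustrative, not measured):

```python
def dtw_cell_updates(total_frames: int, chunk_size: int):
    """Rough DTW cost (matrix cells filled) for one full pass vs. chunked passes."""
    full = total_frames ** 2
    chunked = (total_frames // chunk_size) * chunk_size ** 2
    return full, chunked

full, chunked = dtw_cell_updates(total_frames=3600, chunk_size=30)
print(full, chunked)  # 12960000 vs. 108000 -- chunking cuts the work by ~120x
```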

Additionally, I worked on getting test data and planning how our actual interim demo will go. I considered the use-case application of our system, looking at actual dances we would want to replicate. One thing I personally enjoyed was learning how to do Fortnite dances, which are short, simple dances that can still be difficult to master. We also experimented with uploading these videos to our pipelined system, allowing us to test with other inputs.

Our progress is on schedule. We have two main components: the Unity side, which displays both the reference video and the user video as human figures to showcase in real time how the user dances, and the comparison algorithm, which identifies which parts of your dance moves match the reference video and indicates whether you are dancing well or not. Next steps include integrating both of these aspects for the final demo in the next few weeks.

Rex’s Status Report for 3/29

This week, I focused on optimizing the two avatars in our Unity UI. Specifically, I implemented a parallel processing approach where one avatar receives pose information (in JSON form) from a pre-recorded reference video, while the other avatar receives pose data from live capture, also parsed from JSON. Ensuring that these two avatars run smoothly and simultaneously allows us to effectively compare live performances against reference poses in real time, so the user can see what moves they should be doing. The UI also now shows what the CV is actually capturing. Additionally, I collaborated with my group members to test various comparison algorithms for evaluating the similarity between the reference and live poses. After thorough experimentation, we made a lot of progress with Dynamic Time Warping (DTW) due to its ability to handle temporal variations effectively, and we feel that this resolves the frame-by-frame misalignment problem in our comparison. We then integrated DTW into our existing ML pipeline, ensuring compatibility with the data structures we are working with. Screenshots of the UI with the two avatars running in parallel and the CV output are shown below.

Left avatar is reference video avatar, right avatar is live input.

Progress on the project is generally on schedule. While optimizing the avatar processing in parallel took slightly longer than anticipated due to synchronization challenges, the integration of the DTW algorithm proceeded decently once we established the data pipeline. If necessary, I will allocate additional hours next week to refine the comparison algorithm and improve the UI feedback for the player.

Next week, I plan on enhancing the feedback for the player. This will involve improving the UI to provide more intuitive feedback when the user’s pose deviates from the desired reference pose. Additionally, I aim to fine-tune the DTW implementation to improve accuracy and responsiveness. By the end of the week, the goal is to have a fully functional feedback system that clearly indicates which body parts need adjustment.

Akul’s Status Report for 3/22

This week I focused on improving our comparison algorithm logic and exploring the dynamic time warping post-processing algorithm. Regarding the frame-by-frame comparison algorithm: last week, I made an algorithm that takes in two videos and outputs whether the dance moves were similar or not. However, the comparison was producing too many false positives. I worked on debugging this with Danny, and I found that some of the thresholds in the comparison logic were too high. After tweaking these and testing against other video inputs, the comparisons got better, but they aren’t 100% accurate.

With that, I decided to begin working on the dynamic time warping algorithm to get a sense of what we could do to improve our overall performance and the feedback we give the user. I spent some time thinking about how we would implement the dynamic time warping algorithm and how we would use it to provide genuinely useful feedback. I broke the problem down so that it measures similarity but also highlights specific areas for improvement, such as timing, posture, or limb positioning, using specific landmarks from the MediaPipe pose set. I began implementation but am currently running into some bugs that I will fix next week.
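
As a rough illustration of the per-region breakdown I have in mind (the landmark groupings use MediaPipe pose indices, but the exact regions and thresholds are still being worked out):

```python
import numpy as np

# Subsets of MediaPipe pose landmark indices used to localize feedback.
REGIONS = {
    "left_arm": [11, 13, 15],   # shoulder, elbow, wrist
    "right_arm": [12, 14, 16],
    "left_leg": [23, 25, 27],   # hip, knee, ankle
    "right_leg": [24, 26, 28],
}

def region_errors(user_pose: np.ndarray, reference_pose: np.ndarray) -> dict:
    """Average landmark deviation per body region for one aligned frame pair."""
    return {
        name: float(np.linalg.norm(user_pose[idx] - reference_pose[idx], axis=1).mean())
        for name, idx in REGIONS.items()
    }
```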

I also worked with Rex to begin incorporating the comparison logic into the Unity game. We met to catch each other up on our progress and to plan how we will integrate our parts. There were some things that we needed to modify, such as the JSON formatting, to make sure everything would be compatible. For next week, one goal we definitely have is to incorporate our codebases more fully so we can have a successful interim demo the week after.

Danny’s Status Report for 3/22

This week I was deeply involved in collaborative efforts with Rex and Akul to enhance and streamline our real-time rendering and feedback system. Our primary goal was to integrate various components smoothly, but we encountered several significant challenges along the way.

As we attempted to incorporate Akul’s comparison algorithm with the Procrustes analysis into Rex’s real-time pipeline, we discovered multiple compatibility issues. The most pressing problem involved inconsistent JSON formatting across our different modules, which prevented seamless data exchange and processing. These inconsistencies were causing failures at critical integration points and slowing down our development progress.

To address these issues, I developed a comprehensive Python reader class that standardizes how we access and interpret 3D landmark data. This new utility provides a consistent interface for extracting, parsing, and manipulating the spatial data that flows through our various subsystems. The reader class abstracts away the underlying format complexities, offering simple, intuitive methods that all team members can use regardless of which module they’re working on.
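
The real reader class lives in our repo; a stripped-down sketch of the kind of interface it exposes (the field names here are assumptions for illustration, not our exact JSON schema):

```python
import json

class LandmarkReader:
    """Uniform access to 3D landmark data regardless of which module produced the JSON."""

    def __init__(self, path: str):
        with open(path) as f:
            self._frames = json.load(f)  # expected: a list of frames of landmark dicts

    def num_frames(self) -> int:
        return len(self._frames)

    def landmark(self, frame: int, index: int) -> tuple:
        """Return the (x, y, z) position of one landmark in one frame."""
        point = self._frames[frame]["landmarks"][index]
        return (point["x"], point["y"], point["z"])

    def frame_landmarks(self, frame: int) -> list:
        """All landmarks of a frame as a list of (x, y, z) tuples."""
        return [(p["x"], p["y"], p["z"]) for p in self._frames[frame]["landmarks"]]
```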

This standardization effort has significantly improved our cross-module compatibility, making it much easier for our individual components to communicate effectively. The shared data access pattern has eliminated many of the integration errors we were experiencing and reduced the time spent debugging format-related issues.

Additionally, I worked closely with Akul to troubleshoot various problems he encountered while trying to adapt his comparison algorithm for real-time operation. This involved identifying bottlenecks in the video processing pipeline, diagnosing frame synchronization issues, and helping optimize certain computational steps to maintain acceptable performance under real-time constraints.

By the end of the week, we made substantial progress toward a more unified system architecture with better interoperability between our specialized components. The standardized data access approach has set us up for more efficient collaboration and faster integration of future features.

Team Status Report for 3/22

Risk Management:

Risk: Dynamic Input Integration into Unity Pipeline

Mitigation Strategy/Contingency Plan:
A newly identified risk is the uncertainty regarding how to efficiently store and process dynamic user inputs within our current UI/UX pipeline, particularly in the context of real-time performance comparison. To address this, we will undertake detailed research into Unity’s documentation and forums. Our contingency plan includes setting aside additional team time for prototype development and targeted debugging sessions, ensuring timely resolution without affecting our overall timeline.

Risk: Comparison Algorithm Synchronization Issues

Mitigation Strategy/Contingency Plan:
We continue to face potential challenges in ensuring our comparison algorithm aligns the user’s performance frame-by-frame with the reference video. To mitigate this, we’re refining the algorithm to better accommodate constant timing offsets, allowing flexibility if the user doesn’t start exactly in sync with the reference video. If this proves insufficient, we will implement clear UI warnings and countdown mechanisms to ensure proper synchronization at the start of each session.

Risk: Visual Feedback Clarity and Usability

Mitigation Strategy/Contingency Plan:
Our original plan to split the Unity mesh into multiple segments for improved visual feedback has encountered increasing complexity. As mesh segmentation proves more cumbersome than initially expected, we’re now considering the implementation of custom Unity shaders to dynamically color individual meshes. Alternatively, we may explore overlaying precise visual indicators directly onto the user’s dance pose to clearly highlight necessary corrections, ensuring usability and meeting user expectations.

Design Changes:

No substantial design changes have occurred this week. Our current implementation aligns closely with our established schedule and original design specifications. PLEASE REFER TO INDIVIDUAL STATUS REPORTS FOR SPECIFIC UPDATES/PHOTOS. However, as noted in the risks above, we are preparing for potential minor adjustments, particularly concerning visual feedback/experience and Unity integration processes.

Rex’s Status Report for 3/22

This week, I improved the game’s UI/UX pipeline to facilitate smooth selection of reference .mp4 videos. Although initial implementation was partially completed last week, several bugs affecting the UI/UX integration were identified and resolved this week. Users can now intuitively pick and load a video directly from the game interface, simplifying the setup process and enhancing the overall user experience. Furthermore, the video analysis module was extended to handle selected reference videos robustly, effectively translating video movements into coordinates used by the avatar. This enhancement enables accurate real-time performance comparison, seamlessly integrating both live capture and pre-recorded video data.

This is Danny in a pre-recorded .mp4 (reference mp4), NOT live capture.

Additionally, I successfully optimized the avatar’s leg recreation for our Unity-based OpenCV MediaPipe dance comparison game. Previously, the avatar’s leg movements experienced slight jitter and occasional lagging frames, making the visual representation less smooth, as I mentioned in last week’s report. By refining the landmark smoothing algorithm and employing interpolation techniques between key frames, the avatar’s leg animations now follow the user’s movements more closely, significantly enhancing overall realism and responsiveness. As a result, the visual feedback loop maintains an ideal frame rate, consistently hovering around 30 fps, matching our outlined design goals.
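
The smoothing itself runs on the Unity side, but the idea is easy to sketch in Python: exponentially smooth each landmark over time, and linearly interpolate between key frames when a frame lags (the alpha value is illustrative):

```python
import numpy as np

def smooth_landmarks(frames: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Exponential smoothing over a (num_frames, num_landmarks, 3) array of positions."""
    smoothed = frames.astype(float).copy()
    for t in range(1, len(frames)):
        smoothed[t] = alpha * frames[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed

def interpolate_frame(prev_frame: np.ndarray, next_frame: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation between two key frames for a fractional time t in [0, 1]."""
    return (1.0 - t) * prev_frame + t * next_frame
```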

Currently, our progress aligns well with our original timeline. Next week, I plan to focus on optimizing and integrating the comparison algorithm further alongside Danny and Akul. Our goal is to implement more sophisticated analytical metrics to assess player accuracy comprehensively. Deliverables targeted for completion include a refined comparison algorithm fully integrated into our Unity game pipeline, rigorous testing, and initial documentation outlining the improved analytic metrics.