Rex’s Status Report for 3/8

This week, I spent a lot of time refining the dance coach’s joint rotations and movements in Unity, using physics- and physiology-based rotation constraints to make them feel as natural and responsive as possible. One focus this week was adding logic to recolor the avatar’s mesh based on movement accuracy, giving users clear visual feedback on which parts of their body need adjustment. I also worked with Danny and Akul on integrating the comparison algorithm, which evaluates the user’s pose against the reference movements. A major challenge was optimizing the frame rate while ensuring that the physics and physiological equations accurately represent real-world motion; it took a lot of trial and error to balance performance and accuracy, but it’s starting to come together. I collaborated closely with them to test and debug these changes and to verify that everything works correctly for basic movements.

Overall, I’d say progress is on schedule, but some of the optimization work took longer than expected. The biggest slowdown was making sure the calculations didn’t introduce lag while still maintaining accurate movement tracking. I also believe there is room for improvement in the rotations of some joints, especially the neck, to model the movement more accurately. To stay on track, I plan to refine the physics model further and improve computational efficiency so the system runs smoothly even with more complex movements. Next week, I hope to finalize the avatar recoloring mechanism, refine movement accuracy detection, and conduct more extensive testing with a wider range of dance poses. The goal is to make the feedback system more intuitive and responsive before moving on to more advanced features.

Attached below are demo videos showing the current state of the dynamic CV-to-Unity avatar; the physics-driven movements will need further tweaking for advanced motions. (Note: playback speeds are not the same for the two GIFs.)

Danny’s Status Report for 3/8

This week I focused on integrating my existing work with the 3D Unity framework that Rex has been working on, as well as continuing to improve the Procrustes analysis method. The Unity visualization gave me a better sense of how the Procrustes analysis is working and how I could improve it.

Initially, I was running the normalization algorithm on every single frame, normalizing each one to the corresponding reference frame. However, this presented a serious problem once we tested the algorithm in Unity. If the test and reference videos are not exactly the same length in terms of the number of frames, we would be normalizing frames that are not aligned at all. This means we would have very little tolerance for temporal distortion, which negates our premise of doing DTW analysis. It also greatly impacted processing time, since a new rotation matrix needed to be computed for every single frame.

To improve upon this, I changed the algorithm to calculate the Procrustes parameters only once, based on frame 0, and then apply those parameters to every subsequent frame. This solution worked well and greatly improved our processing speed.
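
To make the approach concrete, here is a minimal NumPy sketch of the idea (estimate translation, scale, and rotation from frame 0 only, then reuse them for all later frames); it illustrates the technique rather than reproducing my exact code:

    import numpy as np

    def procrustes_params(reference, test):
        # Estimate translation, scale, and rotation aligning `test` to `reference`.
        # Both inputs are (N, 3) arrays of landmark coordinates from frame 0.
        mu_ref, mu_test = reference.mean(axis=0), test.mean(axis=0)
        ref_c, test_c = reference - mu_ref, test - mu_test
        norm_ref, norm_test = np.linalg.norm(ref_c), np.linalg.norm(test_c)
        ref_u, test_u = ref_c / norm_ref, test_c / norm_test
        # Optimal rotation from the orthogonal Procrustes problem (via SVD)
        u, s, vt = np.linalg.svd(test_u.T @ ref_u)
        rotation = u @ vt
        scale = s.sum() * norm_ref / norm_test
        return mu_ref, mu_test, scale, rotation

    def apply_params(frame, mu_ref, mu_test, scale, rotation):
        # Apply the frame-0 parameters to any later (N, 3) frame of the test video.
        return (frame - mu_test) @ rotation * scale + mu_ref

    # Usage: estimate once on frame 0, then reuse for every subsequent frame.
    # params = procrustes_params(reference_frames[0], test_frames[0])
    # normalized = [apply_params(f, *params) for f in test_frames]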

 

[GIF captions: Reference Footage · Test Footage (Slightly Rotated) · Raw Test Data (Rotated) · Normalized Test Data · Reference]

 

Team Status Report for 3/8

Risk Management:

Risk: Cosine Similarity Algorithm not yielding satisfactory results on complex dances

Mitigation Strategy/Contingency plan: We will continue working on the algorithm to see if there are improvements to be made given how the CV algorithm processes landmark data. If the cosine similarity method does not work properly, we will fall back to a simpler Euclidean-distance method and use that to generate immediate feedback.
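
As a rough illustration of the two candidate measures (hypothetical helper functions, not our production code), cosine similarity compares the direction of corresponding joint/limb vectors, while the Euclidean fallback simply thresholds the distance between normalized landmark positions:

    import numpy as np

    def cosine_similarity(v1, v2):
        # Cosine of the angle between two limb vectors; 1.0 means identical direction.
        return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

    def euclidean_match(p_user, p_ref, threshold=0.05):
        # Simpler fallback: are two normalized landmark positions within a threshold?
        return float(np.linalg.norm(np.asarray(p_user) - np.asarray(p_ref))) < threshold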

Risk: Color-based feedback not meeting user expectations

Mitigation Strategy/Contingency plan: We plan to break down our Unity mesh into multiple parts to improve the visual appeal of the feedback coloring, so that users can more immediately understand what they need to do to correct a mistake. We also plan to incorporate a comprehensive user guide to help with the same purpose.

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Part A:

 

DanCe-V addresses the global need for accessible and affordable dance education, particularly for individuals who lack access to professional dance instructors due to financial, geographic, or logistical constraints. Traditional dance lessons can be expensive and may not be available in rural regions. DanCe-V makes dance training more accessible, as anyone with an internet connection and a basic laptop is able to use our application. Additionally, the system supports self-paced learning, catering to individuals with varying schedules and learning speeds. This is particularly useful in today’s fast-paced world where flexibility in skill learning is becoming more and more important.

 

Furthermore, as the global fitness and wellness industry grows, DanCe-V aligns with the trend of digital fitness solutions that promote physical activity from home. The system also has potential applications in rehabilitation and movement therapy, offering value beyond just dance instruction. By supporting a variety of dance styles, DanCe-V can reach users across different cultures and backgrounds, reinforcing dance as a universal form of expression and exercise.

 

Part B:

One cultural factor to consider is that dance is deeply intertwined with cultural identity and tradition. DanCe-V recognizes the diversity of dance forms worldwide and aims to support various styles, including classical Indian dance forms, Western ballroom, modern TikTok dances, traditional folk dances, and more. By allowing users to upload their own reference videos rather than limiting them to a constrained set of sample videos, the system ensures that people from different cultural backgrounds can engage with dance forms that are personally meaningful to them. Additionally, DanCe-V respects cultural attitudes toward dance and physical movement. Some cultures may have gender-specific dance norms or modesty considerations, and the system’s at-home training approach allows users to practice comfortably in a private setting.

Part C:

DanCe-V is an eco-friendly alternative to traditional dance education, reducing the need for transportation to dance studios and minimizing associated carbon emissions. By enabling users to practice from home, it decreases reliance on physical infrastructure such as studios, mirrors, and printed materials, contributing to a more sustainable learning model. Additionally, the system operates using a standard laptop webcam, eliminating the need for expensive motion capture hardware, which could involve materials with high environmental costs.

Furthermore, dance is a form of exercise that does not require extra equipment such as weights, treadmills, or other sports gear. By making dance accessible to a larger audience, DanCe-V can help reduce the production of such equipment, which often has a large negative impact on the environment.

Procrustes Analysis Normalization demo:

[GIF captions: Before Normalization · After Normalization · Reference · Test Footage · Reference Footage]

Cosine Similarity comparison results:

Akul’s Status Report for 3/8

Over the past week, I focused on two main components: the design report and our single-frame comparison. For the design report, I spent time developing the quantitative design requirements, the design trade studies, and the system implementation. As part of that, I researched concrete justifications for each of our design decisions. For example, to reduce latency in our system, we decided to use only a subset of the points in the MediaPipe output while maintaining accuracy. I decided to go with just 17 points, since many of the landmarks crowd around the user’s head and toes, which isn’t necessary for our use case. Additionally, we had an idea of how we would implement our system, but I spent time creating block diagrams to put all of our thoughts together for each aspect of the system. Throughout the rest of the semester, we will have these diagrams to refer to and adapt if we make changes, so that both we and readers can better understand our system. For the design trade study, I focused on making sure that all of our decisions about algorithms, libraries, and protocols were fully intentional: I explored the tradeoffs and provided concrete reasoning for why we chose one option over another.
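
For illustration, landmark subsetting can be as simple as indexing into the full 33-point MediaPipe output. The exact 17 indices we settled on are not listed here, so the selection below is a plausible placeholder that drops the dense face and hand points:

    # Hypothetical 17-landmark subset of MediaPipe Pose's 33 points (indices 0-32).
    KEY_LANDMARKS = [
        0,          # nose
        11, 12,     # shoulders
        13, 14,     # elbows
        15, 16,     # wrists
        23, 24,     # hips
        25, 26,     # knees
        27, 28,     # ankles
        29, 30,     # heels
        31, 32,     # foot tips
    ]

    def filter_landmarks(frame_landmarks):
        # Keep only the selected joints from a full 33-point MediaPipe frame.
        return [frame_landmarks[i] for i in KEY_LANDMARKS]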

This week, we also set the goal of getting a working MVP of the single-frame comparison, where we take a user input and a reference video and determine whether the dances are similar on a frame-to-frame basis. We split up the work into normalizing the points, doing the actual comparison given the normalized points, and producing the Unity output. My task was to compute whether a frame was similar based on two provided JSONs representing the user input and the reference input for that frame.

The overall algorithm I used was pretty simple. I first created a helper function to find the Euclidean distance between two points in space, which are given in the JSON inputs. Then I loop through each of the points in the JSONs, computing the distance between each corresponding pair. If the distance is less than a certain threshold (0.05 for now), the similarity for that point is true. I do this for each joint we are comparing, and if at least 80% of the joints are “similar” enough, the overall output for that frame is true. These thresholds are fairly arbitrary for now; we will need to fully integrate the code and test them to get a better idea of what we need for the final project.
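
A minimal sketch of that per-frame check is below; the JSON layout (a list of landmark dicts with x/y/z fields) is an assumption about the format, and the 0.05 and 80% values are the placeholder thresholds mentioned above:

    import json
    import math

    def euclidean_distance(p1, p2):
        # Distance between two 3D landmark points given as dicts with x/y/z keys.
        return math.sqrt((p1["x"] - p2["x"]) ** 2 +
                         (p1["y"] - p2["y"]) ** 2 +
                         (p1["z"] - p2["z"]) ** 2)

    def frame_is_similar(user_json, ref_json, dist_threshold=0.05, joint_fraction=0.8):
        # True if enough corresponding joints are within the distance threshold.
        user, ref = json.loads(user_json), json.loads(ref_json)
        close = sum(1 for u, r in zip(user, ref)
                    if euclidean_distance(u, r) < dist_threshold)
        return close / len(user) >= joint_fraction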

Our progress is currently on schedule. By the end of spring break, we will have an MVP of our real-time feedback system, and once that is complete we will begin working on multi-frame analysis and on integrating the overall system.

Akul’s Status Report for 2/22

Over the past week, I focused on developing the dance comparison engine. Danny and I looked into how others have already used DTW to compare human movements, including different variations of the algorithm. For example, some research papers have proposed extensions of DTW such as generalized time warping, forward plotting DTW, and canonical time warping. Danny found a couple more variations, but one problem we found was not so much the algorithms themselves as their computational complexity when run in Python over an entire video. We therefore plan to test several of these variations in Python, allowing us to see which ones may be best for our comparison engine while still maintaining efficiency and minimizing latency.

Another problem I wanted to look at with the comparison engine was how we plan to normalize the points between the reference video and the user inputs. Although we are getting the MediaPipe points in a similar manner for both, the coordinates will differ, so our challenge lies in normalizing them for accurate comparisons. One thing we plan to do is use the angles between the joints as part of our comparison, so I developed a Python algorithm that measures the angle at a joint from the dot product of the vectors formed by neighboring landmarks. I attached some pictures of how it works using my right elbow, but this can be applied to any joint on the body whose angle we want to measure.
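
The core of that calculation looks roughly like the following (a sketch of the dot-product approach, not the exact script):

    import numpy as np

    def joint_angle(a, b, c):
        # Angle (in degrees) at joint b formed by points a-b-c,
        # e.g. shoulder-elbow-wrist for the right elbow.
        a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
        v1, v2 = a - b, c - b
        cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

    # Example: right elbow angle from (shoulder, elbow, wrist) landmark coordinates.
    # print(joint_angle((0.60, 0.40, 0.0), (0.70, 0.60, 0.0), (0.65, 0.80, 0.0)))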

Our progress is on schedule. In the next week, I hope to do a few things. First, continue working on the comparison engine and try to get an MVP of the engine so we can iterate and improve on it. Second, talk to people who know more about dance, or who are dancers themselves, to get a better understanding of what we could add to the project to make it as useful as possible for users.

Danny’s Status Report for 2/22

This past week I was responsible for presenting our project during the Design Review. As a result, I spent most of the first half of the week refining the presentation and practicing my delivery. After that, since we are ahead of schedule on the CV system implementation, I focused on researching the specific algorithms and optimization methods we can use to construct our 3D comparison engine. Since we want to provide feedback in a timely manner, whether after the entire dance or in real time, computation speed is a big concern because our chosen algorithm (DTW) is extremely computationally intensive. Therefore, I spent time looking into optimization methods, including papers on PrunedDTW, FastDTW, SparseDTW, etc.

 

Illustration of DTW:

Implementation of standard DTW:
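
(The original post embeds an image of the implementation here; the following is a minimal textbook-style Python sketch of the standard O(n·m) dynamic programming recurrence, not necessarily the exact code pictured.)

    import numpy as np

    def dtw_distance(seq_a, seq_b,
                     dist=lambda x, y: np.linalg.norm(np.asarray(x) - np.asarray(y))):
        # Classic dynamic programming DTW between two sequences of frames.
        n, m = len(seq_a), len(seq_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = dist(seq_a[i - 1], seq_b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]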

Example of SparseDTW:

Al-Naymat, G., Chawla, S., Taheri, J. (2012). SparseDTW: A Novel Approach to Speed up Dynamic Time Warping.

Olsen, N. L., Markussen, B., Raket, L. L. (2018). Simultaneous inference for misaligned multivariate functional data. Journal of the Royal Statistical Society, Series C, 67(5): 1147–1176. arXiv:1606.03295, doi:10.1111/rssc.12276.

Team Status Report for 2/22

Risk Management:

Risk: While our proposed solution may achieve accurate comparison on a technical level, our feedback system design carries the risk of not being exactly what our targeted users want/need.

Mitigation Strategy/Contingency plan: We plan to reach out to a variety of potential users of this system, including serious dancers, TikTok influencers who record dances regularly, and casual users who record a dance or two once in a while. We will then use the feedback gathered from these potential users to better inform the specific design of how we generate our feedback.

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Rex’s Status Report for 2/22

This week, I focused on improving the natural movement of the character in our CV-based dance coach. By manually adjusting the targets that each joint should track, including the arms, legs, and neck, I was able to refine the character’s motion to make it more fluid and realistic. The attached video demonstrates these improvements, showing how the avatar’s motion now mirrors user input in a more natural way across three different dance test poses to move between. Additionally, I worked on optimizing our UDP network communication, ensuring that packet transmission and reception are stable with good throughput. One major challenge I encountered is normalizing the coordinate data from MediaPipe’s OpenCV-based pose tracking to Unity’s avatar system, as they operate on different scales. To better understand how real-time feedback should be structured, I also consulted three friends who enjoy learning trendy TikTok dances on the fly. Based on their input, I confirmed that a Just Dance-style single-frame feedback system is intuitive, fun, and easy to engage with, making it an ideal approach for our project, and my teammates and I have decided to reflect this in our design requirements.

Currently, our progress is on track with our proposed schedule, as we are actively working on synchronizing user input with the reference video using CV techniques. We have also taken feedback from our Design Proposal Presentation this week to incorporate a more refined design requirement for the feedback part of our system. Moving forward, my next major focus will be on normalizing the MediaPipe pose estimation coordinates to correctly align with Unity’s character rig. This is a crucial step in ensuring accurate comparisons between the user’s movements and the reference video. Additionally, this will augment our upcoming tasks of making a real-time feedback system. If any delays arise on my part, I will dedicate additional time to debugging and adjusting the coordinate mappings to keep us aligned with our project timeline.

Danny’s Status Report for 2/15

This past week, as outlined in the schedule, I primarily focused on processing reference video inputs with OpenCV. I spent time exploring both MediaPipe and OpenPose as different ways to process and label the reference input video. After spending a substantial amount of time experimenting with both, we as a team decided that MediaPipe was a better fit for our needs. I then proceeded to test the MediaPipe pipeline with video inputs, initially with just a simple recording of myself. This initial test yielded unsatisfactory results, prompting me to continue fine-tuning the MediaPipe configuration and the OpenCV capture.

The MediaPipe library comes with several base models. It also offers a variety of configuration options, including:

  • min_pose_detection_confidence (0.0-1.0):
    • Controls how confident the model needs to be to report a pose detection
    • Higher values reduce false positives but might miss some poses
    • Lower values catch more poses but may include false detections
  • min_pose_presence_confidence (0.0-1.0):
    • Threshold for considering a pose to be present
    • Affects how readily the model reports pose presence
  • min_tracking_confidence (0.0-1.0):
    • For video mode, controls how confident the tracker needs to be to maintain tracking
    • Lower values make tracking more stable but might track incorrect poses
    • Higher values are more precise but might lose tracking more easily
  • num_poses:
    • Maximum number of poses to detect in each frame
    • Increasing this will detect more poses but use more processing power
    • Default is 1
  • output_segmentation_masks:
    • Boolean to enable/disable segmentation mask output
    • Disabling can improve performance if you don’t need masks

After experimentation, I found that the parameters that affected our detection the most were min_pose_detection_confidence and min_pose_presence_confidence. After fine-tuning these parameters, I was able to achieve much better tracking on not just my own simple test video, but also a relatively complex YouTube dancing short. As we continue working on this algorithm and integrating the systems, I will keep experimenting with these options to optimize performance while keeping tracking confidence as high as possible.
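
For reference, a minimal configuration sketch using the MediaPipe Tasks PoseLandmarker API is shown below; the confidence values are placeholders rather than our final tuned numbers, and the model file path is assumed:

    import mediapipe as mp

    BaseOptions = mp.tasks.BaseOptions
    PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
    VisionRunningMode = mp.tasks.vision.RunningMode

    options = PoseLandmarkerOptions(
        base_options=BaseOptions(model_asset_path="pose_landmarker.task"),
        running_mode=VisionRunningMode.VIDEO,
        min_pose_detection_confidence=0.6,   # placeholder: raise to cut false positives
        min_pose_presence_confidence=0.6,    # placeholder
        min_tracking_confidence=0.5,         # placeholder
        num_poses=1,                         # only one dancer is tracked
        output_segmentation_masks=False,     # masks not needed, saves processing
    )
    landmarker = mp.tasks.vision.PoseLandmarker.create_from_options(options)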

 

Testing with recorded footage from webcam:

Testing with a YouTube Shorts dancing video:

Akul’s Status Report for 2/15

This week I was able to set up joint tracking from a computer camera. I utilized the Python libraries OpenCV and MediaPipe to complete this task, learning the key features of each library by going through the documentation and understanding the syntax. Once I had a better understanding of how to use the libraries, I wrote a script that displays the computer’s webcam feed and captures the different landmarks on the player’s body. With this ready, I wanted to test the feasibility of our application being used by a regular user, so I went to multiple locations and ran the script. I found that it is definitely feasible for the player to play from the comfort of their own camera, but the user needs to stand fairly far from the computer (i.e., the user needs a decent amount of space to actually play). This isn’t really a problem, but it is something we need to consider.

After that, I worked with Rex to determine how I should send the user’s coordinates to the Unity system to showcase the user’s joints within our game UI. I first had to figure out how to print out the coordinates of the 33 landmarks that MediaPipe provides. After I did that, I organized the data into a JSON that Rex is able to interpret in his Unity game. We have already tested our capability to send data from a Python script to a Unity server, so we now have the capability to translate a real-life human’s moves onto a character in the game.
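
A rough sketch of the Python side of that hand-off is below: packaging one frame of landmarks as JSON and sending it to the Unity listener over UDP. The field names, host, and port are illustrative assumptions, not our exact message format:

    import json
    import socket

    UNITY_HOST, UNITY_PORT = "127.0.0.1", 5005   # assumed local Unity UDP listener
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_frame(landmarks):
        # `landmarks` is an iterable of MediaPipe landmark objects with x/y/z fields.
        payload = {"joints": [{"x": lm.x, "y": lm.y, "z": lm.z} for lm in landmarks]}
        sock.sendto(json.dumps(payload).encode("utf-8"), (UNITY_HOST, UNITY_PORT))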

In terms of our progress, I feel that we are on a good track. We now have a better understanding of what the capabilities are of our project and how we can go about completing it. We ran into some slight high-level setbacks when it came to switching our project from a game to an application for learning, but I believe we are moving in the right direction. 

Next week, I hope to explore further into the algorithm we will use to detect differences between a user’s dance moves and a reference video. At this point, Danny, Rex, and I have worked on different aspects individually, so next week we will focus on incorporating everything we have together and taking the next steps to actually help people learn dance moves from a reference video.