This week I focused on integrating my existing work with the 3D Unity framework that Rex has been building, and on continuing to improve the Procrustes analysis method. The Unity visualization gave me a better sense of how the Procrustes analysis is behaving and how I could improve it.
Initially, I ran the normalization algorithm on every frame, normalizing each one to its reference frame. However, this presented a serious problem once we tested the algorithm in Unity. If the test and reference videos do not have exactly the same number of frames, we end up normalizing frames that are not aligned at all. This leaves very little tolerance for temporal distortion, which undermines the premise of using DTW analysis. It also greatly increased processing time, since a new rotation matrix had to be computed for every frame.
To improve upon this, I changed the algorithm to calculate the Procrustes parameters only once, based on frame 0, and to apply those parameters to every subsequent frame. This solution worked well and greatly improved our processing speed.
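The fit-once, apply-everywhere idea can be sketched roughly as follows. This is a minimal illustration, not our actual implementation: it assumes each frame is a (num_landmarks, 3) NumPy array, and the function names fit_procrustes and apply_procrustes are hypothetical. The rotation is estimated with the standard SVD solution to the orthogonal Procrustes problem.

```python
import numpy as np

def fit_procrustes(test_frame0, ref_frame0):
    """Estimate scale, rotation, and centroids that map test frame 0 onto reference frame 0.

    Both inputs are (num_landmarks, 3) arrays (an assumed layout).
    """
    mu_t = test_frame0.mean(axis=0)
    mu_r = ref_frame0.mean(axis=0)
    t0 = test_frame0 - mu_t
    r0 = ref_frame0 - mu_r
    # SVD of the cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(t0.T @ r0)
    # Guard against reflections so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.array([1.0, 1.0, d])
    rotation = u @ np.diag(flip) @ vt
    scale = (s * flip).sum() / (t0 ** 2).sum()
    return scale, rotation, mu_t, mu_r

def apply_procrustes(frames, params):
    """Apply the frame-0 parameters to one frame or a (num_frames, num_landmarks, 3) stack."""
    scale, rotation, mu_t, mu_r = params
    return scale * (frames - mu_t) @ rotation + mu_r
```

Because the parameters are computed a single time, the per-frame work reduces to one matrix multiplication, and the test and reference videos no longer need to have the same number of frames for normalization to make sense.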
[Figures: Reference Footage; Test Footage (Slightly Rotated); Raw Test Data (Rotated); Normalized Test Data; Reference]
Risk: Cosine Similarity Algorithm not yielding satisfactory results on complex dances
Mitigation Strategy/Contingency plan: We will continue working on the algorithm to see if there are improvements to be made given how the CV algorithm processes landmark data. If the Cosine Similarity Method will not work properly, we will fall back to a simpler method using Euclidean distance and use that to generate immediate feedback.
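To make the difference between the two metrics concrete, here is a small sketch. It assumes that what gets compared is a pair of pose vectors (for example, a limb direction in the test and reference frames); the exact vectors we compare are not specified here, and the function names are illustrative only.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two vectors; close to 1.0 means the same direction."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; 0.0 means identical."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.linalg.norm(a - b))
```

Cosine similarity ignores vector length, so it tolerates dancers of different sizes but only measures direction. Euclidean distance is simpler to reason about but is sensitive to scale, which is why it would depend on the normalization step working well.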
Risk: Color based feedback not meeting user expectations
Mitigation Strategy/Contingency plan: We plan to break down our Unity mesh into multiple parts to improve the visual appeal of the feedback coloring, so that users can more immediately understand what they need to do to correct a mistake. We also plan to incorporate a comprehensive user guide to help with the same purpose.
Design Changes:
There were no design changes this week. We have continued to execute our schedule.
Part A:
DanCe-V addresses the global need for accessible and affordable dance education, particularly for individuals who lack access to professional dance instructors due to financial, geographic, or logistical constraints. Traditional dance lessons can be expensive and may not be available in rural regions. DanCe-V makes dance training more accessible, as anyone with an internet connection and a basic laptop is able to use our application. Additionally, the system supports self-paced learning, catering to individuals with varying schedules and learning speeds. This is particularly useful in today’s fast-paced world where flexibility in skill learning is becoming more and more important.
Furthermore, as the global fitness and wellness industry grows, DanCe-V aligns with the trend of digital fitness solutions that promote physical activity from home. The system also has potential applications in rehabilitation and movement therapy, offering value beyond just dance instruction. By supporting a variety of dance styles, DanCe-V can reach users across different cultures and backgrounds, reinforcing dance as a universal form of expression and exercise.
Part B:
One cultural factor to consider is that dance is deeply intertwined with cultural identity and tradition. DanCe-V recognizes the diversity of dance forms worldwide and aims to support various styles, with possibilities of learning classical Indian dance forms, Western ballroom, modern TikTok dances, traditional folk dances, and more. By allowing users to upload their own reference videos and not just including a constrained set of sample videos, the system ensures that people from different cultural backgrounds can engage with dance forms that are personally meaningful to them. Additionally, DanCe-V respects cultural attitudes toward dance and physical movement. Some cultures may have gender-specific dance norms or modesty considerations, and the system’s at-home training approach allows users to practice comfortably in a private setting.
Part C:
DanCe-V is an eco-friendly alternative to traditional dance education, reducing the need for transportation to dance studios and minimizing associated carbon emissions. By enabling users to practice from home, it decreases reliance on physical infrastructure such as studios, mirrors, and printed materials, contributing to a more sustainable learning model. Additionally, the system operates using a standard laptop webcam, eliminating the need for expensive motion capture hardware, which could involve materials with high environmental costs.
Furthermore, dance is a style of exercise that does not require extra materials, such as weights, treadmills, or sports equipment. By making dance accessible to a larger audience, DanCe-V can help reduce the production of these materials, which often have large, negative impacts on the environment.
Procrustes Analysis Normalization demo:
[Figures: Before Normalization; After Normalization; Reference; Test Footage; Reference Footage]
This past week I was responsible for presenting our project during the Design Review. As a result, I spent most of the first half of the week refining the presentation and practicing my delivery. After that, since we are ahead of schedule on the CV system implementation, I focused on researching the specific algorithms and optimization methods we can use to construct our 3D comparison engine. Since we want to provide feedback in a timely manner, whether after the entire dance or in real time, computation speed is a major concern because our chosen algorithm (DTW) is computationally intensive. I therefore spent time reading papers on DTW optimization methods such as PrunedDTW, FastDTW, and SparseDTW.
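To show where the cost comes from, here is a minimal sketch of textbook DTW. It is not PrunedDTW, FastDTW, or SparseDTW; the optional window argument is a simple Sakoe-Chiba band, included only to illustrate how restricting the search region reduces work. The function name and the input layout (one row per frame) are assumptions.

```python
import numpy as np

def dtw_distance(a, b, window=None):
    """Dynamic time warping cost between two sequences of frames.

    a and b are arrays of shape (num_frames, num_features) or 1-D sequences.
    window limits how far apart matched frame indices may be (None = unrestricted).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least as wide as the length difference.
    window = max(window, abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

Without a window, the double loop visits every pair of frames, so the cost grows with the product of the two video lengths. That quadratic growth is what the optimized variants try to avoid.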
Risk: While our proposed solution may achieve accurate comparison on a technical level, our feedback system design carries the risk of not being exactly what our target users want or need.
Mitigation Strategy/Contingency plan: We plan to reach out to a variety of potential users of this system, including serious dancers, TikTok influencers who record dances regularly, and regular people who may record a casual dance or two once in a while. We will then use the feedback gathered from these potential users to better inform the specific design of how we generate our feedback.
Design Changes:
There were no design changes this week. We have continued to execute our schedule.
This past week, as outlined on the schedule, I primarily focused on processing reference video inputs with OpenCV. I spent time exploring both MediaPipe and OpenPose as different ways to process and label the reference input video. After spending a substantial amount of time experimenting with both, we as a team decided that MediaPipe was a better fit for our needs. I then proceeded to test the MediaPipe pipeline with video inputs, initially with just a simple recording of myself. This initial test yielded unsatisfactory results, prompting me to continue to fine-tune the MediaPipe library and OpenCV capturing.
The MediaPipe library comes with several base models. It also has a variety of options, including:
min_pose_detection_confidence (0.0-1.0):
Controls how confident the model needs to be to report a pose detection
Higher values reduce false positives but might miss some poses
Lower values catch more poses but may include false detections
min_pose_presence_confidence (0.0-1.0):
Threshold for considering a pose to be present
Affects how readily the model reports pose presence
min_tracking_confidence (0.0-1.0):
For video mode, controls how confident the tracker needs to be to maintain tracking
Lower values make tracking more stable but might track incorrect poses
Higher values are more precise but might lose tracking more easily
num_poses:
Maximum number of poses to detect in each frame
Increasing this will detect more poses but use more processing power
Default is 1
output_segmentation_masks:
Boolean to enable/disable segmentation mask output
Disabling can improve performance if you don’t need masks
After experimentation, I found that the parameters that affected our detection the most were min_pose_detection_confidence and min_pose_presence_confidence. After fine-tuning these parameters, I was able to achieve much better tracking on not just my own simple test video, but also a relatively complex YouTube dancing short. As we continue to work on this algorithm and on integrating the systems, I will also continue to experiment with the options to try to optimize performance while keeping tracking confidence as high as possible.
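As a rough sketch of how these options might be passed to MediaPipe, the snippet below collects them in one place and hands them to the Pose Landmarker from the MediaPipe Tasks API. The helper names, the model file path, and the numeric values are placeholders for illustration, not the values we settled on; 0.5 is used only as a neutral starting point.

```python
def build_pose_options(detection=0.5, presence=0.5, tracking=0.5,
                       num_poses=1, masks=False):
    """Collect the tunable options in one dict, checking the confidence ranges."""
    for name, value in (("detection", detection),
                        ("presence", presence),
                        ("tracking", tracking)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} confidence must be in [0.0, 1.0], got {value}")
    return {
        "min_pose_detection_confidence": detection,
        "min_pose_presence_confidence": presence,
        "min_tracking_confidence": tracking,
        "num_poses": num_poses,
        "output_segmentation_masks": masks,
    }

def create_landmarker(model_path="pose_landmarker.task", **overrides):
    """Create a video-mode pose landmarker; model_path is a hypothetical location."""
    # Imported lazily so build_pose_options can be used without MediaPipe installed.
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.PoseLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path),
        running_mode=vision.RunningMode.VIDEO,
        **build_pose_options(**overrides),
    )
    return vision.PoseLandmarker.create_from_options(options)
```

Keeping the options in a single helper makes it easy to sweep the two detection-related thresholds while leaving everything else fixed, for example create_landmarker(detection=0.7, presence=0.7).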
Testing with recorded footage from webcam:
Testing with YouTube shorts dancing video (@kaileiadixonofficial):
Risk: Losing movement details in the transition from MediaPipe to the Unity inputs. This is something we are noticing after running some initial experiments this week in trying to push simple movements through MediaPipe into Unity.
Mitigation Strategy/Contingency plan: Unity offers several joint constraint options (Two-Bone Inverse Kinematic Constraint, Multi-Aim Constraint, Damped Transform, and Rotation Constraint). We will test these four options to find which one looks the most natural and is most coherent with our MediaPipe data.
Design Changes:
Specific Design Updates:
Change: Selecting MediaPipe as our library of choice as opposed to OpenPose
Why: More detailed documentation, ease of use, and a better match with the level of detail we require
Change: 3D Comparative Analysis Engine to be done in Unity
Why: Unity’s detailed avatar rigging allows us to display the dance moves with accuracy and compare the webcam footage with the reference video with sufficient detail
Cost Impact and Mitigation:
– No direct costs were incurred; these changes were part of the planned exploratory stage in our schedule
Updated Schedule:
Part A was written by Danny Cui, Part B was written by Rex Kim, Part C was written by Akul Singh
It is possible, though probably unusual that the answer to a particular question would be “does not apply.” In such a case, please describe what you have considered to ensure that it does not apply.
Please write a paragraph or two describing how the product solution you are designing will meet a specified need…
Part A: … with respect to considerations of public health, safety or welfare. Note: The term ‘health’ refers to a state of well-being of people in both a physiological and psychological sense. ‘Safety’ is the absence of hazards and/or physical harm to persons. The term ‘welfare’ relates to the provision of the basic needs of people.
From a physical health perspective, the system promotes regular exercise through dance, which improves cardiovascular fitness, flexibility, coordination, and muscle strength. The feedback mechanism helps users maintain proper form and technique, reducing the risk of dance-related injuries that could occur from incorrect movements or posture. This is particularly valuable for individuals who may not have access to in-person dance instruction or cannot afford regular dance classes.
From a psychological health and welfare standpoint, the system creates a safe, private environment for users to learn and practice dance without the anxiety or self-consciousness that might arise in group settings. Dance has been shown to reduce stress, improve mood, and boost self-esteem, benefits that become more accessible through this technology. The immediate feedback loop also provides a sense of accomplishment and progression, fostering motivation and sustained engagement in physical activity. Additionally, the system addresses safety concerns by allowing users to learn complex dance moves at their own pace in a controlled environment, with guidance that helps prevent overexertion or dangerous movements. This is especially important for beginners or those with physical limitations who need to build up their capabilities gradually.
Part B: … with consideration of social factors. Social factors relate to extended social groups having distinctive cultural, social, political, and/or economic organizations. They have importance to how people relate to each other and organize around social interests.
Our computer vision-based dance-coaching game makes dance training more accessible and engaging. Traditional dance lessons can be hard to find, especially in remote areas, and paying for classes on an ongoing basis is not feasible for everyone. Our game removes these barriers by letting users practice at home with just a camera and computer setup. Using MediaPipe and Unity, it analyzes an input video and compares the user’s movements to an ideal reference. Real-time feedback helps users improve without needing an in-person instructor. This makes dance education more available to people who may not have the resources or opportunities to attend formal classes.
Beyond accessibility, our game also fosters cultural exchange and social engagement. Dance is deeply tied to cultural identity, and by incorporating a variety of dance styles from different traditions, the game can serve as an educational tool that promotes appreciation for diverse artistic expressions. Users can learn and practice traditional and contemporary dance forms, helping preserve cultural heritage while making it more interactive and engaging for younger generations. Additionally, the game can create virtual dance communities, encouraging users to share their performances, participate in challenges, and interact with others who share their interests.
Part C: … with consideration of economic factors. Economic factors are those relating to the system of production, distribution, and consumption of goods and services.
Since our application relies only on a webcam and computer processing, its economic impact is primarily related to accessibility, affordability, and potential market reach. Unlike traditional dance classes, which require ongoing payments for instructors/studio rentals, our application offers a cost-effective alternative by enabling users to practice and improve their dance skills from home. This affordability makes dance education more accessible to individuals who may not have the financial means to attend in-person lessons, thus reducing economic barriers to learning a new skill.
Additionally, our application aligns with current technological trends in society, where software-based fitness and entertainment solutions generate revenue through app sales, subscriptions, or advertisements. The fact that DanCe-V only requires a computer webcam also reduces the financial burden on users, as they do not need specialized equipment beyond a standard webcam and computer. This makes it an economically sustainable option for both consumers and potential business models, allowing the platform to reach a broad audience while keeping costs low.
Starting from this past week, I mostly acted in the role of a “Project Manager” for our project as a whole, managing all the tasks that needed to be done and putting a schedule together. This has been made more difficult by our second project pivot from making a dancing game to a dancing coach instead. As a part of that pivot, we decided to scrap our idea of incorporating a haptic feedback device to increase player immersion for our game. Since I had originally been designated the main person responsible for creating this haptic feedback device, I had been mostly conducting research on how I wanted to put this device together, which components to buy, and how the integration could work. Additionally, I had been preparing for our equipment procurement, which was scheduled to happen next week.
Because of our pivot, my efforts will now be redirected towards the CV part of the project. After helping the team create a renewed schedule that reflects our pivot and all the new tasks that need to be done, I’ve started researching how to implement computer vision, since I’m relatively inexperienced in the field. I will then begin work on processing the input reference video and the webcam video with teammate Akul Singh.