Akul’s Status Report for 2/22

Over the past week, I focused on developing the dance comparison engine. Danny and I looked into how DTW has already been used to compare human movements, including different variations of the algorithm. For example, some research papers have proposed extensions of DTW such as generalized time warping, forward plotting DTW, and canonical time warping. Danny found a couple more variations, but one problem we identified was not the algorithms themselves so much as their computational cost when run in Python over an entire video. With that in mind, we plan to test several of these variations in Python so we can see which ones may be best for our comparison engine while maintaining efficiency and minimizing latency.
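
To get a feel for the latency question before committing to one variant, a rough timing harness along the following lines could work. This is only a sketch under assumptions: it uses the third-party fastdtw package (an approximation of exact DTW) and synthetic 99-dimensional vectors standing in for flattened MediaPipe landmarks; it is not our comparison engine.

```python
import time
import numpy as np
from fastdtw import fastdtw                      # third-party approximate-DTW package (assumed installed)
from scipy.spatial.distance import euclidean

rng = np.random.default_rng(0)
ref = rng.normal(size=(600, 99))    # ~20 s of reference poses at 30 fps (synthetic stand-in)
user = rng.normal(size=(600, 99))   # ~20 s of user poses (synthetic stand-in)

# A larger radius tracks exact DTW more closely but takes longer to run.
for radius in (1, 5, 20):
    start = time.perf_counter()
    distance, path = fastdtw(ref, user, radius=radius, dist=euclidean)
    elapsed = time.perf_counter() - start
    print(f"radius={radius}: distance={distance:.1f}, time={elapsed * 1000:.0f} ms")
```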

Another problem I wanted to look at with the comparison engine was how we plan to normalize the points between the reference video and the user inputs. Although we extract the MediaPipe points in the same way for both the reference video and the user input, the coordinates of the points will differ, so our challenge lies in normalizing them for accurate comparisons. One thing we plan to do is use the angles between joints as part of our comparison, so I developed a Python routine that measures the angle at a joint using the dot product of the vectors formed by neighboring landmarks. I attached some pictures of how it works using my right elbow, but it can be applied to any joint on the body whose angle we want to measure.
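
A minimal sketch of that angle computation is below (illustrative, not necessarily the exact code I wrote): it takes three landmarks, forms the two vectors that meet at the middle joint, and recovers the angle from the dot product.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (in degrees) at joint b formed by landmarks a-b-c.

    For the right elbow, a/b/c would be the right shoulder, right elbow,
    and right wrist coordinates (2D or 3D) from MediaPipe.
    """
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    v1 = a - b                              # joint -> first neighboring landmark
    v2 = c - b                              # joint -> second neighboring landmark
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0)))  # ~90 degrees
```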

Our progress is on schedule. In the next week, I hope to do a few things. First, I will continue working on the comparison engine and try to get an MVP of it so we can iterate and improve on it. Second, I am going to talk to people who know more about dance, or who are dancers themselves, to get a better understanding of what we could add to make the project as useful as possible for users.

Danny’s Status Report for 2/22

This past week I was responsible for presenting our project during the Design Review. As a result, I spent most of the first half of the week refining the presentation and practicing my delivery. After that, since we are ahead of schedule on the CV system implementation, I focused on researching the specific algorithms and optimization methods we can use to construct our 3D comparison engine. Because we want to provide feedback in a timely manner, whether after the entire dance or in real time, computation speed is a major concern: our chosen algorithm (DTW) is computationally intensive. Therefore, I spent time looking specifically into optimization methods, including papers on PrunedDTW, FastDTW, and SparseDTW.

 

Illustration of DTW:

Implementation of standard DTW:
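
(For reference, a textbook sketch of the standard O(NM) dynamic-programming recurrence is shown below; this is the general algorithm, not necessarily our exact code.)

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Standard DTW between two sequences of pose vectors (O(N*M) time and space)."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(seq_a[i - 1]) - np.asarray(seq_b[j - 1]))
            # Extend the cheapest of the three allowed alignments.
            cost[i, j] = d + min(cost[i - 1, j - 1],   # match
                                 cost[i - 1, j],       # insertion
                                 cost[i, j - 1])       # deletion
    return cost[n, m]
```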

Example of SparseDTW:

Al-Naymat, G., Chawla, S., & Taheri, J. (2012). SparseDTW: A novel approach to speed up dynamic time warping.

Olsen, N. L., Markussen, B., & Raket, L. L. (2018). Simultaneous inference for misaligned multivariate functional data. Journal of the Royal Statistical Society, Series C, 67(5), 1147–1176. arXiv:1606.03295. doi:10.1111/rssc.12276.

Team Status Report for 2/22

Risk Management:

Risk: While our proposed solution may achieve accurate comparison on a technical level, our feedback system design carries the risk of not being what our targeted users want or need.

Mitigation Strategy/Contingency plan: We plan to reach out to a variety of potential users of this system, including serious dancers, TikTok influencers who record dances regularly, and casual users who might record a dance once in a while. We will then use the feedback gathered from these potential users to better inform the design of how we generate our feedback.

Design Changes:

There were no design changes this week. We have continued to execute our schedule.

Rex’s Status Report for 2/22

This week, I focused on improving the natural movement of the character in our CV-based dance coach. By manually adjusting the targets that each joint should track, including the arms, legs, and neck, I was able to refine the character’s motion to make it feel more fluid and realistic. The attached video demonstrates these improvements, showing how the avatar’s motion now mirrors user input in a more natural way across three different dance test poses to move to and from. Additionally, I worked on optimizing our UDP network communication, ensuring that packet transmission and reception are stable with good throughput. One major challenge I encountered is normalizing the coordinate data from MediaPipe’s pose tracking to Unity’s avatar system, as they operate on different scales. To better understand how real-time feedback should be structured, I also consulted three friends who enjoy learning trendy TikTok dances on the fly. Based on their input, I confirmed that a Just Dance-style single-frame feedback system is intuitive, fun, and easy to engage with, making it a good fit for our project; I agreed with my teammates that this should be reflected in our design requirements as well.

Currently, our progress is on track with our proposed schedule, as we are actively working on synchronizing user input with the reference video using CV techniques. We have also used feedback from our Design Proposal presentation this week to refine the design requirements for the feedback part of our system. Moving forward, my next major focus will be on normalizing the MediaPipe pose estimation coordinates so they align correctly with Unity’s character rig. This is a crucial step in ensuring accurate comparisons between the user’s movements and the reference video, and it will also support our upcoming work on the real-time feedback system. If any delays arise on my part, I will dedicate additional time to debugging and adjusting the coordinate mappings to keep us aligned with our project timeline.
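
One candidate approach for that normalization (a sketch, not a settled design) is to re-center each MediaPipe frame on the mid-hip point and rescale by torso length before the landmarks are sent to Unity, so the rig sees values on a consistent scale regardless of how far the user stands from the camera. The landmark indices below are the standard MediaPipe Pose indices for the shoulders and hips.

```python
import numpy as np

# MediaPipe Pose landmark indices: 11/12 = shoulders, 23/24 = hips.
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 11, 12, 23, 24

def normalize_pose(landmarks):
    """Re-center a (33, 3) landmark array on the mid-hip and scale by torso length."""
    pts = np.asarray(landmarks, dtype=float)
    mid_hip = (pts[L_HIP] + pts[R_HIP]) / 2.0
    mid_shoulder = (pts[L_SHOULDER] + pts[R_SHOULDER]) / 2.0
    torso_len = np.linalg.norm(mid_shoulder - mid_hip)
    if torso_len < 1e-6:                 # guard against a degenerate frame
        return pts - mid_hip
    return (pts - mid_hip) / torso_len
```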

Danny’s Status Report for 2/15

This past week, as outlined on the schedule, I primarily focused on processing reference video inputs with OpenCV. I spent time exploring both MediaPipe and OpenPose as ways to process and label the reference input video. After spending a substantial amount of time experimenting with both, we decided as a team that MediaPipe was a better fit for our needs. I then proceeded to test the MediaPipe pipeline with video inputs, initially with just a simple recording of myself. This initial test yielded unsatisfactory results, prompting me to continue fine-tuning the MediaPipe configuration and the OpenCV capture.

The MediaPipe library comes with several base models. It also exposes a variety of configuration options, including the following (a configuration sketch follows the list):

  • min_pose_detection_confidence (0.0-1.0):
    • Controls how confident the model needs to be to report a pose detection
    • Higher values reduce false positives but might miss some poses
    • Lower values catch more poses but may include false detections
  • min_pose_presence_confidence (0.0-1.0):
    • Threshold for considering a pose to be present
    • Affects how readily the model reports pose presence
  • min_tracking_confidence (0.0-1.0):
    • For video mode, controls how confident the tracker needs to be to maintain tracking
    • Lower values make tracking more stable but might track incorrect poses
    • Higher values are more precise but might lose tracking more easily
  • num_poses:
    • Maximum number of poses to detect in each frame
    • Increasing this will detect more poses but use more processing power
    • Default is 1
  • output_segmentation_masks:
    • Boolean to enable/disable segmentation mask output
    • Disabling can improve performance if you don’t need masks
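
As an illustration of how these options fit together, a configuration along the following lines can be used with the MediaPipe Tasks API; the model path and threshold values here are placeholders rather than our final settings.

```python
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Placeholder path: the pose_landmarker .task model bundle is downloaded separately.
base_options = python.BaseOptions(model_asset_path="pose_landmarker_full.task")

options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.VIDEO,   # frame-by-frame video processing
    min_pose_detection_confidence=0.6,       # placeholder value, tuned experimentally
    min_pose_presence_confidence=0.6,        # placeholder value, tuned experimentally
    min_tracking_confidence=0.5,
    num_poses=1,                             # we only track a single dancer
    output_segmentation_masks=False,         # masks not needed, saves processing
)
landmarker = vision.PoseLandmarker.create_from_options(options)
```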

After experimentation, I found that the parameters that affected our detection the most were min_pose_detection_confidence and min_pose_presence_confidence. After fine-tuning these, I was able to achieve much better tracking not just on my own simple test video, but also on a relatively complex YouTube dancing short. As we continue to work on this algorithm and integrate the systems together, I will keep experimenting with the options to optimize performance while keeping tracking confidence as high as possible.

 

Testing with recorded footage from webcam:

Testing with YouTube shorts dancing video ():

Akul’s Status Report for 2/15

This week I set up joint tracking from a computer camera. I used the Python libraries OpenCV and MediaPipe to complete this task, which meant learning key features of each library by going through the documentation and understanding the syntax. Once I had a better understanding of how to use the libraries, I wrote a script that naively displays the computer’s webcam feed and captures the different landmarks on the player’s body. With this ready, I wanted to test how feasible it is for a regular user to play our application, so I ran the script in multiple different locations. I found that it is definitely feasible for the player to play from their own camera, but the user needs to stand fairly far from the computer (i.e., the user needs a decent amount of space to actually play). This isn’t really a problem, just something we need to consider.
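
The script is roughly along these lines (a simplified sketch using MediaPipe’s Python solutions API, not the exact code):

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
        cv2.imshow("danCe-V pose preview", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```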

After that, I worked with Rex on how to send the user’s coordinates to the Unity system so we can show the user’s joints within our game UI. I first figured out how to print the coordinates of the 33 landmarks that MediaPipe provides. I then organized that data into a JSON payload that Rex can interpret in his Unity game. We have already tested sending data from a Python script to a Unity server, so we now have the capability to translate a real person’s moves onto a character in the game.
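
The packaging and send step looks roughly like this (a sketch; the port number and JSON field names are placeholders rather than the exact schema the Unity listener expects):

```python
import json
import socket

UNITY_ADDR = ("127.0.0.1", 5065)   # placeholder host/port for the Unity UDP listener
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_landmarks(results):
    """Package MediaPipe pose landmarks as JSON and send them to Unity over UDP."""
    if not results.pose_landmarks:
        return
    payload = {
        "landmarks": [
            {"id": i, "x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
            for i, lm in enumerate(results.pose_landmarks.landmark)
        ]
    }
    sock.sendto(json.dumps(payload).encode("utf-8"), UNITY_ADDR)
```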

In terms of our progress, I feel that we are on a good track. We now have a better understanding of what the capabilities are of our project and how we can go about completing it. We ran into some slight high-level setbacks when it came to switching our project from a game to an application for learning, but I believe we are moving in the right direction. 

Next week, I hope to dig further into the algorithm we will use to detect differences between a user’s dance moves and a reference video. So far, Danny, Rex, and I have worked on different aspects individually, so next week we will focus on integrating everything we have and taking the next steps toward actually helping people learn dance moves from a reference video.

Team Status Report for 2/15

Risk Management:

Risk: Losing movement detail in the transition from MediaPipe outputs to Unity inputs. We noticed this after running some initial experiments this week pushing simple movements through MediaPipe into Unity.

Mitigation Strategy/Contingency plan: Unity offers several joint constraint options (Two-Bone Inverse Kinematics Constraint, Multi-Aim Constraint, Damped Transform, and Rotation Constraint), so we will test these four options and choose whichever looks most natural and is most coherent with our MediaPipe data.

Design Changes:

  1. Specific Design Updates:
  • Change: Selecting MediaPipe as our library of choice as opposed to OpenPose
    • Why: More detailed documentation, ease of use, and a better match for the level of detail we require
  • Change: 3D Comparative Analysis Engine to be built in Unity
    • Why: Unity’s detailed avatar rigging allows us to display the dance moves accurately and compare the webcam footage with the reference video in sufficient detail

 

  2. Cost Impact and Mitigation:

– No direct costs were incurred; these changes were part of the planned exploratory stage in our schedule.

Updated Schedule:

Part A was written by Danny Cui, Part B was written by Rex Kim, and Part C was written by Akul Singh.

It is possible, though probably unusual, that the answer to a particular question would be “does not apply.” In such a case, please describe what you have considered to ensure that it does not apply.

Please write a paragraph or two describing how the product solution you are designing will meet a specified need…

 

Part A: … with respect to considerations of public health, safety or welfare. Note: The term ‘health’ refers to a state of well-being of people in both a physiological and psychological sense. ‘Safety’ is the absence of hazards and/or physical harm to persons. The term ‘welfare’ relates to the provision of the basic needs of people.

  • From a physical health perspective, the system promotes regular exercise through dance, which improves cardiovascular fitness, flexibility, coordination, and muscle strength. The feedback mechanism helps users maintain proper form and technique, reducing the risk of dance-related injuries that could result from incorrect movements or posture. This is particularly valuable for individuals who may not have access to in-person dance instruction or cannot afford regular dance classes.
  • From a psychological health and welfare standpoint, the system creates a safe, private environment for users to learn and practice dance without the anxiety or self-consciousness that might arise in group settings. Dance has been shown to reduce stress, improve mood, and boost self-esteem, benefits that become more accessible through this technology. The immediate feedback loop also provides a sense of accomplishment and progression, fostering motivation and sustained engagement in physical activity. Additionally, the system addresses safety concerns by allowing users to learn complex dance moves at their own pace in a controlled environment, with guidance that helps prevent overexertion or dangerous movements. This is especially important for beginners or those with physical limitations who need to build up their capabilities gradually.

 

Part B: … with consideration of social factors. Social factors relate to extended social groups having distinctive cultural, social, political, and/or economic organizations. They have importance to how people relate to each other and organize around social interests.

 

  • Our computer vision-based dance-coaching game makes dance training more accessible and engaging. Traditional dance lessons can be hard to find, especially in remote areas, and one may not want to pay for classes on an ongoing basis. Our game removes these barriers by letting users practice at home with just a camera and computer setup. Using MediaPipe and Unity, it analyzes an input video and compares the user’s movements to an ideal reference. Real-time feedback helps users improve without needing an in-person instructor. This makes dance education more available to people who may not have the resources or opportunities to attend formal classes.
  • Beyond accessibility, our game also fosters cultural exchange and social engagement. Dance is deeply tied to cultural identity, and by incorporating a variety of dance styles from different traditions, the game can serve as an educational tool that promotes appreciation for diverse artistic expressions. Users can learn and practice traditional and contemporary dance forms, helping preserve cultural heritage while making it more interactive and engaging for younger generations. Additionally, the game can create virtual dance communities, encouraging users to share their performances, participate in challenges, and interact with others who share their interests.

 

Part C: … with consideration of economic factors. Economic factors are those relating to the system of production, distribution, and consumption of goods and services.

  • Since our application relies only on a webcam and computer processing, its economic impact is primarily related to accessibility, affordability, and potential market reach. Unlike traditional dance classes, which require ongoing payments for instructors/studio rentals, our application offers a cost-effective alternative by enabling users to practice and improve their dance skills from home. This affordability makes dance education more accessible to individuals who may not have the financial means to attend in-person lessons, thus reducing economic barriers to learning a new skill.
  • Additionally, our application aligns with current technological trends in society, where software-based fitness and entertainment solutions generate revenue through app sales, subscriptions, or advertisements. The fact that danCe-V only requires a computer webcam also reduces the financial burden on users, as they do not need specialized equipment beyond a standard webcam and computer. This makes it an economically sustainable option for both consumers and potential business models, allowing the platform to reach a broad audience while keeping costs low.

 

Images:

Testing from video input: 

Testing from direct webcam footage:

Rex’s Status Report for 2/15

This week, I focused on synchronizing the inputs from Python into Unity, specifically working on the networking side of our project. I explored different methods to send data efficiently, considering various types of networks and sockets. Initially, I looked into using TCP for reliability, but I quickly realized that for a real-time application like ours that depends on precise timing, UDP would be a better fit. Given that our game runs on a local device, the risk of packet loss is minimal, and UDP allows for lower latency, which is crucial for smooth gameplay. I tested different approaches for handling packets, ensuring that the data sent from the CV model reached Unity without excessive delay. The tests confirmed that UDP provided stable packet delivery in our setup, making it the better choice. Alongside networking, I spent time analyzing the outputs from our computer vision model, experimenting with both MediaPipe and OpenPose. OpenPose provided detailed joint tracking, but after testing both models, we found that MediaPipe’s output aligned better with our Unity implementation. It provided smoother and more reliable keypoint data, making it easier to work with for our in-game character movement.

After analyzing the CV model’s output, I started integrating the movement data into the character rig I set up last week. While mapping the positional data to the character, I noticed that the movement looked unnatural and stiff. To fix this, I began experimenting with different animation constraints in Unity, such as Two-Bone Inverse Kinematics, Multi-Aim Constraints, Damped Transform, and Rotation Constraints. I tested how each constraint affected the character’s fluidity and responsiveness when applied to different joints. My goal is to make the character’s motion feel more natural while staying true to the user’s real movements. Currently, I am iterating on these adjustments and expect to refine them further next week. In terms of progress, I believe I am on schedule, but refining the animation may take some extra time. Next week, I plan to improve the character’s movement smoothness by fine-tuning constraint settings and experimenting with filtering techniques to reduce jitter in the CV data. I will also help refine the CV pipeline to ensure correct delivery of outputs to Unity, focusing on optimizing how frequently we sample and send the movement data to balance accuracy and performance.
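
One simple filtering option for the jitter (a sketch of the idea, not a settled design) is an exponential moving average over the landmark coordinates before they are sent to Unity, trading a small amount of lag for smoother motion:

```python
import numpy as np

class LandmarkSmoother:
    """Exponential moving average over a (33, 3) landmark array, applied per frame."""

    def __init__(self, alpha=0.4):
        self.alpha = alpha      # higher alpha = more responsive, but more jitter
        self.state = None

    def update(self, landmarks):
        pts = np.asarray(landmarks, dtype=float)
        self.state = pts if self.state is None else (
            self.alpha * pts + (1.0 - self.alpha) * self.state)
        return self.state
```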

Akul’s Status Report for 2/8

I spent the first half of the week preparing for the Project Proposal presentation. Over the weekend, I worked on fleshing out the requirements, finding quantitative measures, creating diagrams, and putting together the slides. I collaborated with my teammates over Zoom calls to refine the presentation as well. Since we all agreed that I would be doing the presentation, I spent time practicing to ensure I could explain the project clearly rather than just reading from the slides. I rehearsed both by myself and in front of others to gather feedback and make the presentation as polished as I could. I ended up presenting on Wednesday, so I continued practicing through Monday and Tuesday.

After the presentation was over, I shifted my focus to the computer vision aspect of our project. I spent time researching the MediaPipe and OpenCV libraries and reading and watching tutorials on how other people have used them. I have done some computer vision work in the past (but nothing of this scale), so I brushed up on the OpenCV library in Python. I played around with some simple test scripts for the first part of the CV pipeline: opening my computer’s camera and saving the captured images to disk (image below). This can serve as the base of the project, since it lets us take camera images and use them for further processing in the rest of the project.
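
The test script amounts to something like the following (a sketch of the idea, not the exact script):

```python
import cv2

cap = cv2.VideoCapture(0)               # open the default webcam
frame_idx = 0
while frame_idx < 100:                  # grab a short burst of frames
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frame_{frame_idx:03d}.png", frame)   # save for later processing
    frame_idx += 1
cap.release()
```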

In our schedule, we allotted this week to be more of an introductory week, where we would get familiar with the libraries and better understand what our project will entail. Additionally, because we recently pivoted our project, I hope to gain an even clearer understanding of what we need to do through our weekly meeting with our faculty advisor next week. On the technical side, next week I will get more familiar with the MediaPipe framework and explore specifically how we will extract the key points of a user’s body while they are dancing.

 

 

Danny’s Status Report for 2/8

Starting from this past week, I mostly acted in the role of a “Project Manager” for our project as a whole, managing all the tasks that needed to be done and putting a schedule together. This has been made more difficult by our second project pivot from making a dancing game to a dancing coach instead. As a part of that pivot, we decided to scrap our idea of incorporating a haptic feedback device to increase player immersion for our game. Since I had originally been designated the main person responsible for creating this haptic feedback device, I had been mostly conducting research on how I wanted to put this device together, which components to buy, and how the integration could work. Additionally, I had been preparing for our equipment procurement, which was scheduled to happen next week.

Because of our pivot, my efforts will now be redirected toward the CV part of the project. After helping the team create a renewed schedule that reflects the pivot and all the new tasks that need to be done, I’ve started researching computer vision implementation, since I’m relatively inexperienced in the field. I will then begin work on processing the input reference video and the webcam video with my teammate Akul Singh.