Akul’s Status Report for 4/12

This week I worked on developing multiple comparison algorithms, with the goal of iterating on our existing algorithm, trying new features, and being intentional about the design decisions behind it. For the interim demo, we already had an algorithm that utilizes dynamic time warping, analyzes 30 frames every 0.1 seconds, and computes similarity for each joint individually. This week I focused on building 5 distinct comparison algorithms, allowing me to compare how different parameters and features affect the effectiveness of our dance coach. The following are the 5 variations I created; I will compare them against our original to help find an optimal solution:

  1. Frame-to-frame comparisons: does not use dynamic time warping; simply builds a normalization matrix between the reference video and the user webcam input and compares the coordinates on each frame.
  2. Dynamic time warping with weighted similarity calculations: builds upon our interim demo algorithm by weighting certain joints more heavily than others when calculating similarity (see the sketch after this list).
  3. Dynamic time warping with a larger analysis window/frame buffer: builds upon our interim demo algorithm by increasing the analysis window and frame buffer to get a more accurate DTW analysis.
  4. Velocity-based comparisons: similar to the frame-to-frame comparisons, but computes the velocity of each joint over time and compares it against the reference video velocities, detecting not exactly where the joints are but how they move over time (also sketched below).
  5. Velocity-based comparisons combined with frame-to-frame comparisons: uses both the velocity comparisons and the frame-to-frame joint position comparisons to see whether the combination gives a more accurate measure of similarity between the reference and user input videos.
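
To make variations 2 and 4 concrete, here is a minimal sketch of a weighted per-joint similarity and a velocity-based comparison. The joint names, weights, and the exact scoring are illustrative assumptions for this writeup, not our final tuned values.

```python
import numpy as np

# Hypothetical joint weights -- the actual joints and weights are still being tuned.
JOINT_WEIGHTS = {
    "left_wrist": 1.5, "right_wrist": 1.5,   # hands tend to carry the choreography
    "left_elbow": 1.0, "right_elbow": 1.0,
    "left_knee": 1.0, "right_knee": 1.0,
    "left_hip": 0.5, "right_hip": 0.5,       # torso joints move less, so weigh them less
}

def weighted_frame_similarity(ref_frame, user_frame, weights=JOINT_WEIGHTS):
    """Weighted average of per-joint distances between two normalized pose frames.

    ref_frame / user_frame: dict mapping joint name -> (x, y) normalized coordinates.
    Smaller scores mean more similar poses.
    """
    total, weight_sum = 0.0, 0.0
    for joint, w in weights.items():
        if joint in ref_frame and joint in user_frame:
            dist = np.linalg.norm(np.array(ref_frame[joint]) - np.array(user_frame[joint]))
            total += w * dist
            weight_sum += w
    return total / weight_sum if weight_sum else float("inf")

def joint_velocities(frames, joint):
    """Per-frame velocity of one joint across a list of pose frames (variation 4)."""
    pts = np.array([f[joint] for f in frames])
    return np.diff(pts, axis=0)  # displacement between consecutive frames

def velocity_similarity(ref_frames, user_frames, joints):
    """Compare how joints move over time rather than where they are."""
    diffs = []
    for joint in joints:
        v_ref = joint_velocities(ref_frames, joint)
        v_user = joint_velocities(user_frames, joint)
        n = min(len(v_ref), len(v_user))
        diffs.append(np.mean(np.linalg.norm(v_ref[:n] - v_user[:n], axis=1)))
    return float(np.mean(diffs))
```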

I have implemented and debugged the algorithms above, but starting tomorrow and continuing throughout the week, I will conduct quantitative and qualitative comparisons between them to see which is best for our use case and to find further points to improve. Additionally, I will communicate with Rex and Danny to see how I can make it as easy as possible to integrate the comparison algorithm with the Unity side of the game. Overall, our progress seems to be on schedule; if I can get the comparison algorithm finalized within the next week and we begin integration in the meantime, we will be on a good track to finish by the final demo and deadline.

There are two main parts that I will need to test and verify for the comparison algorithm. First, I aim to test the real-time processing performance of each of these algorithms. For example, the DTW algorithm with the extended analysis window may require too much computation to allow for real-time comparisons. On the other hand, the velocity and frame-to-frame comparison algorithms may have room for added complexity to improve accuracy without causing problems with processing performance.
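
As a starting point for those performance tests, I plan to time each algorithm on a per-update basis. Below is a rough sketch of the timing harness; compare_window is a placeholder for whichever algorithm is being measured, and the buffer shapes are assumptions based on our 30-frame window.

```python
import time
import numpy as np

def time_algorithm(compare_window, ref_buffer, user_buffer, runs=100):
    """Measure the average per-update latency of a comparison function in milliseconds.

    compare_window is a placeholder: it takes a reference buffer and a user buffer
    of pose frames and returns a similarity score.
    """
    start = time.perf_counter()
    for _ in range(runs):
        compare_window(ref_buffer, user_buffer)
    return (time.perf_counter() - start) / runs * 1000

# Example: random 30-frame buffers of 17 (x, y) joints, mimicking our analysis window.
ref = np.random.rand(30, 17, 2)
user = np.random.rand(30, 17, 2)
print(f"avg latency: {time_algorithm(lambda a, b: np.abs(a - b).mean(), ref, user):.3f} ms")
```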

Second, I aim to test the accuracy of each of these comparison algorithms. For each of the algorithms described above, I will run the algorithm on a complex video (such as a TikTok dance), a simpler video (such as a video of myself doing a slower dance), and a still video (such as me holding a T-pose in front of the camera). I will record the output while I actively do the dance, allowing me to watch the video back and see how each algorithm does. Afterwards, I will create a table to record both quantitative results and qualitative notes on each algorithm, identifying which parts are lacking and which are performing well. This will give me all the data I need in front of me when deciding how to continue iterating on the algorithm.

With these two strategies, I believe that we will be on a good track to verify the effectiveness of our dance coach and create the best possible comparison algorithm we can to help our users.

Akul’s Status Report for 3/29

This week, I worked on getting us set up for the Interim Demo. After meeting with the professor on Monday, we explored how to improve our comparison algorithm. Before, we mostly just had a frame-to-frame comparison with okay accuracy. With that, we looked into how to use DTW, not just for post-processing, but also for real-time feedback. I first started by doing some more research into how other people have used DTW for video processing. I read a few papers on how others used DTW for feedback, and I was able to gain a better understanding of how the algorithm works and why it is suitable for our application.

We incorporated DTW by comparing shorter segments of the input video at a time, rather than the entire video. The biggest pivot from our original plan was using DTW for real-time feedback instead of only post-processing. Because DTW has quadratic time complexity, the longer the segment we choose (our original plan was to make the segment the whole video), the longer it takes to run. By segmenting the video into smaller chunks, we were able to use DTW for real-time feedback.
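
A minimal sketch of the segmented approach is below: a plain O(n·m) DTW over short buffers of recent frames, which is what keeps the cost low enough for real-time use. The buffer size and distance function here are illustrative assumptions rather than our exact parameters.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Plain dynamic time warping between two short sequences of flattened pose frames.

    Quadratic in the segment lengths, which is why we only run it on small windows.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def realtime_segment_score(ref_frames, user_buffer, ref_index, window=30):
    """Compare the most recent user frames against the matching slice of the reference."""
    ref_segment = ref_frames[max(0, ref_index - window):ref_index]
    return dtw_distance(np.asarray(ref_segment), np.asarray(user_buffer[-window:]))
```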

Additionally, I worked on getting test data and planning how our actual interim demo will go. I considered the use-case application of our system, looking at actual dances we would want to replicate. One thing I found that I personally enjoyed was learning Fortnite dances, which are short, simple dances that can still be difficult to master. We also played around with uploading these videos to our pipelined system, allowing us to test with other inputs.

Our progress is on schedule. We have two main components: the Unity side, which displays both the reference video and user video human figures to showcase in real time how the user dances, and the comparison algorithm, which shows which parts of your dance moves match the reference video and indicates whether you are dancing well or not. Next steps include integrating both of these aspects together for our final demo in the next few weeks.

Akul’s Status Report for 3/22

This week I focused on improving our comparison algorithm logic and exploring the dynamic time warping post-processing algorithm. Regarding the frame-by-frame comparison algorithm: last week, I made an algorithm that takes in two videos and outputs whether the dance moves are similar. However, the comparison was giving too many false positives. I worked on debugging this with Danny to see what some of the problems were, and I found that some of the thresholds in the comparison logic were too high. After tweaking them and spending time testing against other videos, the comparisons got better, but they aren't 100% accurate.

With that, I decided to begin working on the dynamic time warping algorithm to get a sense of what we could do to improve our overall performance and feedback to the user. I spent some time thinking about how we would implement the dynamic time warping algorithm and how we would use it to actually provide useful feedback for the user. I broke it down so that it measures similarity but also highlights specific areas for improvement, such as timing, posture, or limb positioning, using specific points from the MediaPipe output. I began implementation, but am currently running into some bugs that I will fix next week.

I also worked with Rex to begin incorporating the comparison logic into the Unity game. We met to catch each other up on our progress and to plan how we will integrate our parts. There were some things we needed to modify, such as the JSON formatting, to make sure everything would be compatible. For next week, one goal we definitely have is to integrate our codebases more fully so we can have a successful interim demo the week after.

Akul’s Status Report for 3/15

This week I focused on developing the comparison algorithm. Now that we had the code to normalize the points based on different camera angles, we had the capability to create a more fleshed out comparison engine to see if two videos contain the same dance moves. 

I spent my time this week creating a script that takes in two videos (one a reference video, one a user video) and checks whether the videos match via frame-to-frame comparisons. In our actual final project, the second video will be replaced with real-time video processing, but for testing's sake I made it so I could upload two videos. I used two videos of my partner Danny doing the same dance moves at different angles from the camera and with slightly different timing. Using these videos, I had to extract the landmarks, get the pose data, and normalize the data to account for any differences in camera pose. After that, I parsed through the JSONs, checking whether the poses at each comparable frame are similar enough. I then created a side-by-side comparison UI that shows which frames are similar and which are different. The comparison is pretty good for the most part, but I did find some false positives, so I adjusted the thresholds and the results improved.
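
The extraction step of that script looks roughly like the sketch below. The normalization shown here (centering on the hip midpoint and scaling by shoulder width) is a simplified stand-in for our actual camera-angle normalization, and the file names are placeholders.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose

def normalize(pts):
    """Simplified normalization: center on the hip midpoint, scale by shoulder width."""
    hips = (pts[23] + pts[24]) / 2                       # left/right hip landmarks
    shoulder_width = np.linalg.norm(pts[11] - pts[12]) + 1e-6
    return (pts - hips) / shoulder_width

def extract_pose_sequence(video_path):
    """Run MediaPipe Pose over a video file and return a list of normalized landmark arrays."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                pts = np.array([[lm.x, lm.y] for lm in results.pose_landmarks.landmark])
                frames.append(normalize(pts))
    cap.release()
    return frames

ref_seq = extract_pose_sequence("reference.mp4")   # hypothetical file names
user_seq = extract_pose_sequence("user.mp4")
```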

Overall, our progress seems to be on schedule. The next steps will be integrating this logic into the Unity side instead of just the server-side code. Additionally, I will need to change the logic to take input from a webcam and a reference video instead of two uploaded videos, but this should be trivial. The biggest remaining task will be to test our system more thoroughly with more data points and videos. Next week, we will work on that testing as well as begin work on our DTW post-video analysis engine.

I couldn’t upload a less blurry picture due to maximum file upload size constraints so apologies for any blurriness in the following images.

Match

No Match

Akul’s Status Report for 3/8

Over the past week, I focused on two main components: the design report and our single-frame comparison. For the design report, I spent time developing the quantitative design requirements, the design trades, and the system implementation. With that, I conducted research into requirements for our design, finding specific reasons behind each design decision. For example, to mitigate latency within our system, we decided to use only a certain subset of points from the MediaPipe output, decreasing latency while maintaining accuracy. I decided to go with just 17 points, as many of the points cluster around the user's head and toes, which isn't necessary for our specific use case. Additionally, we had an idea of how we would implement our system, but I spent time creating block diagrams to put all of our thoughts together for each aspect of the system. Consequently, throughout the rest of the semester, we will have these diagrams to refer to and continue to adapt if we make any changes, so that both we and readers can better understand our system. For the design trade study, I focused on making sure that all of our decisions were fully intentional in terms of the algorithms, libraries, and protocols we are using. I explored the tradeoffs between these options and provided concrete reasoning for why we chose one over another.
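
As an illustration of the point-subset decision, the sketch below filters MediaPipe Pose's 33 landmarks (indices 0–32) down to a body-focused subset. The indices listed here are a hypothetical example for this report, not necessarily the exact 17 we settled on.

```python
# Hypothetical body-focused subset of MediaPipe Pose's 33 landmarks.
# Face and toe landmarks are dropped since they cluster at the head and feet.
SELECTED_LANDMARKS = [
    0,            # nose (rough head position)
    11, 12,       # shoulders
    13, 14,       # elbows
    15, 16,       # wrists
    23, 24,       # hips
    25, 26,       # knees
    27, 28,       # ankles
]

def filter_landmarks(all_landmarks):
    """Keep only the selected subset from a full 33-point MediaPipe result."""
    return [all_landmarks[i] for i in SELECTED_LANDMARKS]
```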

This week, we also set the goal of getting a working MVP of the single-frame comparison, where we can take a user input and a reference video and see whether the dances are similar when doing a frame-to-frame comparison. We split up the work into normalizing the points, doing the actual comparison given normalized points, and providing the Unity output. My task was to compute whether a frame was similar based on two provided JSONs representing the user input and the reference input for that frame.

The overall algorithm that I used was pretty simple. I first created a helper function to find the Euclidean distance between two points in space, which are given in the JSON inputs. Then, I loop through each of the points in the JSONs, computing the distance between each corresponding pair. If the distance is less than a certain threshold (0.05 for now), then that point counts as similar. I do this for each joint we are tracking, and if 80% of the joints are "similar" enough, then the overall output for that frame is true. These metrics are fairly arbitrary for now, and I think we will first need to integrate the code fully and test them to get a better idea of what we need for our final project.
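
In code, that logic looks roughly like the sketch below. The JSON structure (joint name mapped to an {"x", "y"} dict) and the file names are simplified assumptions about our actual format.

```python
import json
import math

DISTANCE_THRESHOLD = 0.05   # per-joint distance threshold (arbitrary for now)
MATCH_FRACTION = 0.8        # fraction of joints that must match for the frame to pass

def euclidean(p1, p2):
    """Euclidean distance between two points given as {"x": ..., "y": ...} dicts."""
    return math.sqrt((p1["x"] - p2["x"]) ** 2 + (p1["y"] - p2["y"]) ** 2)

def frame_is_similar(user_json, ref_json):
    """Return True if at least 80% of the shared joints are within the distance threshold."""
    joints = set(user_json) & set(ref_json)
    if not joints:
        return False
    similar = sum(
        1 for j in joints if euclidean(user_json[j], ref_json[j]) < DISTANCE_THRESHOLD
    )
    return similar / len(joints) >= MATCH_FRACTION

# Hypothetical usage with per-frame JSON files:
with open("user_frame.json") as f_user, open("ref_frame.json") as f_ref:
    print(frame_is_similar(json.load(f_user), json.load(f_ref)))
```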

Our progress is currently on schedule. By the end of spring break, we will have an MVP of our real-time feedback system, and once that is complete we will begin working on our multi-frame analysis and on integrating our system together.

Akul’s Status Report for 2/22

Over the past week, I focused on developing the dance comparison engine. For the DTW algorithm, Danny and I looked into how people have already used DTW to compare human movements, including different variations of the algorithm. For example, some research papers have proposed extensions of DTW such as generalized time warping, forward plotting DTW, and canonical time warping. Danny found a couple more variations, but one problem we found was not with the algorithms themselves but with their computational complexity when run in Python over an entire video. With that, we plan to test several of these variations in Python, allowing us to see which ones may be best for our comparison engine while still maintaining efficiency and minimizing latency.

Another problem I wanted to look at with the comparison engine was how we plan to normalize the points between the reference video and the user input. Although we are getting the MediaPipe points in a similar manner for both the reference video and the user input, the coordinates of the points will be different, so our challenge lies in normalizing them for accurate comparisons. One thing we plan to do is use the angles between the joints as part of our comparison, so I developed a Python algorithm that measures the angle at a joint from the dot product and geometry of the surrounding points. I attached some pictures of how it works using my right elbow, but this can be applied to any joint on the body whose angle we want to measure.
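
The angle computation itself is standard vector geometry; a minimal sketch is below, using the right shoulder, elbow, and wrist points (MediaPipe indices 12, 14, 16) as in the elbow example. The example coordinates are made up for illustration.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    For a right elbow: a = right shoulder, b = right elbow, c = right wrist.
    """
    a, b, c = np.asarray(a), np.asarray(b), np.asarray(c)
    v1, v2 = a - b, c - b
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Example with made-up normalized (x, y) coordinates:
shoulder, elbow, wrist = (0.60, 0.30), (0.70, 0.45), (0.65, 0.60)
print(f"right elbow angle: {joint_angle(shoulder, elbow, wrist):.1f} degrees")
```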

Our progress is on schedule. In the next week, I hope to do a few things. First, continue working on the comparison engine and try to get an MVP of the engine so we can iterate and improve on it. Second, I am going to talk to people who know more about dance, or who are dancers themselves, to get a better understanding of what we could add to the project to make it as useful as possible for the users.

Akul’s Status Report for 2/15

This week I was able to set up the joint tracking from a computer camera. I utilized the Python libraries OpenCV and MediaPipe to complete this task. With that, I had to learn the key features of the libraries, going through the documentation for each and understanding the syntax. Once I had a better understanding of how to use the libraries, I coded up a script that displays your computer's webcam feed and captures the different landmarks and points on the player's body. Once I had this ready, I wanted to test the feasibility of our application being played by a regular user, so I went to multiple different locations and ran the script. I found that it is definitely feasible for the player to play from their own computer camera, but the user needs to stand pretty far from the computer (i.e., the user needs a decent amount of space to actually play the game). This isn't really a problem, but it is something we need to consider.
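
A condensed version of that script is below; it is a minimal sketch of the standard OpenCV + MediaPipe webcam loop rather than the exact code, with placeholder confidence thresholds.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            mp_draw.draw_landmarks(frame, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
        cv2.imshow("Pose tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```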

After that, I worked with Rex to figure out how I should send the coordinates of the user to the Unity system to showcase the user's joints within our game UI. I first had to figure out how to print out the coordinates of the 33 points that MediaPipe provides for us. After I did that, I organized the data into a JSON that Rex is able to interpret in his Unity game. We have already tested our ability to send data from a Python script to a Unity server, so we now have the capability to translate a real-life human's moves to a character in the game.
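
The per-frame payload looks roughly like the sketch below. The field names, port, and the UDP transport are illustrative assumptions here, since the exact format and connection were worked out with Rex.

```python
import json
import socket

# Hypothetical transport and field names -- the real format was agreed on with Rex.
UNITY_ADDRESS = ("127.0.0.1", 5065)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_landmarks(results, frame_index):
    """Package one frame of MediaPipe pose landmarks as JSON and send it to Unity."""
    if not results.pose_landmarks:
        return
    payload = {
        "frame": frame_index,
        "landmarks": [
            {"id": i, "x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
            for i, lm in enumerate(results.pose_landmarks.landmark)
        ],
    }
    sock.sendto(json.dumps(payload).encode("utf-8"), UNITY_ADDRESS)
```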

In terms of our progress, I feel that we are on a good track. We now have a better understanding of what the capabilities are of our project and how we can go about completing it. We ran into some slight high-level setbacks when it came to switching our project from a game to an application for learning, but I believe we are moving in the right direction. 

Next week, I hope to explore further into the algorithm we will use to detect differences between a user’s dance moves and a reference video. At this point, Danny, Rex, and I have worked on different aspects individually, so next week we will focus on incorporating everything we have together and taking the next steps to actually help people learn dance moves from a reference video. 

Akul’s Status Report for 2/8

I spent the first half of the week preparing for the Project Proposal presentation. Over the weekend, I worked on fleshing out the requirements, finding quantitative measures, creating diagrams, and putting together the slides. I collaborated with my teammates over Zoom calls to refine the presentation as well. Since we all agreed that I would be doing the presentation, I spent time practicing to ensure I could explain the project clearly rather than just reading from the slides. I rehearsed both by myself and in front of others in order to gain feedback and prepare the presentation as thoroughly as I could. I ended up presenting on Wednesday, so I continued practicing throughout Monday and Tuesday.

After the presentation was over, I began to shift my focus to looking deeper into the computer vision aspect of our project. I spent time researching the MediaPipe and OpenCV libraries and reading/watching tutorials on how other people have utilized them. I have done some computer vision work in the past (but nothing of this scale), so I brushed up on the OpenCV library in Python. I played around with some simple test scripts for the first part of the CV pipeline, which opens my computer's camera and saves the captured images to my computer (image below). This could be the base of the project, as it allows us to take camera images and use them for continued processing in the rest of the project.
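
The test script was essentially the few lines below: open the default camera with OpenCV and save a handful of frames to disk. The output paths, frame count, and delay are placeholders.

```python
import time
import cv2

cap = cv2.VideoCapture(0)   # default laptop camera
for i in range(10):         # grab a handful of test frames
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"capture_{i:03d}.jpg", frame)  # hypothetical output path
    time.sleep(0.1)         # small delay between captures
cap.release()
```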

In our schedule, we allotted this week to be more of an introductory week, where we would try to get familiar with the libraries and better understand what our project will entail. Additionally, because we recently pivoted our project, I hope to gain an even clearer understanding of what we need to do through our weekly meeting with our faculty advisor next week. In terms of the technical requirements, next week I will try to get more familiar with the MediaPipe framework and explore specifically how we will get the key points of a user's body while they are dancing.