Akul’s Status Report for 3/8

Over the past week, I focused on two main components: the design report and our single-frame comparison. For the design report, I spent time developing the quantitative design requirements, the design trade studies, and the system implementation. As part of that, I researched the requirements for our design so that every design decision has a specific, documented justification. For example, to mitigate latency in our system, we decided to use only a subset of the points in the MediaPipe output, cutting processing time while maintaining accuracy. I settled on just 17 points, since many of the 33 landmarks crowd around the user’s head and toes, which aren’t necessary for our specific use case; a sketch of the kind of subset we have in mind is below. Additionally, we already had an idea of how we would implement our system, but I spent time creating block diagrams to bring our thoughts together for each aspect of it. For the rest of the semester we will have these diagrams to refer to and adapt if we make changes, so that both we and our readers can better understand the system. For the design trade study, I focused on making sure that every choice of algorithm, library, and protocol was fully intentional: I explored the tradeoffs between the alternatives and provided concrete reasoning for why we chose one over the other.
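
As a rough illustration, the subset would look something like the following in Python. The exact indices are still being finalized, so this particular selection is just a placeholder that captures the idea of dropping the dense head and toe landmarks:

```python
# Hypothetical 17-landmark subset of MediaPipe Pose's 33 landmarks.
# We drop the dense face points (1-10) and the toe points (31, 32),
# keeping the joints that matter for full-body dance comparison.
KEY_LANDMARKS = [
    0,                        # nose (rough head reference)
    11, 12, 13, 14, 15, 16,   # shoulders, elbows, wrists
    19, 20,                   # index fingers (hand orientation)
    23, 24, 25, 26, 27, 28,   # hips, knees, ankles
    29, 30,                   # heels
]

def filter_landmarks(all_landmarks):
    """Keep only the landmarks we actually compare (expects a list of 33)."""
    return [all_landmarks[i] for i in KEY_LANDMARKS]
```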

This week, we also set the goal of getting a working MVP of the single-frame comparison, where we take a user’s input and a reference video and determine, frame by frame, whether the two dances are similar. We split the work into normalizing the points, performing the actual comparison on the normalized points, and producing the Unity output. My task was to decide whether a given frame is similar based on two provided JSONs representing the user input and the reference input for that frame.

The overall algorithm I used is pretty simple. I first wrote a helper function that computes the Euclidean distance between two points in space, which come from the JSON inputs. Then I loop over the corresponding points in the two JSONs, computing the distance between each user/reference pair. If the distance is less than a certain threshold (0.05 for now), that joint is marked as similar. I do this for every joint we track, and if at least 80% of the joints are similar, the overall output for the frame is true. The thresholds I chose are fairly arbitrary, so we will first need to fully integrate the code and test them to get a better idea of what our final project needs. A sketch of this logic is below.
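
The sketch below is simplified: the JSON layout (a dictionary mapping each joint name to x/y/z coordinates) and the file names are placeholder assumptions, but the 0.05 distance threshold and the 80% rule match what I described above.

```python
import json
import math

DIST_THRESHOLD = 0.05   # max distance for a joint to count as "similar"
MATCH_FRACTION = 0.80   # fraction of joints that must match for the frame to pass

def euclidean_distance(p1, p2):
    """Distance between two points given as dicts with x/y/z keys."""
    return math.sqrt((p1["x"] - p2["x"]) ** 2 +
                     (p1["y"] - p2["y"]) ** 2 +
                     (p1["z"] - p2["z"]) ** 2)

def frame_is_similar(user_frame, ref_frame):
    """Compare two frames (joint name -> point) joint by joint."""
    joints = list(user_frame.keys())
    matches = sum(
        1 for j in joints
        if euclidean_distance(user_frame[j], ref_frame[j]) < DIST_THRESHOLD
    )
    return matches / len(joints) >= MATCH_FRACTION

if __name__ == "__main__":
    # Placeholder file names for the two per-frame JSON inputs.
    with open("user_frame.json") as f_user, open("reference_frame.json") as f_ref:
        print(frame_is_similar(json.load(f_user), json.load(f_ref)))
```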

Our progress is currently on schedule. By the end of spring break, we will have an MVP of our real-time feedback system, and once that is complete we will begin working on multi-frame analysis and on integrating the overall system.

Akul’s Status Report for 2/22

Over the past week, I focused on developing the dance comparison engine. Danny and I looked into how DTW has already been used to compare human movements, including different variations of the algorithm. For example, some research papers propose extensions of DTW such as generalized time warping, forward plotting DTW, and canonical time warping. Danny found a couple more variations, but the main problem we identified was not the algorithms themselves so much as their computational cost when run in Python over an entire video. We therefore plan to test several of these variations in Python to see which are best for our comparison engine while still maintaining efficiency and minimizing latency.
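
For context, the baseline we would be comparing these variations against is classic DTW. A minimal sketch is below, assuming each frame has already been reduced to a feature vector (e.g. flattened joint coordinates or joint angles); its quadratic cost over full-length videos is exactly the concern mentioned above.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic O(N*M) DTW between two sequences of per-frame feature vectors.

    seq_a: array of shape (N, D); seq_b: array of shape (M, D).
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```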

Another problem I wanted to look at with the comparison engine was how to normalize the points between the reference video and the user input. Although we extract the MediaPipe points the same way for both, the raw coordinates will differ, so the challenge is normalizing them for an accurate comparison. One thing we plan to do is use joint angles as part of the comparison, so I wrote a Python routine that measures the angle at a joint using the dot product of the vectors formed by its neighboring points. I attached some pictures of it working on my right elbow, but it can be applied to any joint on the body whose angle we want to measure.
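
A simplified version of that angle computation looks roughly like the following; the example coordinates at the end are made up just to show the usage.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (in degrees) at joint b formed by points a-b-c.

    For a right elbow, a/b/c would be the right shoulder, right elbow,
    and right wrist landmarks. Points can be 2D or 3D, so the same
    function works for any joint we care about.
    """
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    v1, v2 = a - b, c - b
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against rounding error
    return np.degrees(np.arccos(cos_angle))

# Example: a roughly 90-degree bend
print(joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0)))  # ~90.0
```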

Our progress is on schedule. In the next week, I hope to do a few things. First, continue working on the comparison engine and get an MVP of it so we can iterate and improve. Second, talk to people who know more about dance, or who are dancers themselves, to better understand what we could add to make the project as useful as possible for its users.

Akul’s Status Report for 2/15

This week I set up joint tracking from a computer camera using the Python libraries OpenCV and MediaPipe. To do that, I had to learn the key features of both libraries, going through their documentation and getting comfortable with the syntax. Once I understood how to use them, I coded up a simple script that displays the computer’s webcam feed and captures the different landmarks on the player’s body. With that ready, I wanted to test how feasible it is for a regular user to play our application, so I ran the script in several different locations. I found that it is definitely feasible to play from a personal computer’s camera, but the user needs to stand fairly far from it (i.e. the user needs a decent amount of space to actually play the game). This isn’t really a problem, just something we need to consider.
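
The core of that script follows the standard OpenCV + MediaPipe pattern; the sketch below is a simplified version rather than the exact code.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
        cv2.imshow("Pose tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
            break
cap.release()
cv2.destroyAllWindows()
```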

After that, I worked with Rex on how to send the user’s coordinates to the Unity system so we can show the user’s joints in our game UI. I first had to figure out how to print the coordinates of the 33 points that MediaPipe provides, and then I organized that data into a JSON that Rex can interpret in his Unity game. We have already tested sending data from a Python script to a Unity server, so we now have the capability to translate a real person’s moves onto a character in the game.
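
As a sketch of that pipeline, something like the following converts the MediaPipe landmarks to JSON and sends them to Unity. The UDP transport, host/port, and exact JSON field names here are placeholder assumptions rather than our finalized protocol.

```python
import json
import socket

# Assumed transport: a simple UDP socket that the Unity server listens on.
UNITY_ADDR = ("127.0.0.1", 5052)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_landmarks(results):
    """Serialize MediaPipe pose landmarks to JSON and send them to Unity."""
    if not results.pose_landmarks:
        return
    payload = {
        "landmarks": [
            {"id": i, "x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
            for i, lm in enumerate(results.pose_landmarks.landmark)
        ]
    }
    sock.sendto(json.dumps(payload).encode("utf-8"), UNITY_ADDR)
```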

In terms of progress, I feel that we are on a good track. We now have a better understanding of our project’s capabilities and how to go about completing it. We ran into some slight high-level setbacks when switching our project from a game to a learning application, but I believe we are moving in the right direction.

Next week, I hope to explore the algorithm we will use to detect differences between a user’s dance moves and a reference video. So far, Danny, Rex, and I have worked on different aspects individually, so next week we will focus on integrating everything we have and taking the next steps toward actually helping people learn dance moves from a reference video.

Akul’s Status Report for 2/8

I spent the first half of the week preparing for the Project Proposal presentation. Over the weekend, I worked on fleshing out the requirements, finding quantitative measures, creating diagrams, and putting together the slides, and I collaborated with my teammates over Zoom calls to refine the presentation. Since we all agreed that I would be presenting, I spent time practicing so I could explain the project clearly rather than just reading from the slides. I rehearsed both by myself and in front of others to get feedback and make the presentation as polished as I could. Because I presented on Wednesday, I continued practicing on Monday and Tuesday.

After the presentation was over, I shifted my focus to the computer vision side of our project. I spent time researching the MediaPipe and OpenCV libraries and reading/watching tutorials on how other people have used them. I have done some computer vision work in the past (though nothing of this scale), so I brushed up on the OpenCV library in Python. I played around with a simple test script for the first part of the CV pipeline: it opens my computer’s camera and saves the captured images to disk (image below). This could be the base of the project, since it lets us take camera images and use them for further processing in the rest of the pipeline.
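
The test script is along these lines; the output directory and file naming are arbitrary placeholders.

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)   # placeholder output directory
cap = cv2.VideoCapture(0)              # open the default computer camera
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Capture", frame)
    cv2.imwrite(f"frames/frame_{frame_idx:05d}.png", frame)  # save each frame
    frame_idx += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to stop
        break
cap.release()
cv2.destroyAllWindows()
```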

In our schedule, we allotted this week as more of an introductory week for getting familiar with the libraries and better understanding what our project will entail. Additionally, because we recently pivoted the project, I hope to gain an even clearer understanding of what we need to do through our weekly meeting with our faculty advisor next week. On the technical side, next week I will get more familiar with the MediaPipe framework and explore specifically how we will extract the key points of a user’s body while they are dancing.