Over the past week, I focused on two main components: the design report and our single-frame comparison. For the design report, I spent time developing the quantitative design requirements, the design trades, and the system implementation. As part of that, I researched the requirements for our design so that every design decision has a specific justification behind it. For example, to reduce latency in our system, we decided to use only a subset of the points in the MediaPipe output while still maintaining accuracy. I went with just 17 points, since many of MediaPipe's 33 pose landmarks crowd around the user's head and feet, which aren't necessary for our use case (a quick sketch of this filtering is below). Additionally, while we already had an idea of how we would implement our system, I spent time creating block diagrams to pull our thoughts together for each part of the system. Throughout the rest of the semester, we will have these diagrams to refer to and adapt as things change, so both we and readers can better understand our system.

For the design trade study, I focused on making sure that all of our choices of algorithms, libraries, and protocols were fully intentional. I explored the tradeoffs between the options and provided concrete reasoning for why we chose one over another.
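To make that filtering concrete, here is a minimal sketch of what selecting the subset could look like in Python. The particular 17 landmarks shown are my illustration of dropping the face and foot points, not necessarily the exact set we locked in:

```python
import mediapipe as mp

PoseLandmark = mp.solutions.pose.PoseLandmark

# Hypothetical subset of 17 of MediaPipe Pose's 33 landmarks: the face and
# foot landmarks are dropped since they crowd at the head and toes. This
# particular selection is illustrative, not our final list.
KEPT = [
    PoseLandmark.NOSE,
    PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER,
    PoseLandmark.LEFT_ELBOW, PoseLandmark.RIGHT_ELBOW,
    PoseLandmark.LEFT_WRIST, PoseLandmark.RIGHT_WRIST,
    PoseLandmark.LEFT_INDEX, PoseLandmark.RIGHT_INDEX,
    PoseLandmark.LEFT_THUMB, PoseLandmark.RIGHT_THUMB,
    PoseLandmark.LEFT_HIP, PoseLandmark.RIGHT_HIP,
    PoseLandmark.LEFT_KNEE, PoseLandmark.RIGHT_KNEE,
    PoseLandmark.LEFT_ANKLE, PoseLandmark.RIGHT_ANKLE,
]

def extract_points(pose_landmarks):
    """Keep only the 17 tracked joints from a MediaPipe pose result."""
    return {
        lm.name.lower(): (pose_landmarks.landmark[lm].x,
                          pose_landmarks.landmark[lm].y,
                          pose_landmarks.landmark[lm].z)
        for lm in KEPT
    }
```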
This week, we also set the goal of getting a working MVP of the single-frame comparison: take a user's input and a reference video and determine whether their dances are similar on a frame-to-frame basis. We split the work into normalizing the points, doing the actual comparison given normalized points, and producing the Unity output. My task was to compute whether a frame was similar based on two provided JSON files representing the user input and the reference input for that frame.
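For context, the per-frame JSON I have been assuming looks roughly like this (abbreviated, and the field names here are placeholders rather than our exact schema):

```json
{
  "frame": 212,
  "points": {
    "left_shoulder": [0.41, 0.32, -0.05],
    "left_elbow": [0.45, 0.47, -0.02],
    "left_wrist": [0.48, 0.60, 0.01]
  }
}
```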
The overall algorithm I used is pretty simple. I first wrote a helper function to find the Euclidean distance between two points in space, which come from the JSON inputs. Then I loop through the points in the two files, computing the distance between each corresponding pair. If the distance is less than a certain threshold (0.05 for now), the similarity for that point is true. I do this for each joint we track, and if at least 80% of the joints are “similar” enough, the overall output for that frame is true. The thresholds I picked are fairly arbitrary, and I think we will first need to fully integrate the code and test these metrics to get a better idea of what we need for the final project.
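In code, the comparison boils down to something like the sketch below. This is a simplified version assuming the JSON layout shown earlier; the function and constant names are mine, not necessarily what is in our repo:

```python
import json
import math

DISTANCE_THRESHOLD = 0.05  # max per-joint distance to count as "similar" (arbitrary for now)
SIMILARITY_RATIO = 0.80    # fraction of joints that must match for the frame to pass

def euclidean_distance(p1, p2):
    """Euclidean distance between two points given as [x, y, z] lists."""
    return math.dist(p1, p2)

def frame_is_similar(user_path, ref_path):
    """Compare one user frame against one reference frame, joint by joint."""
    with open(user_path) as f:
        user = json.load(f)["points"]
    with open(ref_path) as f:
        ref = json.load(f)["points"]

    # Count the joints whose user/reference positions fall within the threshold.
    similar = sum(
        1 for joint in user
        if euclidean_distance(user[joint], ref[joint]) < DISTANCE_THRESHOLD
    )
    return similar / len(user) >= SIMILARITY_RATio if False else similar / len(user) >= SIMILARITY_RATIO
```

Keeping the per-joint result as a simple boolean makes the two knobs (the distance threshold and the 80% ratio) easy to tune independently once the pipeline is integrated and we can test against real dance data.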
Our progress is currently on schedule. By the end of spring break, we will have an MVP of our real-time feedback system, and once that is complete we will begin working on the multi-frame analysis and integrating the full system together.