Team
This week we decided to have the video corrections finished by the end of the week before Carnival. To do this, we focused on finishing up a few things:
- Finishing and testing the frame matching algorithm to identify key frames within the user video
- Pipelining the data from when the video is taken until the corrections are made
- Creating UI pages to accommodate video capabilities
Kristina
With the data gathered from last week, I worked with Brian and Umang to identify the key points, or key frames, of each movement. For a lot of dance movements, especially in classical ballet, from which all of our chosen movements are taken, there are positions that dancers move through every time and that are important for performing the movement correctly. A dance teacher can easily identify and correct these must-hit positions when teaching a student. We take advantage of this aspect of dance in our frame matching algorithm, to accommodate videos of different speeds, and in our correction algorithm, to give the user feedback. This is why you’ll probably hear us talk about “key frames” a lot when talking about this project.
I also spent some time this week updating the UI to allow for video capture from the web camera. Unfortunately (for the time I have to work on capstone, fortunately for me personally!), Carnival weekend also means build week, so I had a lot less time this week to work on capstone since I was always out on midway building and wiring my booth. I didn’t get as much of the UI implemented as I had hoped, so I will be focusing on that a lot more next week.
Brian
This week I finished working on the frame matching algorithm. Last week I focused on finding the distance metric that yielded the best and most intuitive results and settled on a simple L2 distance, so this week I used that metric to actually match the frames. I started by converting the video to its angle domain and then scanned the video with the key frame, calculating the distance at each frame. Then, simply by taking the minimum of these distances, I found the frame that best matched the key frame.
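As a rough illustration of the matching step described above, here is a minimal sketch in plain Python. It assumes each frame has already been reduced to a list of joint angles in radians. The function names (joint_angle, l2_distance, best_match) and the choice of measuring each angle from three 2D keypoints are assumptions made for the example, not the project's actual code.

```python
import math


def joint_angle(a, b, c):
    """Unsigned angle at point b (radians) formed by the 2D points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of (cross, dot) avoids dividing by vector lengths.
    return abs(math.atan2(cross, dot))


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length angle vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def best_match(video_angles, key_frame):
    """Scan every frame and return (index of closest frame, all distances)."""
    distances = [l2_distance(frame, key_frame) for frame in video_angles]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances
```

Calling best_match(video_angles, key_frame) returns the index of the frame with the smallest distance along with the full distance list, which is handy for plotting how the distance changes over the course of the video.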
This method, however, has the issue that it may detect a frame in any part of the video and does not take into account where in the video the frame should occur. To correct this, I calculated the positions of the top k most similar frames and then went through them in temporal order to find the best earliest match. Given n key frames, I run this algorithm n times, each time offering only the frames that the algorithm hasn’t seen yet as candidates to match to the key frame.
Manually testing this on the key frames that Kristina identified, we had an extremely high success rate in detecting the proper pose within a video.
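The ordered version can be sketched as follows. This is one possible reading of "best earliest match": among the k closest remaining frames, take the one that occurs first, then restrict the next key frame to later frames. The function name match_key_frames, the default k=3, and that tie-breaking rule are assumptions for illustration only.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length angle vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def match_key_frames(video_angles, key_frames, k=3):
    """Match key frames in order; return one video frame index per key frame.

    Assumption: "best earliest match" means the earliest index among the
    k closest frames that have not been consumed yet.
    """
    matches = []
    start = 0  # first frame the algorithm has not seen yet
    for key in key_frames:
        if start >= len(video_angles):
            break  # ran out of video before matching every key frame
        candidates = range(start, len(video_angles))
        ranked = sorted(
            candidates, key=lambda i: l2_distance(video_angles[i], key)
        )
        chosen = min(ranked[:k])  # earliest of the top-k closest frames
        matches.append(chosen)
        start = chosen + 1  # later key frames only see later frames
    return matches
```

With k=1 this reduces to a plain minimum over the remaining frames; larger values of k trade a little match quality for a stronger preference toward earlier frames.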
Umang
This week was a short one due to Carnival. I worked on getting an end-to-end video pipeline up. Given an mp4 video, I was able to convert it into a format that can be fed into AlphaPose locally and then send the resulting JSON files to be frame matched against the ground truth (which I also helped create this week). The ground truth is an amalgam of pose estimates from different ground truth videos that Kristina captured; we had to run this as a batch process before starting our pipeline so that we would have access to the means and variances of the joints for a particular move. With the key frames identified by Kristina, I am now able to provide corrections after calling Brian’s frame matching algorithm; however, this process takes upwards of three minutes to run locally on my machine. As such, I need to explore ways to speed up the entire pipeline to meet our time metric.
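To make the role of the joint means and variances concrete, here is a small hypothetical sketch of how such statistics could be computed from several ground truth samples of one key frame and then used to flag joints that deviate. The post does not describe the real correction rule, so the names (joint_stats, flag_joints), the z-score comparison, and the threshold of 2.0 standard deviations are all assumptions.

```python
import math


def joint_stats(samples):
    """Per-joint mean and population variance across ground truth samples.

    samples: list of angle vectors, one per ground truth video, all taken
    at the same key frame.
    """
    n = len(samples)
    joints = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(joints)]
    variances = [
        sum((s[j] - means[j]) ** 2 for s in samples) / n
        for j in range(joints)
    ]
    return means, variances


def flag_joints(user_angles, means, variances, threshold=2.0, min_std=1e-6):
    """Return (joint index, z-score) for joints far from the ground truth.

    threshold and min_std are illustrative values, not tuned ones.
    """
    flagged = []
    for j, (angle, mean, var) in enumerate(zip(user_angles, means, variances)):
        std = max(math.sqrt(var), min_std)  # guard against zero variance
        z = (angle - mean) / std
        if abs(z) > threshold:
            flagged.append((j, z))
    return flagged
```

Because the statistics depend only on the ground truth videos, they can be computed once in a batch step and reused for every user video, which matches the ordering described above.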