Weekly Update #10 (4/21 – 4/27)

Team

Integration obviously means bugs, problems, and uncovering issues, as we were warned so many times at the beginning of the semester. This week, we continued the integration process by tackling the problems that arose. We realized that our UI needed a framework change in order to integrate with the way our back end was implemented and with its inputs and dependencies, so we focused on fixing the front and back ends so that they could be properly integrated. We also continued looking into other ways to get the speedup we required, so we kept investigating AWS and looked into possibly using a different pose estimator to get accurate but more efficient results. Our final presentations are next week, so we also spent time working on and polishing our slides.

Kristina

After realizing my big mistake of not looking ahead to HOW the back end would connect to the UI when designing and starting work, I had to move our front end code to a different framework. Initially I was using basic HTML/JavaScript/CSS to create web pages, but I realized that calling our Python correction script and getting results back wouldn’t really work that way, so I decided to use Node.js to create a server-side application that can call the correction algorithm when the user triggers an event. I honestly just chose the first framework that seemed the simplest to migrate to, and this ended up not working out as well as I had hoped. I ran into a lot of problems getting my old code to work server-side and still need to fix many issues next week. Since I’m also the one giving our presentation next week, I spent some time preparing and practicing, as I’m not great at presentations.

Brian

This week was a constant tug and pull between the UI side of the project and the back end. Some of the outputs that we thought were going to be necessary ended up needing to change to accommodate recent changes in our app structure. A lot of the week went into getting data formatted the right way to pass between the separate parts of the project. In particular, I had issues sending information to and from the AWS instance, but in the end I was able to solve them with a lot of googling. I also worked on refactoring the code to be more editable and understandable, as well as on the final presentation.

Umang

This week I helped with the final presentation. Then, I decided to run a comparison between AlphaPose and OpenPose for running pose estimation on the AWS instance. We have an ~8 second spin-up and spin-down time that is irreducible, but any additional time comes from the slow pose estimation. As such, I wanted to explore whether OpenPose is faster for our use case on a GPU. OpenPose runs smoothly but carries a lot of overhead if we want to retrofit it to optimize for our task. Running vanilla OpenPose led to a respectable estimation time for a 400 frame video (on one GPU), but it was still over our desired metric. Though the times were comparable at first, once we added flags to our AlphaPose command to reduce the detection batch size, set the number of people to look for, and lower the overall joint estimate confidence threshold, we got blazing fast estimation from AlphaPose (~20 seconds + ~8 seconds for the spin-up/spin-down). This means we hit our metric of doing end-to-end video pose correction in under 30 seconds 🙂 Final touches to come next week!
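
For a sense of how a run like this can be benchmarked, here is a minimal timing sketch; the script name (demo.py) and the flag names are placeholders standing in for the options we tuned (detection batch size, number of people, joint-confidence threshold), not the exact command we used.

```python
import subprocess
import time

# Hypothetical invocation: the script name and flags below are placeholders
# for the options we tuned; check the AlphaPose docs for the exact names.
cmd = [
    "python", "demo.py",
    "--indir", "frames/",    # directory of extracted video frames
    "--outdir", "results/",  # where the keypoint JSON is written
    "--detbatch", "1",       # smaller detection batch size
    "--conf", "0.1",         # lower joint-confidence threshold
]

start = time.time()
subprocess.run(cmd, check=True)
print(f"Pose estimation took {time.time() - start:.1f} s")
```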

Weekly Update #9 (4/14 – 4/20)

Team

After succeeding in finishing most of the work for the video corrections aspect of the project, we decided that it was time to start integrating what we had done so far in order to get a better picture of how our project was going to look. Additionally, we realized that the amount of time our corrections were taking was far too long for a user to justify, so we wanted to find ways to speed up the pose processing. In order to do this, we focused on:

  1. Looking into AWS as a way to boost our processing speeds
  2. Merging the existing pipelines with the UI and making it look decent

It’s also our in-lab demo next week, so we had to spend some time polishing up what our demo would look like. Since we only started integration this week, we still have problems to work through, so our in-lab demo will most likely not be fully connected or fully functional.

Kristina

This week I spent more time polishing up the UI and editing the implementation to reflect the recent changes made to our system. This involved being able to save a video or picture to files that can then be accessed by a script, being able to display the captured video or picture, being able to redo a movement, and ensuring that the visualization on the screen acts as a mirror facing the user. The latter is one of those details that seems small until it doesn’t work, and we didn’t even notice it until now. It’s so natural to look at a mirror; that’s how a dance class is conducted, and even more importantly, that’s how we’re used to seeing ourselves. Since our application is replacing dance class in a dance studio, it was important that the mirrored aspect of video and pictures worked. Also, because we underestimated the time needed for each task, we realized that a text-to-speech element for the corrections wasn’t essential and could be replaced with a visualization, which is probably more effective for the user anyway since dance is a very visual art to learn.
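
As a minimal sketch of what the mirroring amounts to, assuming the frames were handled in Python with OpenCV rather than in the browser, a horizontal flip is all that is needed:

```python
import cv2

def mirror_video(in_path: str, out_path: str) -> None:
    """Horizontally flip every frame so the saved video reads like a mirror."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(cv2.flip(frame, 1))  # flipCode=1 flips around the vertical axis
    cap.release()
    writer.release()
```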

Brian

Since I finished creating the frame matching algorithm, as well as helping with the pipelining, last week, I decided to fine-tune some of what we did to make it work more smoothly. Since we were only printing corrections in the terminal last week, I wanted to find a way to visualize the corrections so that it is apparent what the user needs to fix. To do this, I created functions to graph the user poses and put them next to the instructor pose, with red circles highlighting the necessary correction. I also displayed the correction text in the same frame. I figured this would be a simple method to show all of the corrections in a form that would be easy to translate to the UI.
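
The plotting is along the lines of the sketch below (a simplified stand-in, assuming each pose is a list of (x, y) joint coordinates and the joint needing correction is already known):

```python
import matplotlib.pyplot as plt

def plot_correction(user_pose, instructor_pose, bad_joint_idx, correction_text):
    """Draw the user and instructor poses side by side and circle the joint to fix.

    user_pose / instructor_pose: lists of (x, y) joint coordinates.
    """
    fig, (ax_user, ax_inst) = plt.subplots(1, 2, figsize=(8, 4))
    for ax, pose, title in [(ax_user, user_pose, "You"),
                            (ax_inst, instructor_pose, "Instructor")]:
        xs, ys = zip(*pose)
        ax.scatter(xs, ys)
        ax.invert_yaxis()  # image coordinates: y grows downward
        ax.set_title(title)
    # Red circle around the joint that needs correcting, on the user's side
    bad_x, bad_y = user_pose[bad_joint_idx]
    ax_user.scatter([bad_x], [bad_y], s=300, facecolors="none", edgecolors="red")
    fig.suptitle(correction_text)
    plt.show()
```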

Umang

This week was all about speedup. Unfortunately, pose estimation on CPUs is horridly slow. We needed to explore ways to get our estimation under our desired metric of processing a video in under 30 seconds. As such, I decided to explore running our pose estimation on a GPU, where we would get the speedup we need to meet the metric. I worked on getting an initial pose estimation implementation of AlphaPose up on AWS. Just as when AlphaPose runs locally, I run AlphaPose over a set of frames and give the resulting JSONs to Brian to visualize as a graph. I also refactored a portion of last week's pipeline to make it easier to pull the results out of the JSON. The conflicting local file systems made this messy. I hope to compare pose estimation techniques (on GPUs) and continue to refactor code this next week.
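
Pulling keypoints out of the resulting JSONs looks roughly like the sketch below; it assumes the common AlphaPose output layout of one entry per detection with a flat keypoints list of x, y, score triples, so treat the field names as assumptions rather than a spec.

```python
import json
from collections import defaultdict

def load_keypoints(json_path):
    """Group pose detections by frame, keeping (x, y, score) triples per joint.

    Assumes entries of the form {"image_id": ..., "keypoints": [x1, y1, s1, x2, ...]}.
    """
    with open(json_path) as f:
        detections = json.load(f)

    frames = defaultdict(list)
    for det in detections:
        kps = det["keypoints"]
        joints = [(kps[i], kps[i + 1], kps[i + 2]) for i in range(0, len(kps), 3)]
        frames[det["image_id"]].append(joints)
    return frames
```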

Weekly Update #8 (4/7 – 4/13)

Team

This week we decided to have the video corrections finished by the end of the week before Carnival. In order to do this, we focused on finishing up a couple of things:

  1. Finishing and testing the frame matching algorithm to identify key frames within the user video
  2. Pipelining the data from when the video is taken until the corrections are made
  3. Creating UI pages to accommodate video capabilities

Kristina

With the data gathered last week, I worked with Brian and Umang to identify the key points, or key frames, of each movement. For a lot of dance movements, especially in classical ballet, which all of our chosen movements are taken from, there are positions that dancers move through every time and that are essential to performing the movement correctly. A dance teacher can easily identify and correct these positions, which must always be hit, when teaching a student. We take advantage of this aspect of dance in our frame matching algorithm to accommodate different speeds of videos, and in our correction algorithm to give the user feedback. This is why you’ll probably hear us talk about “key frames” a lot when discussing this project. I also spent some time this week updating the UI to allow for video capture from the web camera. Unfortunately (for the time I have to work on capstone; fortunately for me personally!), Carnival weekend also means build week, so I had a lot less time this week to work on capstone since I was always out on midway building/wiring my booth. I didn’t get as much of the UI implemented as I had hoped, so I will be focusing on that a lot more next week.

Brian

This week I finished working on the frame matching algorithm. Since last week I had focused on finding the distance metric that yielded the best and most intuitive results and decided on a simple L2 distance, this week I used that metric to actually match the frames. I started by converting the video to its angle domain, and then scanning the video with the key frame, calculating the distance at each point. Then, simply by taking the minimum of this distance, I found the frame that best matched the key frame.

This method, however, has the issue that it may detect a frame in any part of the video and does not take into account where the frame falls in the video. To correct this, I calculated the positions of the top k most similar frames and then went through them in temporal order to find the best earliest match. Given n key frames, I run this algorithm n times, each time giving only the frames that the algorithm hasn’t seen yet as candidates to match to the key frame.
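
Condensed into code, the matching scheme looks roughly like this sketch (assuming each frame has already been converted to a vector of joint angles; the names are illustrative, not our actual code):

```python
import numpy as np

def match_key_frames(frame_angles, key_frame_angles, k=5):
    """Match each key frame to a frame index, enforcing temporal order.

    frame_angles: (num_frames, num_angles) array of per-frame joint angles.
    key_frame_angles: (num_key_frames, num_angles) array for the key frames.
    Returns one matched frame index per key frame.
    """
    matches = []
    start = 0  # only consider frames the algorithm hasn't used yet
    for key in key_frame_angles:
        remaining = frame_angles[start:]
        dists = np.linalg.norm(remaining - key, axis=1)  # L2 distance per frame
        top_k = np.argsort(dists)[:k]                    # k most similar frames
        best = start + int(np.min(top_k))                # earliest of the top k
        matches.append(best)
        start = best + 1
    return matches
```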

Manually testing this on the key frames that Kristina identified, we had an extremely high success rate in detecting the proper pose within a video.

Umang

This week was a short one due to Carnival. I worked on getting an end-to-end video pipeline up. Given an mp4 video, I was able to ingest it into a format that can be fed into AlphaPose locally and then send the resulting JSONs to be frame matched against the ground truth (which I also helped create this week). The ground truth is an amalgam of pose estimates from different ground truth videos that Kristina captured (I had to run this as a batch process before we started our pipeline so we would have access to the means and variances of the joints for a particular move). With the key frames identified (by Kristina), I was now able to provide corrections (after calling Brian’s frame matching algorithm); however, this process takes upwards of three minutes to run locally on my machine. As such, I need to explore ways to speed up the entire pipeline to optimize for our time metric.
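
The ingestion step amounts to something like the sketch below, under the assumption that the pose estimator is pointed at a directory of per-frame images:

```python
import os
import cv2

def video_to_frames(video_path, out_dir):
    """Split an mp4 into numbered image files that the pose estimator can consume."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{idx:04d}.png"), frame)
        idx += 1
    cap.release()
    return idx  # number of frames written
```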

Weekly Update #7 (3/31 – 4/6)

Team

After the midpoint demo this week, where we demoed running our correction algorithm on stills, we started work on corrections for videos. Before our design report and review, we had roughly designed our method to match a user’s video to an instructor’s video from data collection regardless of the speed of the videos, but now we had to actually implement it. As before, we spent some time adjusting our original design to account for problems encountered with the correction algorithm on stills before implementing. We will continue to work on frame matching and on altering the correction algorithm to work with videos in the next week.

Kristina

I spent some time this week gathering more data, since initially I had only gotten data for the poses. I focused on taking videos of myself and a couple of other dancers in my dance company doing a port de bras and a plié, which are the two moves we’ve decided to implement, but I also gathered more data for our poses as well (fifth position arms, first arabesque tendu, and a passé, since I realized I’ve never written the specific terms on the blog). Also, the current UI is only set up for stills right now, so I spent a little bit of time redesigning it to work with videos as well. In the upcoming weeks, I hope to have a smoother version of the UI up and running.

Brian

I spent the first part of the week thinking of the best way to do corrections for videos. There were a couple of options that came to mind, but most of them were infeasible due to the amount of time that it takes to process pose estimations. Therefore, in the end we decided to correct videos by extending our image corrections to “Key Frames”. Key Frames are the poses within a video that we deem to be the defining poses necessary for the proper completion of the move. For example, in order for a push-up to be “proper”, the user must have proper form at both the top and the bottom. By isolating these frames and comparing them to the instructor’s top and bottom poses, we can correct the video.

In order to do this, we need to be able to match the instructor’s key frames to those of the user with a frame matching algorithm. I decided that it would be best to implement this matching by looking at the distance between the frame that we want to match and the corresponding user pose. This week Kristina and I experimented with a bunch of different distance metrics, such as L1, L2, cosine, and max, and manually determined that the L2 distance yielded distances that most closely aligned with how similar Kristina judged two poses to be.
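
The comparison was along the lines of this small sketch, assuming each pose is represented as a vector of joint angles (the helpers and toy numbers are illustrative only):

```python
import numpy as np

def l1(a, b):
    return np.sum(np.abs(a - b))

def l2(a, b):
    return np.linalg.norm(a - b)

def cosine(a, b):
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def max_dist(a, b):
    return np.max(np.abs(a - b))

# Compare two pose vectors under each metric and eyeball which ranking
# best matches a human judgment of similarity.
pose_a = np.array([90.0, 45.0, 170.0, 160.0])  # toy joint angles (degrees)
pose_b = np.array([85.0, 50.0, 165.0, 150.0])
for name, fn in [("l1", l1), ("l2", l2), ("cosine", cosine), ("max", max_dist)]:
    print(name, round(fn(pose_a, pose_b), 3))
```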

I will be using this metric to finalize the matching algorithm next week.

Umang

After a wonderful week in London, I spent this week working on video pipelining. In particular, I ironed out a script that allows us to run estimation on images locally end to end, and then started a pipeline to get pose estimates from a video (*.mp4 file), which will enable Brian’s frame matching algorithm to run. Working with him to devise a scheme to identify key frames made running pose estimation locally a smooth process; the problem was that certain frames were estimated incorrectly (due to glitches in the estimation API) and needed to be dropped. A more pressing issue is that pose estimation is really hard to run locally since it is so computationally expensive. I hope to complete the video pipeline and think about ways to speed this process up next week.
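
The frame-dropping step could look something like the sketch below; the per-joint (x, y, score) layout and the confidence threshold are assumptions about the estimator output rather than values we settled on.

```python
import numpy as np

def drop_glitched_frames(frames, min_mean_conf=0.3):
    """Filter out frames whose pose estimate looks unreliable.

    frames: list of per-frame joint lists, each joint an (x, y, score) triple.
    A frame is kept only if its mean joint confidence clears the threshold.
    """
    kept = []
    for joints in frames:
        scores = np.array([s for (_, _, s) in joints])
        if len(scores) and scores.mean() >= min_mean_conf:
            kept.append(joints)
    return kept
```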