Weekly Update #11 (4/28 – 5/4)

Team

This week was spent on final polishing and integration to prepare for the final demo. We ran into a pretty big issue with our UI/script integration, so we decided to move to the Flask framework to fix it. We also added quality-of-life features, like displaying the saved videos and images for the user to review before corrections, and the option to redo a move if they were unsatisfied with how they performed it. We also tested timing differences between different AWS instances, as well as different flags for the various functions, to get the fastest corrections without sacrificing accuracy.

Kristina

After realizing that switching over to our Node.js framework was getting over-complicated, and that the way I was calling a test Python script wasn’t going to work with that implementation since the local file system couldn’t be accessed, we decided to shift our framework again. Luckily Brian did more research and had time, so he took over moving the framework for the second time so that I could focus on making the UI design more presentable and polished. Before, I was focusing on making sure all the elements were there (viewing a demonstration of the pose or move, check; webcam access, check; viewing the webcam feed mirrored, check; etc.), but now I had to focus on not making it an eyesore. I worked on making sure all pages could be navigated to simply and intuitively, and focused on styling and making elements look nice. I also helped with testing the final product, specifically editing the styling to make sure that everything displayed nicely on a different laptop size and still worked with the user flow, where the user has to step away from the laptop screen in order to perform their movement. It’s wild that the semester is over already and that demos are so soon!

Brian

We realized this week that we had created the UI for the different poses and were able to run the scripts separately and display their results in the UI, but were not able to run them together. This is because our UI could not access files in our local file system, which was a problem since we needed to save the user images and videos and send them over to be processed on AWS. After some quick searching, I decided that a Flask framework would solve our issues easily. I therefore ported over our existing UI and defined all the functions necessary for our website to access and interact with local files.

I ensured that each page had its own separate access, and that all user files were disposed of after being used in order to prepare for the next batch. To make the website work with the way Flask calls functions, I had to make slight changes to the structure of the site, but I was able to integrate everything in a way that wasn’t noticeable to the end user.
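
For the curious, here is a minimal sketch of the kind of Flask setup described above: one route serving the ported UI page, one handling the upload and cleanup of user files. The route, function, and file names are illustrative placeholders rather than our exact code.

    import os
    from flask import Flask, jsonify, render_template, request

    app = Flask(__name__)
    UPLOAD_DIR = "uploads"
    os.makedirs(UPLOAD_DIR, exist_ok=True)

    def process_video(path):
        # Placeholder for the real backend call: ship the file to the AWS
        # instance, run pose estimation, and return the corrections.
        return {"corrections": []}

    @app.route("/")
    def index():
        # Serve the existing UI page that was ported over from plain HTML/JS.
        return render_template("index.html")

    @app.route("/upload", methods=["POST"])
    def upload():
        # Save the user's recording to the local file system, which the pure
        # client-side UI could not do on its own.
        video = request.files["video"]
        path = os.path.join(UPLOAD_DIR, video.filename)
        video.save(path)
        corrections = process_video(path)
        os.remove(path)  # dispose of user files once they have been processed
        return jsonify(corrections)

    if __name__ == "__main__":
        app.run(debug=True)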

Finally, I did a lot of the testing of the final integrated product and caught a few small errors that would have messed up the execution during the demo.

Umang

This week was the final countdown to demo day. Our goal was to get an end to end pipeline up and running, while fully integrating it with the UI. While others worked on the front end, I wanted to optimize the backend for an even smoother experience. Rather than taking 29 seconds for a video and 15 seconds for an image, I wanted to break sub-20 for a video and sub-10 for an image. The best place to shave off time was in the pose estimation by increasing the speed of AlphaPose and decreasing the frame rate of the original video.

It turned out that the UI saved the video as a *.webm file, and AlphaPose did not accept that format. As such, I had to automatically run a conversion step (ffmpeg was the tool I picked) to convert from *.webm to *.mp4. Unfortunately, this conversion actually expanded the video instead of compressing it, which led to even slower pose estimation by AlphaPose. By setting a reduced frame rate flag, I was able to subsample the video and then run the shorter video through the pose estimation network (with relaxed confidence, a lower number of people to look for, and an increased detection batch size). With these changes, I got the video estimation down to 1 second for a 4-second *.webm file (with added time for the ffmpeg call that does the subsampling).
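
Roughly, the conversion and subsampling step looks like the sketch below; the file names and the target frame rate are placeholders, and the exact flags we pass to AlphaPose afterwards aren’t reproduced here.

    import subprocess

    def convert_for_alphapose(webm_path, mp4_path, fps=10):
        # Convert the browser's *.webm recording to *.mp4 and subsample the
        # frame rate in the same ffmpeg call, so the pose estimator sees a
        # much shorter video. fps=10 is a placeholder value.
        subprocess.run(
            ["ffmpeg", "-y",       # overwrite the output if it already exists
             "-i", webm_path,      # input recording saved by the UI
             "-r", str(fps),       # reduced output frame rate
             mp4_path],
            check=True)
        return mp4_path

    convert_for_alphapose("user_move.webm", "user_move.mp4")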

This updated video pipeline ran in ~11 seconds total (including the up-and-down to the instance and the longer ffmpeg step) and in ~8 seconds for an image. Unfortunately, the AWS instance we used for this was a P3 instance (which had an unsustainable cost of $12/hr), so we settled on the standard P2 instance (at a much cheaper $0.90/hr). The pipeline on the P2 ran a video through in ~15 seconds and an image through in ~9 seconds. Both of these times far surpass our original metrics. We look forward to the demo 🙂

Weekly Update #10 (4/21 – 4/27)

Team

Integration obviously means bugs, problems, and uncovering issues, as we were warned so many times at the beginning of the semester. This week, we continued the integration process by tackling the problems that arose. We realized that our UI needed a framework change in order to integrate with the way our back end was implemented and its inputs and dependencies, so we focused on fixing the front and back ends so that they could be properly integrated. We also continued looking into ways to get the speedup we required, continuing our investigation into AWS and into possibly using a different pose estimator to get accurate but more efficient results. Our final presentations are next week, so we also spent time working on and polishing our slides.

Kristina

After realizing my big mistake of not looking ahead to HOW the back end would connect to the UI when designing and starting work, I had to move our front-end code to a different framework. Initially I was using basic HTML/JavaScript/CSS to create web pages, but I realized that calling our Python correction script and getting results back wouldn’t really work, so I decided to use Node.js to create a server-side application that could call the correction algorithm when a user-initiated event happens. I honestly just chose the first framework that seemed the simplest to migrate to, and it ended up not working out as well as I had hoped. I ran into a lot of problems getting my old code to work server-side, and still need to fix a lot of issues next week. Since I’m also the one giving our presentation next week, I spent some time preparing for it and practicing, since I’m not great at presentations.

Brian

This week was a constant tug of war between the UI side of the project and the backend. Some of the outputs that we thought were going to be necessary ended up needing to be changed to accommodate recent changes in our app structure. A lot of the week was spent getting things formatted the right way to pass between the separate parts of the project. In particular, I was having issues sending information to and from the AWS instance, but in the end I was able to solve it with a lot of googling. I also worked on refactoring the code to be more editable and understandable, as well as on the final presentation.

Umang

This week I helped with the final presentation. Then, I decided to run a comparison between AlphaPose and OpenPose for pose estimation on the AWS instance. We have an ~8 second up-and-down time that is irreducible, but any other time is added by the slow pose estimation. As such, I wanted to explore whether OpenPose is faster for our use case on a GPU. OpenPose runs smoothly but has a lot of overhead if we want to retrofit it to optimize for our task. Running vanilla OpenPose led to a respectable estimation time for a 400-frame video (on one GPU), but it was still over our desired metric. Though the times were comparable at first, when we added flags to our AlphaPose command to reduce the detection batch size, set the number of people to look for, and reduce the overall joint estimate confidence, we were able to get blazing fast estimation from AlphaPose (~20 seconds + ~8 seconds for the up-down). This means we hit our metric of doing end-to-end video pose correction in under 30 seconds 🙂 Final touches to come next week!
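
The comparison itself was just wall-clock timing around each estimator’s command-line run, along the lines of this sketch; the actual AlphaPose and OpenPose commands (and their flags) are left as placeholders here.

    import subprocess
    import time

    def time_command(cmd):
        # Wall-clock the full command, which is what our metric cares about.
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        return time.perf_counter() - start

    commands = {
        "AlphaPose": ["echo", "alphapose command goes here"],
        "OpenPose": ["echo", "openpose command goes here"],
    }

    for name, cmd in commands.items():
        print(f"{name}: {time_command(cmd):.1f} s")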

Weekly Update #9 (4/14 – 4/20)

Team

After succeeding in finishing most of the work for the video-correction aspect of the project, we decided that it was time to start integrating what we had done so far in order to get a better picture of how our project was going to look. Additionally, we realized that the amount of time our corrections were taking was way too much for a user to justify, so we wanted to find ways to speed up the pose processing. To do this, we focused on:

  1. Looking into AWS as a way to boost our processing speeds
  2. Merging the existing pipelines with the UI and making it look decent

It’s also our in-lab demo next week, so we had to spend some time polishing up what our demo would look like. Since we only started integration this week, we still have problems to work through, so our in-lab demo will most likely not be fully connected or fully functional.

Kristina

This week I spent more time polishing up the UI and editing the implementation to reflect the recent changes made to our system. This involved being able to save a video or picture to files that could then be accessed by a script, being able to display the taken video or picture, being able to redo a movement, and ensuring that the visualization on the screen acted as a mirror facing the user. The latter is an aspect that seems so small until it doesn’t work, and one we didn’t even realize we needed until now. It’s so natural to look at a mirror; that’s how a dance class is conducted, and even more importantly, that’s how we’re used to seeing ourselves. Since our application is replacing dance class in a dance studio, it was important that the mirrored aspect of videos and pictures worked. Also, because we underestimated the time necessary for each task, we realized that adding a text-to-speech element for the correction wasn’t strictly necessary and that we could replace it with a visualization, which would probably be more effective for the user since dance is a very visual art to learn.

Brian

Since I finished creating the frame matching algorithm and helping with the pipelining last week, I decided to fine-tune some of what we did to make it work more smoothly. Since we were only printing corrections to the terminal last week, I wanted to find a way to visualize the corrections that made it apparent what the user needed to fix. To do this, I created functions to graph the user poses and put them next to the instructor pose, with red circles highlighting the necessary corrections. I also displayed the correction text in the same frame. I figured this would be an easy method to show all of the corrections in a way that would be easy to translate to the UI.
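
A stripped-down sketch of that visualization idea is below, assuming each pose is just a list of (x, y) joint coordinates and the joints needing correction are already known; the poses and correction text are toy values for illustration.

    import matplotlib.pyplot as plt

    def plot_pose(ax, joints, title, highlight=()):
        # joints: list of (x, y) keypoints; highlight: indices needing fixes.
        xs, ys = zip(*joints)
        ax.scatter(xs, ys)
        for i in highlight:
            ax.add_patch(plt.Circle(joints[i], 0.1, color="red", fill=False, linewidth=2))
        ax.set_title(title)
        ax.set_aspect("equal")
        ax.invert_yaxis()  # image coordinates grow downward

    user = [(0.5, 0.2), (0.5, 0.5), (0.3, 0.7), (0.7, 0.8)]
    instructor = [(0.5, 0.2), (0.5, 0.5), (0.3, 0.8), (0.7, 0.8)]

    fig, (ax_user, ax_instr) = plt.subplots(1, 2, figsize=(8, 4))
    plot_pose(ax_user, user, "User", highlight=[2])  # joint 2 needs correcting
    plot_pose(ax_instr, instructor, "Instructor")
    fig.suptitle("Example correction text shown with the figure")
    plt.show()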

Umang

This week was all about speedup. Unfortunately, pose estimation on CPUs is horridly slow, and we needed to get our estimation under our desired metric of running pose estimation on a video in under 30 seconds. As such, I decided to explore running our pose estimation on a GPU, where we would get the speedup we need to meet the metric. I worked on getting an initial pose estimation implementation of AlphaPose up on AWS. Just as when AlphaPose runs locally, I run AlphaPose over a set of frames and give the resulting JSONs to Brian to visualize as a graph. I also refactored a portion of the pipeline from last week to make it easier to pull the results from the JSON; the conflicting local file systems made this messy. I hope to compare pose estimation techniques (on GPUs) and continue to refactor code this next week.

Weekly Update #8 (4/7 – 4/13)

Team

This week we decided to have the video corrections finished by the end of the week before Carnival. In order to do this, we focused on finishing up a couple of things:

  1. Finishing and testing the frame matching algorithm to identify key frames within the user video
  2. Pipelining the data from when the video is taken until the corrections are made
  3. Creating UI pages to accommodate video capabilities

Kristina

With the data gathered from last week, I worked with Brian and Umang to identify the key points, or key frames, of each movement. For a lot of dance movements, especially in classical ballet, which all of our chosen movements are taken from, there are positions that dancers move through every time and that are important for performing the movement correctly. A dance teacher can easily identify and correct these positions that must always be hit when teaching a student. We take advantage of this aspect of dance in our frame matching algorithm, in order to accommodate different speeds of videos, and in our correction algorithm, in order to give the user feedback. This is why you’ll probably hear us talk about “key frames” a lot when discussing this project. I also spent some time this week updating the UI to allow for video capture from the web camera. Unfortunately (for the time I have to work on capstone; fortunately for me personally!), Carnival weekend also means build week, so I had a lot less time this week to work on capstone since I was always out on midway building/wiring my booth. I didn’t get as much of the UI implemented as I would have hoped, so I will be focusing on that a lot more next week.

Brian

This week I finished working on the frame matching algorithm. Since last week I had focused on finding the distance metric that yielded the best and most intuitive results, and decided on a simple l2 distance, this week I used that metric to actually match the frames. I started by converting the video to its angle domain, then scanning the video with the key frame and calculating the distance at each point. Then, simply by taking the minimum of this distance, I found the frame that best matched the key frame.
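
In code, the basic scan is essentially the sketch below, where both the video and the key frame have already been converted to angle vectors (the array shapes and toy data are just for illustration).

    import numpy as np

    def best_match(video_angles, key_frame):
        # video_angles: (num_frames, num_angles); key_frame: (num_angles,).
        # l2 distance between the key frame and every frame of the video.
        dists = np.linalg.norm(video_angles - key_frame, axis=1)
        return int(np.argmin(dists))

    video = np.random.rand(120, 6) * 180.0  # toy 120-frame video, six joint angles
    key = np.random.rand(6) * 180.0
    print("best matching frame:", best_match(video, key))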

This method, however, has the issue that it may detect a frame in any part of the video and does not take into account where in the video the frame occurs. To correct this, I calculated the positions of the top k most similar frames, and then went through them in temporal order to find the best earliest match. Given n key frames, I run this algorithm n times, each time only giving the frames that the algorithm hasn’t seen yet as candidates to match to the key frame.
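
A rough sketch of that refinement is below; it restricts each key frame’s search to the frames after the previous match and only considers the top-k candidates, which is a simplification of the actual implementation.

    import numpy as np

    def match_key_frames(video_angles, key_frames, k=5):
        matches, start = [], 0
        for key in key_frames:
            remaining = video_angles[start:]
            dists = np.linalg.norm(remaining - key, axis=1)
            top_k = np.argsort(dists)[:k]          # the k most similar frames
            earliest = start + int(np.min(top_k))  # earliest of those candidates
            matches.append(earliest)
            start = earliest + 1                   # later key frames only see unseen frames
        return matches

    video = np.random.rand(120, 6) * 180.0  # toy video in the angle domain
    keys = np.random.rand(3, 6) * 180.0     # three key frames
    print(match_key_frames(video, keys))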

Manually testing this on the key frames that Kristina identified, we had an extremely high success rate in detecting the proper pose within a video.

Umang

This week was a short one due to Carnival. I worked on getting an end-to-end video pipeline up. Given an *.mp4 video, I was able to ingest it into a format that can be fed into AlphaPose locally, and then sent the resulting JSONs to be frame matched with the ground truth (which I also helped create this week). The ground truth is an amalgam of pose estimates from the different ground-truth videos that Kristina captured (this had to be run as a batch process before we started our pipeline, so that we would have access to the means and variances of the joints for a particular move). With the key frames identified (by Kristina), I was able to provide corrections (after calling Brian’s frame matching algorithm); however, this process takes upwards of three minutes to run locally on my machine. As such, I need to explore ways to speed up the entire pipeline to optimize for our time metric.
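
The ground-truth aggregation mentioned above boils down to something like this sketch, assuming each instructor video has already been reduced to an angle vector per key frame.

    import numpy as np

    def build_ground_truth(angle_vectors):
        # angle_vectors: (num_videos, num_angles) for one key frame of one move.
        samples = np.asarray(angle_vectors)
        return {"mean": samples.mean(axis=0), "var": samples.var(axis=0)}

    # Toy example: four instructor takes of a single key frame, six joint angles each.
    takes = np.random.rand(4, 6) * 180.0
    ground_truth = build_ground_truth(takes)
    print(ground_truth["mean"], ground_truth["var"])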

Weekly Update #7 (3/31 – 4/6)

Team

After the midpoint demo this week, where we demoed running our correction algorithm on stills, we started work on corrections on videos. Before our design report and review, we had roughly designed our method to be able to match a user’s video to an instructor’s video from data collection regardless of the speed of the videos, but now we had to actually implement it. Like before, we spent some time adjusting the design of our original plan to account for problems encountered with the correction algorithm on stills before implementation. We will continue to work on frame matching and altering the correction algorithm to work with videos as well in the next week.

Kristina

I spent some time this week gathering more data, since initially I had only gotten data for the poses. I focused on taking videos of myself and a couple of other dancers in my dance company doing a port de bras and a plie, which are the two moves we’ve decided to implement, but I also gathered more data for our poses as well (fifth position arms, first arabesque tendu, and a passe, since I realized I’ve never written the specific terms on the blog). Also, the current UI is only set up for stills right now, so I spent a little bit of time redesigning it to work with videos as well. In the upcoming weeks, I hope to have a smoother version of the UI up and running.

Brian

I spent the first part of the week thinking of the best way to do corrections for videos. There were a couple of options that came to mind, but most of them were infeasible due to the amount of time it takes to process pose estimations. Therefore, in the end we decided to correct videos by extending our image corrections to “Key Frames”. Key Frames are the poses within a video that we deem to be the defining poses necessary for the proper completion of the move. For example, in order for a push-up to be “proper”, the user must have proper form at both the top and the bottom. By isolating these frames and comparing them to the instructor’s top and bottom poses, we can correct the video.

To do this, we need to be able to match the instructor’s key frames to those of the user with a frame matching algorithm. I decided that it would be best to implement this matching by looking at the distance between the frame that we want to match and the corresponding user pose. This week Kristina and I experimented with a bunch of different distance metrics, such as l1, l2, cosine, max, etc., and manually determined that the l2 distance yielded distances that most aligned with how similar Kristina judged two poses to be.
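
Concretely, the metrics we compared look like this on a pair of angle vectors (using scipy’s implementations for brevity; the toy vectors are made up).

    import numpy as np
    from scipy.spatial import distance

    # Toy angle vectors for two poses (degrees).
    a = np.array([95.0, 170.0, 88.0, 120.0, 150.0, 30.0])
    b = np.array([90.0, 165.0, 92.0, 118.0, 155.0, 45.0])

    print("l1 ", distance.cityblock(a, b))
    print("l2 ", distance.euclidean(a, b))
    print("cos", distance.cosine(a, b))
    print("max", distance.chebyshev(a, b))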

I will be using this metric to finalize the matching algorithm next week.

Umang

After a wonderful week in London, I spent this week working on video pipelining. In particular, I ironed out a script that allows us to run estimation on images end to end locally, and then started a pipeline to get pose estimates from a video (*.mp4 file), which will enable Brian’s frame matching algorithm to run. Working with him to devise a scheme to identify key frames made it a smooth process to run pose estimation locally; the problem was that certain frames were estimated incorrectly (due to glitches in the estimation API) and needed to be dropped. A more pressing issue is that pose estimation is really hard to run locally since it is so computationally expensive. I hope to complete the video pipeline and think about ways to speed this process up next week.
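
On the frame-dropping point above: one simple way to discard glitchy frames is by keypoint confidence, along these lines. The threshold and the exact rule here are an assumption rather than necessarily what our script does.

    def filter_frames(frames, min_conf=0.4):
        # Keep only frames whose average keypoint confidence clears a threshold;
        # badly glitched estimates tend to come with very low scores.
        kept = []
        for frame in frames:
            scores = [kp["score"] for kp in frame["keypoints"]]
            if sum(scores) / len(scores) >= min_conf:
                kept.append(frame)
        return kept

    frames = [
        {"keypoints": [{"score": 0.9}, {"score": 0.8}]},  # solid estimate, kept
        {"keypoints": [{"score": 0.1}, {"score": 0.2}]},  # glitchy estimate, dropped
    ]
    print(len(filter_frames(frames)))  # -> 1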

Weekly Update #6 (3/24 – 3/30)

Team

With the midpoint demo next week, this week was focused on getting our final elements that we want to show finished. We worked on the correction algorithm for stills, which will be run through a script for the demo, and on creating a UI to show what our final project will look like. In the upcoming weeks, we will work on fully connecting the front and back ends for a good user experience and a working project.

Kristina

Since this week was so busy for me, most of my work will be front-loaded into the upcoming week to get it done by our actual midpoint demo. I’m working on creating a UI skeleton and a starting portion of the connection between the front end and back end. After the demo, I will start fully connecting the application together and integrating the text-to-speech element.

Brian

This week I worked on completing the necessary items for the initial demo. I was able to create a foundation for the pipeline that is able to take an image and output a correction within a couple of seconds. You can customize what moves you are trying to work on, as well as how many corrections you would like to receive at a time. The program will spit out the top things you need to work on in a text format. It will also draw a diagram of what your pose looked like, with circles over the areas that you need to correct. For next week, I would like to start working on corrections for videos, and work with the movement data that we will be collecting soon.
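
At a high level, the “top things you need to work on” are just the angles that deviate most from the ground truth, something like this sketch (the angle names and numbers are made up for illustration).

    import numpy as np

    def top_corrections(user_angles, gt_means, names, n=3):
        # Rank joints by how far the user's angle is from the ground-truth mean
        # and report the n largest deviations as correction text.
        deviations = np.abs(np.asarray(user_angles) - np.asarray(gt_means))
        order = np.argsort(deviations)[::-1][:n]
        return [f"{names[i]}: off by {deviations[i]:.0f} degrees" for i in order]

    names = ["left elbow", "right elbow", "left knee", "right knee", "left hip", "right hip"]
    user = [100.0, 172.0, 60.0, 178.0, 95.0, 140.0]
    gt = [90.0, 170.0, 90.0, 175.0, 92.0, 138.0]
    for line in top_corrections(user, gt, names):
        print(line)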

Umang

This week I worked with Brian to complete the demo; in particular, I worked on an end-to-end script that runs our demo from the command line, takes a user-captured image, and gives the corrections necessary to bring the user back toward the mean ground-truth example. Based on the command entered, the user can denote which dance move they would like (and which version of the pretrained model, fast or not). Next week, I will be traveling to London for my final grad school visit, but I will be thinking about the linear interpolation for how we will frame match videos of dance moves. I also hope to leverage the increased training data to run the current pipeline with more fidelity.
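
The command-line interface for the demo script mentioned above is roughly the sketch below; the flag and move names are illustrative rather than our exact arguments.

    import argparse

    def parse_args():
        parser = argparse.ArgumentParser(description="Still-image pose correction demo")
        parser.add_argument("image", help="path to the user-captured image")
        parser.add_argument("--move", required=True,
                            choices=["fifth_arms", "arabesque_tendu", "passe"],
                            help="which dance pose to correct")
        parser.add_argument("--fast", action="store_true",
                            help="use the faster (less accurate) pretrained model")
        return parser.parse_args()

    if __name__ == "__main__":
        args = parse_args()
        print(f"Correcting {args.move} in {args.image} (fast model: {args.fast})")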

Weekly Update #5 (3/17 – 3/23)

Team

This week, our focus is all on the midpoint demo. We spent some time this week deciding what we want to show as progress. After some discussion, we’ve decided to focus on the correction aspect of the project as opposed to the user experience and interaction with the application. We have an accurate joint estimation that we’re using to get the coordinates of the points, and have derived line segments from those points, so we’ll have to focus on getting angles and correcting those angles in the upcoming weeks. The three of us unfortunately all have especially busy schedules in the upcoming weeks, so we are also making sure to schedule our working time so that we don’t fall behind on the project.

Kristina

My main focus this week was gathering the data needed to establish the ground truth. We’ve decided that we want to gather data from multiple people, not just me, for testing purposes, so I’ll continue meeting with some other dance (and non-dance) friends to collect data into the beginning of next week. I will also help in testing our processing speed on a CPU vs a dedicated GPU to see if we should buy a GPU or update our application workflow. This upcoming week will probably be one of the busiest, if not the busiest, weeks of the semester for me, so I will focus on work for the demo and will continue work for my other portions of the project afterwards.

Brian

This week I focused on creating all of the functions necessary to process the data and extract the necessary information from it. I was able to create the general foundation that takes the images, extracts the poses from them, and collects the angle distributions. I have also started creating our example pose collections for use in comparing against the user data. By next week, we would like to have a working demo of still correction for 3 moves that can serve as a proof of concept for the following work on videos.

Umang

This week I focused on building out our core pipeline. I am able to convert an image (or a frame from a video) into a pose estimate using AlphaPose. Using those poses, I worked with Brian to calculate the angles between the limbs found on a given pose (as per our design document). Once Kristina collects the requisite data (stills of multiple people doing the same pose), we can get a ground truth distribution of the true form for three poses. By the midpoint demo day (4/1), we hope to extend the aforementioned to include the variance ranking, which would tell us which angle to correct. Thereafter, we hope to check whether we should use a GPU for pose estimation and we hope to develop our frame matching logic for video streams.

Weekly Update #4 (3/3-3/9)

Team

At the beginning of the week, we focused a lot on finalizing our design and completing the design document. After that was done, we worked on our individual portions of the project. Writing the design document took a lot more time than originally estimated, however, so we didn’t end up spending as much time on actual implementation of the project as we had previously hoped. But since we didn’t make any big changes again this week and we had budgeted some time for the design document, we believe that our work is still on track for the midpoint demos.

Kristina

In addition to spending a lot of time working on the design document with Brian and Umang and working on other capstone assignments due (Ethics assignment), I started collecting the ground truth data. The first step in that is creating an application where I can perform a pose multiple times in front of my camera and the joint data will be saved in JSON format. Once I’m done creating that application, I will collect multiple instances of every pose that we are aiming to do. My goal is to have that complete in the next couple weeks so that we have data to test soon after we get back from Spring Break.

Brian

This week was spent working on the algorithms to detect the difference between the user poses and the instructor poses. I took some example JSON data to help with the task. I will continue working on this after Spring Break, and hope to finish the initial construction of the algorithm that week. I also worked on the ethics assignment, as well as further refinement of our design.

Umang

This week I worked on the angular domain calculations: how can we find a scale- and shift-invariant domain in which to compare pose estimates? Due to PhD visits during spring break, I won’t be able to contribute to this over the coming week. However, I hope to finish the transformation specifics by the end of the following week, so that we have a rough draft of our pipeline by the first week of April.

Weekly Update #3 (2/24 – 3/2)

Team

This week was focused on completing the design and thinking through many of its important aspects. We didn’t make any sweeping changes that affect the course of the project, and are still on track. We just need to start implementing some of the planned ideas this week.

Kristina

This week, I worked with Brian and Umang on refining our design presentation as well as our design report. I didn’t work a ton on the actual implementation of the project, but helped with many design decisions as we finalize our design report. This upcoming week, I hope to finish gathering the expert data needed for the project.

Brian

Most of this week was spent on the design specifications for our project. We still had a couple of implementation details to think through, so that was the main concern. A big realization was that we need to transform the data into the angle space rather than look at the points themselves. This will allow us to account for the scaling and translation of the person within the image easily. Next week I would like to have an implementation for a still image running so we can move on to movement afterwards.

Umang

This week I had to travel for grad school visit days; nonetheless, I contributed to the design document development. Moreover, during a collaboration session with Brian, we realized that we need to map the limbs from Euclidean space to the angular domain. As such, our feature vector would be six angles at the joints of a person. Using a rule-based system, we can map each angle to a movement (given two angle feature vectors from adjacent frames) and then prompt the user with a directional correction phrase (via Mozilla TTS). Next week, I hope to have built the angle feature vectors.
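
A minimal sketch of that mapping is below, computing the angle at a joint from the two adjacent keypoints; the specific joints and coordinates are just for illustration.

    import numpy as np

    def joint_angle(a, joint, b):
        # Interior angle at `joint` formed by the limbs joint->a and joint->b,
        # in degrees; it depends only on direction, so it is invariant to the
        # scale and translation of the person in the image.
        v1 = np.asarray(a) - np.asarray(joint)
        v2 = np.asarray(b) - np.asarray(joint)
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # e.g. the left elbow angle from shoulder -> elbow -> wrist keypoints
    shoulder, elbow, wrist = (0.40, 0.30), (0.35, 0.45), (0.45, 0.55)
    print(joint_angle(shoulder, elbow, wrist))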

Weekly Update #2 (2/17 – 2/23)

Team

With the upcoming design presentation, we knew we had to make some important decisions. We’ve decided to use PoseNet and create a web application, which are two major changes from our original proposal. This is because we discovered that our original design, which was using OpenPose in a mobile application, would run very slowly. However, this change will not affect the overall schedule/timeline, as it is more of a lateral movement than a setback. Our decision to abandon the mobile platform could jeopardize our project; to adjust, we decided to offload processing to a GPU, which will make our project faster than it would have been on mobile.

Kristina

This week, I worked with Brian and Umang to test the limits of PoseNet so we could decide which joint detection model to use. I also started creating the base of our web application (just a simple Hello World application for now to build off of). I haven’t done any web development in a while, so creating the incredibly basic application was also a good way to refresh my rusty skills. Part of this was also trying to integrate PoseNet into the application, but I ran into installation issues (again… like last week. Isn’t setup just the worst part of any project?), so I ended up spending a lot of time trying to get TensorFlow.js and PoseNet on my computer. Also, since this upcoming week is going to be a bit busier for me, I made a really simple, first-draft sketch of a UI design to start from. For this next week, my goals are to refine the design, create a simple application we can use to start gathering the “expert” data we need, and to start collecting the data.

Simple first draft of the UI design – very artistic right?!! I’m an aspiring stick figure artist.

Brian

This week I attempted to find the mobile version of OpenPose and have it run on an iPhone. Similar to last week, I ran into some issues during installation, and decided that since we already had a web version running, it was better to solidify our plan to create a web app and trash the mobile idea.

Afterwards, I decided to get a better feel for the joint detection platform and played around with tuning some of the parameters to see which ones yielded the best accuracy. This was mainly done by manual observation of the real-time detection as I performed what I assumed were dance-like movements. I also took a look at the raw output of the algorithm and started thinking about the frame matching algorithm that we would like to use to account for the difference in speed between the user and training data. I also worked on creating the design documents. For next week, I would like to work more with the data and see if I can get something that can detect the difference between joints in given frames.

Umang

This week I worked with Brian to explore the platform options for our application. We found that mobile versions would be all too slow (2-3 fps without any speed-up to the processing) for our use case. We then committed to making a web app instead. For the web version, we used a lite version of Google’s pretrained PoseNet (for real-time estimation) to explore latency and estimation methods. With simple dance moves, I am able to get estimates of twelve joints; however, when twirls, squats, or other scale/orientation variations are introduced, this lite PoseNet variant loses estimates. As such, this coming week I want to explore running the full PoseNet model on a prerecorded video. If we can do the pose estimation post hoc, then I can send the recorded video to an AWS instance with a GPU for quicker processing with the entire model and then send down the pose estimates.

I still need to work on the interpolation required to frame match the user’s video (or frame) with our collected ground truth. To sidestep this problem, we are going to work with stills of Kristina to generate a distribution over the ground truth. We can then query this distribution at inference time to see how far the user’s joints deviate from the mean. I hope to have the theory and some preliminary results of this distributional aggregation within the next two weeks.