Weekly Update #6 (3/24 – 3/30)

Team

With the midpoint demo next week, this week was focused on finishing the final elements we want to show. We worked on the correction algorithm for stills, which will be run through a script for the demo, and on creating a UI to show what our final project will look like. In the upcoming weeks, we will work on fully connecting the front and back ends for a good user experience and a working project.

Kristina

Since this week was so busy for me, most of my work will be front-loaded into the upcoming week so that it gets done by our actual midpoint demo. I'm working on creating a UI skeleton and a starting portion of the connection between the front end and back end. After the demo, I will start fully connecting the application together and integrating the text-to-speech element.

Brian

This week I worked on completing the necessary items for the initial demo. I was able to create a foundation for the pipeline that takes an image and outputs a correction within a couple of seconds. You can customize which move you are working on, as well as how many corrections you would like to receive at a time. The program prints the top things you need to work on as text. It also draws a diagram of what your pose looked like, with circles over the areas that you need to correct. For next week, I would like to start working on corrections for videos and work with the movement data that we will be collecting soon.
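To make that output concrete, here is a minimal sketch of what the correction step could look like, assuming the image has already been reduced to joint angles with a reference mean and standard deviation per angle; the function and variable names here are illustrative, not the actual pipeline code:

```python
import numpy as np
import cv2

def top_corrections(user_angles, mean_angles, std_angles, angle_names, k=3):
    """Rank the tracked angles by how far the user deviates from the reference mean."""
    z = np.abs(np.asarray(user_angles) - np.asarray(mean_angles)) / np.asarray(std_angles)
    worst = np.argsort(z)[::-1][:k]   # indices of the k largest deviations
    return [(angle_names[i], float(user_angles[i]), float(mean_angles[i])) for i in worst]

def draw_corrections(image, keypoints, correction_indices, radius=30):
    """Circle the joints that most need correction on the pose diagram."""
    for i in correction_indices:
        x, y = keypoints[i]
        cv2.circle(image, (int(x), int(y)), radius, (0, 0, 255), thickness=3)
    return image
```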

Umang

This week I worked with Brian to complete the demo; in particular, I worked on an end-to-end script that runs our demo from the command line, takes a user-captured image, and gives the corrections needed to bring the pose back toward the mean ground-truth example. Based on the command entered, the user can denote which dance move they would like (and which version of the pretrained model to use, fast or not). Next week, I will be traveling to London for my final grad school visit, but I will be thinking about the linear interpolation we will use to frame-match videos of dance moves. I also hope to leverage the increased training data to run the current pipeline with more fidelity.
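For a rough sense of the command-line interface, here is a hedged sketch using argparse; the flag names and the default move name are placeholders rather than the exact ones in our script:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description="Run a still-image dance correction demo.")
    parser.add_argument("image", help="path to the user-captured image")
    parser.add_argument("--move", default="plie",
                        help="which dance move to compare against (placeholder name)")
    parser.add_argument("--fast", action="store_true",
                        help="use the faster, lower-fidelity pretrained model")
    parser.add_argument("--num-corrections", type=int, default=3,
                        help="how many corrections to report")
    args = parser.parse_args()
    # ... load the ground-truth distribution for args.move, estimate the pose from
    # args.image, and print the top args.num_corrections corrections ...

if __name__ == "__main__":
    main()
```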

Weekly Update #5 (3/17 – 3/23)

Team

This week, our focus was all on the midpoint demo. We spent some time deciding what we want to show as progress. After some discussion, we've decided to focus on the correction aspect of the project as opposed to the user experience and interaction with the application. We have an accurate joint estimation model that we're using to get the coordinates of the joints, and have derived line segments from those points, so we'll need to focus on computing angles and correcting those angles in the upcoming weeks. The three of us unfortunately all have especially busy schedules in the upcoming weeks, so we are also making sure to schedule our working time so that we don't fall behind on the project.

Kristina

My main focus this week was gathering the data needed to establish the ground truth. We’ve decided that we want to gather data from multiple people, not just me, for testing purposes, so I’ll continue meeting with some other dance (and non-dance) friends to collect data into the beginning of next week. I will also help in testing our processing speed on a CPU vs a dedicated GPU to see if we should buy a GPU or update our application workflow. This upcoming week will probably be one of the busiest, if not the busiest, weeks of the semester for me, so I will focus on work for the demo and will continue work for my other portions of the project afterwards.

Brian

This week I focused on creating the functions necessary to process the data and extract the information we need from it. I was able to create the general foundation that takes the images, extracts the poses from them, and collects the angle distributions. I have also started creating our example pose collections for use in comparing with the user data. By next week, we would like to have a working demo of still correction for 3 moves that can serve as a proof of concept for the following work on videos.
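Conceptually, that distribution-collection step looks something like the sketch below, where `estimate_pose` and `pose_to_angles` are stand-ins for the actual extraction functions:

```python
import numpy as np

def collect_angle_distribution(example_images, estimate_pose, pose_to_angles):
    """Compute the per-angle mean and standard deviation over a set of example images."""
    all_angles = []
    for image in example_images:
        keypoints = estimate_pose(image)              # joint (x, y) coordinates
        all_angles.append(pose_to_angles(keypoints))  # one value per tracked angle
    all_angles = np.array(all_angles)                 # shape: (num_examples, num_angles)
    return all_angles.mean(axis=0), all_angles.std(axis=0)
```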

Umang

This week I focused on building out our core pipeline. I am able to convert an image (or a frame from a video) into a pose estimate using AlphaPose. Using those poses, I worked with Brian to calculate the angles between the limbs found on a given pose (as per our design document). Once Kristina collects the requisite data (stills of multiple people doing the same pose), we can get a ground-truth distribution of the true form for three poses. By the midpoint demo day (4/1), we hope to extend the aforementioned to include the variance ranking, which would tell us which angle to correct. Thereafter, we hope to check whether we should use a GPU for pose estimation and to develop our frame-matching logic for video streams.
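The angle calculation itself is a small piece of geometry once the joints are located; here is a minimal sketch (the specific joint triples we track come from our design document, and the coordinates in the example are made up):

```python
import numpy as np

def limb_angle(a, b, c):
    """Angle in degrees at joint b, formed by the limbs b->a and b->c."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: an elbow angle from (made-up) shoulder, elbow, and wrist coordinates.
print(limb_angle((120, 80), (150, 140), (200, 150)))
```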

Weekly Update #4 (3/3 – 3/9)

Team

At the beginning of the week, we focused a lot on finalizing our design and completing the design document. After that was done, we worked on our individual portions of the project. Writing the design document took a lot more time than originally estimated, however, so we didn't end up spending as much time on actual implementation of the project as we had previously hoped. Still, since we didn't make any big changes again this week and we had set aside time for the design document, we believe that our work is still on track for the midpoint demos.

Kristina

In addition to spending a lot of time working on the design document with Brian and Umang and working on the other capstone assignments that were due (the ethics assignment), I started collecting the ground truth data. The first step in that is creating an application where I can perform a pose multiple times in front of my camera and have the joint data saved in JSON format. Once I'm done creating that application, I will collect multiple instances of every pose that we are aiming to support. My goal is to have that complete in the next couple of weeks so that we have data to test soon after we get back from Spring Break.
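The actual capture app is browser-based, but as a rough illustration of the flow and the JSON we want to end up with, here is a sketch in Python; `detect_joints` is a stand-in for whatever joint-detection call the app makes:

```python
import json
import cv2

def capture_pose_samples(pose_name, num_samples, detect_joints, out_path):
    """Grab webcam frames, run joint detection on each, and save the joints as JSON."""
    cap = cv2.VideoCapture(0)
    samples = []
    while len(samples) < num_samples:
        ok, frame = cap.read()
        if not ok:
            break
        samples.append(detect_joints(frame))  # e.g. {"left_elbow": [x, y], ...}
    cap.release()
    with open(out_path, "w") as f:
        json.dump({"pose": pose_name, "samples": samples}, f, indent=2)
```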

Brian

This week was spent working on the algorithms to detect the differences between the user poses and the instructor poses. I used some example JSON data to help with the task. I will continue working on this after Spring Break, and hope to finish the initial construction of the algorithm that week. I also worked on the ethics assignment, as well as further refinement of our design.

Umang

This week I worked on the angular domain calculations: how can we find a scale- and shift-invariant domain in which to compare pose estimates? Due to PhD visits during spring break, I won't be able to contribute to this over the coming week. However, I hope to finish the transformation specifics by the end of the following week so that we have a rough draft of our pipeline by the first week in April.
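Angles are a natural candidate for such a domain because they do not change when the whole pose is translated or uniformly scaled; here is a quick standalone sanity check of that property (with a hypothetical angle helper and made-up points):

```python
import numpy as np

def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex`, formed by the segments vertex->p1 and vertex->p2."""
    v1 = np.asarray(p1, dtype=float) - np.asarray(vertex, dtype=float)
    v2 = np.asarray(p2, dtype=float) - np.asarray(vertex, dtype=float)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

pose = [np.array(p, dtype=float) for p in [(150, 140), (120, 80), (200, 150)]]
moved = [2.5 * p + np.array([40.0, -10.0]) for p in pose]   # uniformly scale and shift the pose
assert abs(angle_at(*pose) - angle_at(*moved)) < 1e-6        # the angle is unchanged
```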

Weekly Update #2 (2/17 – 2/23)

Team

With the upcoming design presentation, we knew we had to make some important decisions. We've decided to use PoseNet and create a web application, which are two major changes from our original proposal. This is because we discovered that our original design, which used OpenPose in a mobile application, would run very slowly. However, this change will not affect the overall schedule/timeline, as it is more of a lateral move than a setback. Abandoning the mobile platform could jeopardize our project; to adjust, we decided to offload processing to a GPU, which should make our project faster than it would have been on mobile.

Kristina

This week, I worked with Brian and Umang to test the limits of PoseNet so we could decide which joint detection model to use. I also started creating the base of our web application (just a simple Hello World application for now to build off of). I haven't done any web development in a while, so creating the incredibly basic application was also a good way to review my rusty skills. Part of this was also trying to integrate PoseNet into the application, but I ran into installation issues (again… like last week. Isn't setup just the worst part of any project?), so I ended up spending a lot of time trying to get TensorFlow.js and PoseNet on my computer. Also, since this upcoming week is going to be a bit busier for me, I made a really simple, first-draft sketch of a UI design to start from. For this next week, my goals are to refine the design, create a simple application we can use to start gathering the "expert" data we need, and start collecting the data.

Simple first draft of the UI design – very artistic right?!! I’m an aspiring stick figure artist.

Brian

This week I attempted to find the mobile version of OpenPose and get it running on an iPhone. Similar to last week, I ran into some issues during installation, and decided that since we already had a web version running, it was better to solidify our plan to create a web app and drop the mobile idea.

Afterwards, I decided to get a better feel for the joint detection platform and play around with tuning some of the parameters to see which ones yielded the best accuracy. This was mainly done by manually observing the real-time detection as I performed what I assumed were dance-like movements. I also took a look at the raw output of the algorithm, and started thinking about the frame-matching algorithm we would like to use to account for the difference in speed between the user and training data. I also worked on the design documents. For next week, I would like to work more with the data and see if I can get something that can detect the difference between joints in given frames.

Umang

This week I worked with Brian to explore the platform options for our application. We found that the mobile versions would be far too slow (2-3 fps without any speed-up to the processing) for our use case. We then committed to making a web app instead. For the web version, we used a lite version of Google's pretrained PoseNet (for real-time estimation) to explore latency and estimation methods. With simple dance moves, I am able to get estimates for twelve joints; however, when twirls, squats, or other scale/orientation variations are introduced, this light PoseNet variant loses estimates. As such, this coming week, I want to explore running the full PoseNet model on a prerecorded video. If we can do the pose estimation post hoc, then I can send the recorded video to an AWS instance with a GPU for quicker processing with the entire model and then send down the pose estimates.

I still need to work on the interpolation required to frame-match the user's video (or frame) with our collected ground truth. To sidestep this problem for now, we are going to work with stills of Kristina to generate a distribution over the ground truth. We can then query this distribution at inference time to see how far the user's joint deviates from the mean. I hope to have the theory and some preliminary results of this distributional aggregation within the next two weeks.
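For a single angle, that query is just a standardized deviation from the reference distribution; a tiny sketch with made-up numbers:

```python
import numpy as np

# Hypothetical elbow-angle measurements, one per ground-truth still (made-up numbers).
reference_angles = np.array([92.0, 95.5, 90.1, 93.8, 94.2])
mu, sigma = reference_angles.mean(), reference_angles.std(ddof=1)

# At inference time, query the distribution with the user's measured angle.
user_angle = 118.0
z = (user_angle - mu) / sigma        # deviation in standard deviations
needs_correction = abs(z) > 2.0      # flag joints well outside the reference spread
print(f"{z:+.1f} sd from the mean; needs correction: {needs_correction}")
```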

Weekly Update #1 (2/10 – 2/16)

Team

The most significant risk that we face is not finding a reliable implementation of an algorithm that can detect poses in real time. Though we think we have found something reliable, we still need to test in the upcoming week whether or not it will be sufficient. To manage this, we are sketching an alternate plan that, instead of processing in "real time", allows a buffer of a couple of seconds for the processing to complete. It would only be a slight alteration, though, and seems to be something we can deal with pretty easily. The main change made this week was the shift from mobile to web platforms. As mentioned, there may be slight alterations coming later when it comes to the "real time" aspect of our project, but both of these changes were things that we had mentally accounted for earlier, and therefore they will not change the overall trajectory of our project. The schedule has not changed, except for a slight shift due to unexpected errors that occurred during installation of the pose detection software.

Kristina

This week, I tried getting OpenPose set up on my personal laptop and did some research into how to use OpenCV/OpenPose on mobile. Through reading different articles and projects, and testing out some trial apps, I didn't find a way to use OpenPose with anything close to real-time latency on mobile, so we all decided to just try getting OpenPose running on our laptops first. Since I have a PC and a possible Ubuntu VM to use while Brian and Umang have Macs, we tried different setups, but we all ran into the problem of not having an NVIDIA graphics card (my one screenshot just shows one of the error messages from trying to run a simple OpenPose Windows demo). We realized we would have to spend more time next week looking into other options, possibly changing the scope of our project, so we had to change the schedule a bit. We still need to do more work in figuring out which joint detection model we want to use in order to determine whether we can still make a mobile app or if we should make a web application. I was supposed to start on the UI design at the end of this week, but I couldn't due to the OpenPose problems we encountered. The goal is that next week we'll have this figured out and a simple UI design.

Brian

This week I focused on getting any implementation of pose estimation from images up and running. At first we wanted a program that could run on mobile. However, after facing problems with that, we decided to work on finding something to run as a web app. We had settled on OpenPose, but after a couple hours of troubleshooting installation errors, I found out that the devices we have access to do not have the capability to run or even install the program. I also didn't have the necessary permissions on the external devices to download the requirements. After failing for a bit, I was able to find another implementation that ran at a pretty good framerate (~15-18 fps) on my laptop.

Its authors say there is an implementation for mobile, so for next week I would like to get it running there. Otherwise, we will proceed with a web app. I would also like to get access to the joint point data from this implementation and start playing around with comparing poses in different images.

Umang

This week I worked on getting a TensorFlow implementation of OpenPose running. I quickly learned that the framerate of MobileNet (https://arxiv.org/pdf/1704.04861.pdf) would not be conducive to dance pose estimation on mobile devices: 3.23 fps is too slow for real-time processing. See the figure at the end of the document for me attempting to dance. Mobilenet_fast may provide the latency we need if we run it on an NVIDIA Jetson TX2. I also started on the rank aggregation framework we need to aggregate the poses Kristina gathers (this does not need to happen in real time, but we need a prioritization framework to pick the joints to correct).
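As a sketch of what that prioritization could look like, one simple option is a Borda-style mean rank across reference examples; this is an assumed approach for illustration, not the final framework:

```python
import numpy as np

def aggregate_joint_priority(deviation_matrix, joint_names):
    """Borda-style aggregation: average each joint's deviation rank across reference examples.

    deviation_matrix[i, j] is how far the user's joint j deviates from reference example i.
    """
    # Rank joints within each reference example (0 = smallest deviation).
    ranks = np.argsort(np.argsort(deviation_matrix, axis=1), axis=1)
    mean_rank = ranks.mean(axis=0)
    # Joints with the highest mean rank are the first ones to correct.
    return [joint_names[j] for j in np.argsort(mean_rank)[::-1]]
```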

In the coming week, I hope to assess the pretrained estimation networks we can use and begin attempting naive aggregation. I also hope to work through the rank aggregation framework for the ground truth so that, once Kristina has the data, my framework can be applied.