Hello from Team *wave* Google!
This week, we worked a lot on finalizing our ideas and concepts in preparation for the design review. For the gesture recognition software, we decided to use both OpenCV and OpenPose to mitigate the risk of misclassifying gestures or failing to classify them at all. Initially, we planned to use only OpenCV and have users wear a glove to track joints. However, we weren't sure how reliable glove tracking alone would be, so we added a backup: running OpenPose to get joint locations and using that data to classify gestures. With this new approach, OpenCV will run on the Jetson Nano and OpenPose will run on AWS. Each gesture, captured as video, will be split into frames; those frames will be tracked with glove tracking on the Nano and with OpenPose on AWS. If the Nano and OpenCV don't produce a classification, we will use the result from AWS and OpenPose to classify the gesture instead.
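The fallback flow above could be sketched roughly like this. This is just an illustration of the control flow, not our actual implementation; every function name here is a hypothetical placeholder, with stub bodies standing in for the real OpenCV glove tracker and the AWS-hosted OpenPose service.

```python
def classify_from_glove(frames):
    """Placeholder for OpenCV glove-marker tracking on the Jetson Nano.
    Returns a gesture label, or None when tracking is not confident."""
    return None  # stub: pretend the glove tracker failed to classify

def request_openpose_keypoints(frames):
    """Placeholder for sending frames to the AWS instance running OpenPose."""
    return [[(0.1, 0.2)] * 25 for _ in frames]  # stub: 25 joints per frame

def classify_from_keypoints(keypoints):
    """Placeholder for classifying a gesture from OpenPose joint locations."""
    return "wave"  # stub label

def classify_gesture(frames):
    """Try the on-Nano glove tracker first; fall back to OpenPose on AWS."""
    label = classify_from_glove(frames)
    if label is None:  # no confident classification on the Nano
        keypoints = request_openpose_keypoints(frames)
        label = classify_from_keypoints(keypoints)
    return label
```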
We wanted to see how fast OpenPose could run, since classification speed is one of our requirements. On a MacBook Pro, we achieved 0.5 fps on video, and tracking a single image took around 25-30 seconds. OpenPose on the MacBook Pro was running on a CPU, whereas it would run on a GPU on the Nvidia Jetson Nano. Even so, taking 25-30 seconds per image on a CPU suggested that OpenPose might not meet our timing requirements, so we decided to run OpenPose on AWS instead. This should mitigate the risk of OpenPose classification being too slow.
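For the benchmark, a minimal timing harness like the one below is all that's needed; `measure_fps` is our own hypothetical helper, and the dummy `process_frame` stands in for an actual OpenPose call.

```python
import time

def measure_fps(process_frame, frames):
    """Time a per-frame processing function and report frames per second."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Dummy stand-in for running OpenPose on one frame.
fps = measure_fps(lambda frame: None, list(range(50)))
```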
Another challenge we ran into was processing dynamic gestures, which would require video recognition. From our research, most video recognition approaches rely on 3D CNNs for training and testing because of the higher accuracy they provide compared to 2D CNNs. However, given that we need fast response and classification times, we decided not to support dynamic gestures, as they would be hard to implement within the time constraints we are working with. Instead, we will use a set of static gestures and only run recognition on those.
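One reason static gestures are so much cheaper is that a single frame's joint positions can be matched against stored poses directly. As a purely illustrative sketch (the templates, labels, and threshold below are all made up, and our real classifier will be a trained model), a nearest-template matcher over flattened keypoint coordinates might look like:

```python
import math

# Hypothetical pose templates: flattened (x, y) keypoint coordinates.
TEMPLATES = {
    "thumbs_up": [0.1, 0.9, 0.2, 0.8],
    "stop":      [0.5, 0.5, 0.5, 0.9],
}

def classify_static(keypoints, threshold=0.3):
    """Return the closest template's label, or None if nothing is close."""
    best_label, best_dist = None, float("inf")
    for label, template in TEMPLATES.items():
        dist = math.dist(keypoints, template)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None
```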
We've also updated our Gantt chart to reflect these design changes, especially for the gesture recognition aspect of our project.
Next week, we are going to run OpenPose on AWS and start the feature extraction with the results we get from OpenPose tracking so that we can start training our model soon.