Sung’s Status Report for 03/28

Hello world!! Ahahah, I'm so funny.

This week, I spent a lot of time trying to collect data for our project. There is a bottleneck in collecting data: OpenPose segfaults when I run it on an image directory of more than 50 images. This means that in order to train on more than 50 images (I would ideally like 100 images per gesture), I need to rerun OpenPose with a new directory of images. A batch of 50 images takes around 30 minutes to finish, so I have to check back every 30 minutes, and having to stay awake for that makes the process slower than I expected.
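
To make the reruns less manual, one option is to script the batching. Below is a minimal sketch that splits the image set into chunks of 50 and invokes the OpenPose demo binary on each; the binary path and directory names are placeholders for our setup, and the flags (--image_dir, --write_json, --hand) are the standard OpenPose demo flags.

    import subprocess
    from pathlib import Path

    OPENPOSE_BIN = "./build/examples/openpose/openpose.bin"  # adjust to local build
    SRC = Path("raw_images")      # placeholder: directory of collected images
    OUT = Path("keypoints_json")  # placeholder: where the JSON output goes
    CHUNK = 50                    # OpenPose segfaults above ~50 images per run

    images = sorted(SRC.glob("*.jpg"))
    for i in range(0, len(images), CHUNK):
        batch_dir = Path("batch_%03d" % (i // CHUNK))
        batch_dir.mkdir(exist_ok=True)
        for img in images[i:i + CHUNK]:
            (batch_dir / img.name).symlink_to(img.resolve())
        # --display 0 and --render_pose 0 skip rendering so batches finish faster
        subprocess.run([OPENPOSE_BIN,
                        "--image_dir", str(batch_dir),
                        "--write_json", str(OUT),
                        "--hand", "--display", "0", "--render_pose", "0"],
                       check=True)

This way the 30-minute reruns happen unattended instead of by hand.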

Other than data collection, I've been working on the classification model for our project. I've been looking into using a pretrained network and trying to integrate one into our project. I have found some examples where pretrained networks are fine-tuned on new data, so I am trying to apply that approach to this project.
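
The usual pattern in those examples is to freeze the pretrained layers and replace only the final classifier. Here is a sketch of that pattern in PyTorch, using torchvision's resnet18 purely as a stand-in; whether an image-pretrained backbone even fits our 63-feature keypoint input is exactly what I am still figuring out.

    import torch.nn as nn
    import torchvision.models as models

    NUM_GESTURES = 10  # placeholder for our gesture count

    model = models.resnet18(pretrained=True)  # load pretrained weights
    for param in model.parameters():
        param.requires_grad = False           # freeze the pretrained backbone
    # swap the final fully connected layer for our gesture classes;
    # only this new layer's weights get updated during training
    model.fc = nn.Linear(model.fc.in_features, NUM_GESTURES)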

Sung's Restructured SOW

Here is my restructured SOW, along with my Gantt chart for further detail.

Team Status Report for 03/21

Hello from Team *wave* Google!

This week we focused on getting resettled and refocusing our project given the switch to remote capstone. For the most part, our project is intact with some small changes. We did cut the physical enclosure, given TechSpark's closing, but it was not an essential part of the project. We also eliminated live testing, focusing instead solely on video streams of gestures, which we hope can be gathered remotely by asking friends.

To facilitate remote capstone, we worked to segment our project into stages that we could each work on remotely. We narrowed down the inputs and outputs of each stage so that no one person would rely on another. For example, we determined that the input to OpenPose would be images and that the output would be a JSON of positional distances from the wrist point to all the respective points, something that OpenCV would also output in the future. We also set up the Google Assistant SDK so that the text inputs and outputs work and are pinned down. These inputs and outputs will also be the inputs to our web application. This will allow us to do pipeline testing at each stage.
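
As a concrete sketch of that interface, assuming OpenPose's standard JSON layout where each hand is a flat list of (x, y, confidence) triples for 21 keypoints with the wrist at index 0:

    import json
    import math

    def wrist_relative_distances(json_path):
        """Return the 20 distances from the wrist to every other hand keypoint."""
        with open(json_path) as f:
            data = json.load(f)
        # standard OpenPose output: a flat [x0, y0, c0, x1, y1, c1, ...] list
        flat = data["people"][0]["hand_right_keypoints_2d"]
        points = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 3)]
        wx, wy = points[0]  # keypoint 0 is the wrist
        return [math.hypot(x - wx, y - wy) for x, y in points[1:]]

Whatever produces this JSON (OpenPose now, OpenCV later) can then be swapped out without touching the downstream stages.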

Finally, we also decided to order another Jetson Nano, given we have enough budget, which eliminates another dependency since OpenCV can be tested directly on the new Nano.

More detail on the refocused project is in our document on Canvas.

PS: We also wish our team member Sung a good flight back to Korea, where he will be working remotely for the rest of the semester.

Jeff's Status Report for 03/21

This week, I worked with Claire and Sung on refocusing our project so that we could work with little physical interaction. We also revised our gesture list after noticing, before Spring Break, that some gestures that did not show the wrist or showed only the back of the hand were not recognized by OpenPose.

Furthermore, I continued to work on the web application. I set up Docker to run Redis, which will back the channel layer. This will allow multiple consumers to connect to the web socket and send information, i.e., the command that our algorithm has recognized as well as the response from the Google Assistant SDK.
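
For reference, the wiring here is small. A sketch of the Django settings we would use, assuming Redis is exposed on its default port (the Docker side is just docker run -p 6379:6379 -d redis):

    # settings.py: route Django Channels' channel layer through Redis so
    # multiple consumers can pass messages across web socket connections
    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "channels_redis.core.RedisChannelLayer",
            "CONFIG": {
                "hosts": [("127.0.0.1", 6379)],  # Redis inside the Docker container
            },
        },
    }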

In addition, I began familiarizing myself with OpenCV, which is to be used in conjunction with our designed glove as a less computationally intensive alternative to OpenPose. I began experimenting with OpenCV and marker tracking, which I will continue next week. The glove is currently just a latex glove with a marker indicating the key points. I may switch to a more permanent marking, like tape, in the future.
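
The marker tracking itself can start simple. A minimal sketch of the approach I am experimenting with: threshold the marker color in HSV space and take contour centers. The HSV bounds are placeholders that would need tuning to the actual marker color.

    import cv2
    import numpy as np

    # placeholder HSV range; tune to the glove's marker color
    LOWER = np.array([100, 120, 70])
    UPPER = np.array([130, 255, 255])

    def marker_centers(frame):
        """Return (x, y) centers of marker-colored blobs in a BGR frame."""
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, LOWER, UPPER)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        centers = []
        for c in contours:
            m = cv2.moments(c)
            if m["m00"] > 0:  # skip degenerate contours
                centers.append((int(m["m10"] / m["m00"]),
                                int(m["m01"] / m["m00"])))
        return centers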

Sung’s Status Report for 03/21

This past week and over spring break, I focused on normalizing the data that we collected. I normalized the data as follows. For each hand, OpenPose returns a 63-feature list of (x, y, confidence) components for 21 hand points. With the (x, y) points, I normalize each point relative to a sample hand we designated as our reference hand. I first calculated the relative distance from the reference hand's base (the palm) to every other reference hand point, giving 20 reference distances from the base of the hand to the other points. Using those, I scale every hand OpenPose recognizes so that its distances match the reference hand's. I use some trigonometry to preserve the angles of the various points in the hand while scaling the distances.
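
In code, the rescaling step is compact. A sketch, assuming points is the list of 21 (x, y) keypoints and ref_dists holds the 20 reference distances from our reference hand:

    import math

    def normalize_hand(points, ref_dists):
        """Scale each keypoint so its distance from the base (palm) matches the
        reference hand, while preserving its angle from the base."""
        bx, by = points[0]  # base of the hand
        normalized = [(bx, by)]
        for (x, y), ref_d in zip(points[1:], ref_dists):
            angle = math.atan2(y - by, x - bx)
            # place the point at the reference distance along the same angle
            normalized.append((bx + ref_d * math.cos(angle),
                               by + ref_d * math.sin(angle)))
        return normalized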

With the new normalized data, I am trying to collect as much data as possible. I looked into some pretrained models I could use to speed up training, but I am not at all sure how to integrate any pretrained model with the specific feature set that we have, so I am still researching pretrained models. This matters because neural networks need a really large training data set to work well, which is particularly hard here: OpenPose takes a long time to produce the 63-feature output list (about 2 minutes per image), and there is no guarantee that a given image is good enough for OpenPose's hand tracker to use.

That being said, this week was a little bit tough for me because I had to move out and I was working on figuring out where I was going to be for the rest of the semester. However, once I move to Korea next week, I expect things to be smoother.

Claire’s Status Report for 03/21

Successfully installed the WiFi card for the Jetson Nano and tested its speed. The upload speed is around 15 Mbps, which is a little low for what we want to do. This could make the AWS interaction challenging in the future, but we will have to see.
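
For a rough sense of scale (assuming, say, 500 KB per JPEG frame): uploading one frame at 15 Mbps takes about (500 × 8) / 15,000 ≈ 0.27 s before any round trip or inference time, so streaming frames to AWS at real-time rates would be tight.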

The AWS account is set up, but instances aren't started yet so that we don't waste money.

The Google Assistant SDK is now fully installed on the Jetson Nano as well. Text input and text output are both verified to work, but as separate modules. The sample code will need some tweaking to integrate both, but this is promising so far. I also found a way to “trigger” the Google Assistant, so we can combine that with the hand-waving motion to start the Google Assistant, though that might not be completely necessary. Here is the repo for the work being done on that end.

Next week, I will combine text input and output into one module. Right now, it streams from the terminal, but I will also add functionality that allows it to read from files instead (which is how we are going to feed it the deciphered outputs from the gesture recognition algorithms).
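
A sketch of the file-driven version, assuming a helper assist(text) adapted from the SDK's textinput sample (the helper name is hypothetical; the real call would wrap the SDK's gRPC Assist request):

    # commands.txt holds one deciphered gesture command per line,
    # e.g. "turn on the lights"
    from assistant_wrapper import assist  # hypothetical wrapper around the
                                          # textinput sample's Assist call

    def run_from_file(path="commands.txt"):
        with open(path) as f:
            for line in f:
                query = line.strip()
                if not query:
                    continue  # skip blank lines
                response = assist(query)  # text in, text out
                print("> %s\n%s" % (query, response))

    if __name__ == "__main__":
        run_from_file()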

Claire’s Status Report for 03/07

This week has been very unfortunate with all the midterms and projects I had due, so I wasn’t able to work on Capstone meaningfully. Aside from things that I got done last week coming to fruition this past week (e.g. requesting full access to Google Cloud platform within andrew.cmu.edu), I didn’t really start anything new. I will be traveling during spring break, but here are some deliverable tasks that I can achieve remotely:

  • Getting Google Assistant SDK sample to run on my ASUS computer
  • Altering sample code to take in text queries and pipe text outputs (and if not, at least determine how to make them into text)
  • Explore the need for other smart home SDKs (Alexa?) if the Google Assistant SDK is too difficult
  • Re-map some gestures to be more recognizable (working with Sung who will run it through OpenPose)


Sung’s Status Report for 03/07

This week, I worked on building the framework for the machine learning side of OpenPose. I collected about 100 images of data and started writing the feature extraction.

One way I am doing feature extraction is taking all 21 joint locations of the hand provided by OpenPose. However, I needed a way to normalize the images of hands, as I wanted the hands to be the same relative size regardless of the photo I took. This meant calculating the angles of the joints and rescaling each length to a common relative length while preserving the angle. I am currently in the process of writing the feature extraction. After that, I will continue with the machine learning portion of the project. I plan to use a neural network, and I am going to both train a model from scratch and fine-tune a pre-trained model. Marios told me that a pre-trained model would adjust more finely to the changes, which would help preserve the accuracy of gestures.
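
For the from-scratch version, a small fully connected network over the 63 extracted features is the natural starting point. A sketch in PyTorch; the layer sizes are placeholders, not tuned values.

    import torch.nn as nn

    NUM_FEATURES = 63  # 21 hand keypoints x (x, y, confidence)
    NUM_GESTURES = 10  # placeholder for our gesture count

    # small fully connected classifier over the normalized keypoint features
    model = nn.Sequential(
        nn.Linear(NUM_FEATURES, 128),
        nn.ReLU(),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Linear(64, NUM_GESTURES),  # logits; train with CrossEntropyLoss
    )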

Team Status Report for 03/07

This week, we started going deeper into the machine learning aspect of things. After some experimentation with OpenPose on the Nano, it became abundantly clear that if we want to meet our speed requirements, we should not run it locally. It's good to know this early on: now we know that AWS EC2 is the only way forward if we want to keep our current design of utilizing both OpenPose and OpenCV.

We also found out that OpenPose doesn't recognize the backs of hands, especially gestures where the fingers are not visible (like a closed fist with the back of the hand facing the camera). We are going to re-map some of our gestures so that each gesture is, at minimum, recognized by OpenPose. This greatly reduces the risk of a gesture never being recognized later on, or the need for additional machine learning algorithms in the existing infrastructure.

(OpenPose can detect an open hand from the back, but cannot do the same with a fist from the back.)

We are quickly realizing the limitations of the Nano and seriously considering switching to the Xavier. We are in contact with our advisor about this, and he is ready to order a Xavier for us if need be. Within the next two weeks, we can probably make a firmer decision on how to proceed. So far, only the CPU has shown serious limitations (overheating while running basic install commands, running OpenPose, etc.). Once OpenCV is installed and running, we can make a more accurate judgement.

Jeff's Status Report for 03/07

This week I continued to work on the web application, again setting up the channel layer and web socket connections. I also decided to spend more time setting up the Jetson Nano to run OpenPose and OpenCV, as finalizing the web application was less important than catching up on the gesture recognition parts of the project.

Getting OpenPose installed on the Jetson Nano was mostly smooth, with some hiccups along the way from errors in the installation guide that I was able to solve with the help of some other groups that installed on the Xavier. Installing OpenCV also went smoothly. After installing OpenPose, I tried to get video streaming working to test the FPS we would get after finally receiving our camera, but I had difficulty getting that set up. Instead, I experimented with running OpenPose in a similar fashion to what Sung had been doing on his laptop. Initial results are not very promising, but I am not sure OpenPose was making full use of the GPU.
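
For the FPS measurement itself, once capture works the test can stay very simple. A sketch using OpenCV's capture API; device index 0 is an assumption about how the camera enumerates on the Nano.

    import time
    import cv2

    cap = cv2.VideoCapture(0)  # assumes the camera shows up as device 0
    n_frames, start = 0, time.time()
    while n_frames < 100:      # measure over 100 frames
        ok, frame = cap.read()
        if not ok:
            break              # capture failed; the device index may be wrong
        n_frames += 1
    elapsed = time.time() - start
    print("%.1f FPS over %d frames" % (n_frames / elapsed, n_frames))
    cap.release()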

Next week is spring break, so I do not anticipate doing much, but after break I hope to continue working on the Nano and begin the OpenCV + glove part.