Team Status Report for 03/28

We’re not sure what the purpose of the Team Status Report is now that we all have individual milestones. However, we have each figured out what we are going to do and are currently working towards our individual goals. We have purchased additional hardware so that working with it is feasible for each of us, and we have devised a plan to integrate our work.

Jeff and Claire both have Jetson Nanos that they are working with, and Sung will pass the OpenPose code and classification model to both of them so that they can integrate it once their individual parts are working.

Jeff’s Status Report For 03/28

Hello World!!!! hahaha am i funny too

This week I continued to refine the design of our web application. In addition, I began to focus more heavily on the OpenCV glove-tracking component. To start, rather than use every hand keypoint that OpenPose tracks, I am focusing initially on 6 markers: 5 on the palm knuckles and 1 on the wrist (the glove is a clear latex glove). I was able to get overall body and hand detection working, along with segmentation of the body and hand from the background, although it is not perfect yet and I will continue to work on it. I also experimented with different ways of detecting the markers. Drawing square markers and running edge detection did not work well because the markers lacked a clear enough edge, and SURF, which gave a sort of blob detection, performed poorly because the markers lacked sufficient contrast. The best solution so far has been color detection on the red markers, which worked for the knuckles but not as well for the wrist: my skin is somewhat pinkish, which led to less ideal results on the wrist marker. Hopefully, once the green markers I ordered arrive, performance will be much better.
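
For reference, a rough sketch of the color-based marker detection described above, using HSV thresholding in OpenCV; this assumes OpenCV 4, and the thresholds, file name, and area bounds are placeholder values that would need tuning against the actual glove:

```python
import cv2
import numpy as np

# Sketch of red-marker detection via HSV thresholding (values are assumptions).
frame = cv2.imread("glove_frame.png")
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so combine two hue ranges.
lower = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
upper = cv2.inRange(hsv, np.array([170, 120, 70]), np.array([180, 255, 255]))
mask = cv2.bitwise_or(lower, upper)

# OpenCV 4 returns (contours, hierarchy).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
markers = []
for c in contours:
    # Filter out tiny noise blobs and large skin-colored regions.
    if 20 < cv2.contourArea(c) < 2000:
        x, y, w, h = cv2.boundingRect(c)
        markers.append((x + w // 2, y + h // 2))  # marker center
print(markers)
```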

My SOW and Gantt chart were emailed to Emily earlier this week!

Claire’s Status Report for 03/28

This week, I managed to get the basic functionality of the randomly generated testing working. I have a Python program that reads images from a directory and creates videos from them with OpenCV. I knew it was very likely that this would be possible in Python, but it was good to see it working.
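
A minimal sketch of that image-directory-to-video step, assuming placeholder paths, a 30 fps frame rate, and the mp4v codec (none of which are fixed yet):

```python
import glob
import cv2

# Read still images from a directory and write them out as one video.
frames = sorted(glob.glob("gesture_images/*.png"))
first = cv2.imread(frames[0])
height, width = first.shape[:2]

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("test_video.mp4", fourcc, 30.0, (width, height))

for path in frames:
    img = cv2.imread(path)
    # Every frame must match the writer's size, so resize defensively.
    writer.write(cv2.resize(img, (width, height)))

writer.release()
```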

Over the next two days I want to refine the script in terms of the range of possible inputs for the test. For example, I want to decide what range of words I will deem “acceptable” as part of a command for Google. In particular, I think the “what is” command (not its own designated command, but a feature we plan on fully implementing) would be the hardest to execute correctly with good inputs. For example, we want to eliminate words like “it”, “she”, and “her”: special words that hold no meaning in a “what is” context. It would be nice to include some proper nouns too. These are all just for interest and proof of concept: since no one will actually be using our product, we want to show that it is as functional as a normal Google Home would be.
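
As an illustration only, a tiny filter along those lines might look like the sketch below; the stopword list and helper name are hypothetical, not part of the script yet:

```python
# Hypothetical filter for candidate "what is <word>" test inputs.
STOPWORDS = {"it", "she", "her", "he", "him", "they", "them", "this", "that"}

def acceptable(word: str) -> bool:
    """Keep words that could meaningfully complete a 'what is ...' query."""
    w = word.strip().lower()
    return w.isalpha() and w not in STOPWORDS

candidates = ["it", "photosynthesis", "her", "Pittsburgh", "gravity"]
print([w for w in candidates if acceptable(w)])
# -> ['photosynthesis', 'Pittsburgh', 'gravity']
```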

Another concern that came up is spacing. After some thought, I think it makes sense to put some “white space” between gestures, the way a real signer would. Someone who actually signs will probably do one letter and then switch to the next in under a second. This spacing could be important because it will help us distinguish repeated letters. I didn’t think of this before, but now I plan to insert randomly timed breaks where the video is just blank (I need to explore that in the next few days as well) to imitate that. This could greatly improve our accuracy and how well we simulate a real-world situation.
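
One possible way to add those randomly timed blank breaks, assuming the OpenCV VideoWriter setup from the earlier sketch; the frame rate, gap duration bounds, and use of black frames are all assumptions:

```python
import random
import numpy as np

FPS = 30  # assumed frame rate of the generated test videos

def write_blank_gap(writer, width, height, min_s=0.3, max_s=1.0):
    """Write a blank (black) pause of random duration between two gestures."""
    n_frames = int(random.uniform(min_s, max_s) * FPS)
    blank = np.zeros((height, width, 3), dtype=np.uint8)
    for _ in range(n_frames):
        writer.write(blank)
```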

Sung’s Status Report for 03/28

Hello world!! ahahah i’m so funny

This week, I spent a lot of time trying to collect data for our project. There is a bottleneck in collecting data: OpenPose segfaults when I run it with an image directory of more than 50 images. This means that in order to process more than 50 images (I would ideally like 100 images per gesture), I need to rerun OpenPose with a new directory of images. A batch of 50 images takes around 30 minutes to finish, so I need to check back every 30 minutes, and the fact that I have to be awake to do so makes this process slower than I expected.
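
One possible workaround, sketched below, is to split each gesture’s images into batches of at most 50 and invoke the OpenPose demo once per batch. The binary path and directory layout here are assumptions; --image_dir, --hand, and --write_json are the standard OpenPose demo flags:

```python
import shutil
import subprocess
from pathlib import Path

OPENPOSE_BIN = "./build/examples/openpose/openpose.bin"  # assumed install path
SRC = Path("dataset/letter_a")  # assumed per-gesture image directory
BATCH_SIZE = 50  # stay under the directory size that triggers the segfault

images = sorted(SRC.glob("*.jpg"))
for i in range(0, len(images), BATCH_SIZE):
    batch_id = i // BATCH_SIZE
    batch_dir = Path(f"batches/letter_a_{batch_id:03d}")
    batch_dir.mkdir(parents=True, exist_ok=True)
    for img in images[i:i + BATCH_SIZE]:
        shutil.copy(img, batch_dir / img.name)

    json_dir = Path(f"keypoints/letter_a_{batch_id:03d}")
    json_dir.mkdir(parents=True, exist_ok=True)

    # Run the OpenPose demo on this batch only.
    subprocess.run(
        [
            OPENPOSE_BIN,
            "--image_dir", str(batch_dir),
            "--hand",
            "--display", "0",
            "--render_pose", "0",
            "--write_json", str(json_dir),
        ],
        check=True,
    )
```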

Other than data collection, I’ve been working on the classification model for our project. I am looking into using a pretrained network and trying to integrate it into our pipeline. I have found some examples where pretrained networks are fine-tuned on new data and tasks, so I am trying to apply that approach to this project.

Sung’s Restructured SOW

Here is my restructured SOW, along with my Gantt chart for further detail.

Team Status Report For 03/21

Hello from Team *wave* Google!

This week we focused on getting resettled and refocusing our project given the switch to remote capstone. For the most part, our project is intact, with some small changes. We cut out the physical enclosure, given that TechSpark is closed, but this was not an essential part of the project. We also eliminated live testing and will instead focus solely on video streams of gestures, which we hope can be gathered remotely by asking friends.

To facilitate remote capstone, we worked to segment our project into stages that we can each work on remotely. We narrowed down the inputs and outputs of each stage so that no one person relies on another. For example, we determined that the input to OpenPose would be images and that the output would be the positional distances from the wrist point to all the respective hand points, encoded as JSON, something that OpenCV will also output in the future. We also set up the Google Assistant SDK so that its text inputs and outputs work and are pinned down. These inputs and outputs will also be the inputs to our web application. This will allow us to do pipeline testing at each stage.
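
As a purely illustrative example of that interface (the field names below are assumptions, not the agreed final schema), each image could map to a JSON record like the one this snippet produces:

```python
import json

# Hypothetical per-image record: distances from the wrist point to the other
# hand keypoints, produced by either OpenPose or (later) OpenCV.
record = {
    "image": "gesture_0001.png",               # assumed field name
    "source": "openpose",                       # or "opencv" once glove tracking works
    "wrist_distances": [0.91, 1.42, 1.87, 2.05],  # truncated; one value per keypoint
}
print(json.dumps(record, indent=2))
```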

Finally, we also decided to order another Jetson Nano, given that we have enough budget; this eliminates another dependency, as OpenCV can be tested directly on the new Nano.

More detail on the refocused project is on our document on Canvas.

PS: We also wish our team member Sung a good flight back to Korea, where he will be working remotely for the rest of the semester.

Jeff’s Status Report For 03/21

This week, I worked with Claire and Sung on refocusing our project so that we can work with little physical interaction. We also revised our gesture list after noticing, before Spring Break, that some gestures that did not show the wrist, or that showed only the back of the hand, were not recognized by OpenPose.

Furthermore, I continued to work on the web application. I set up Docker to run Redis, which will serve as the channel layer backend. This will allow multiple consumers to connect to the web socket and send information, i.e., the command that our algorithm has recognized as well as the response from the Google Assistant SDK.
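
Assuming the web application uses Django Channels (which the mention of consumers and a channel layer suggests), the Redis-backed channel layer would be configured roughly like this; the host and port are placeholders for the Dockerized Redis instance:

```python
# settings.py (sketch): point Django Channels at the Redis container.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            # Assumes Redis is published locally on its default port.
            "hosts": [("127.0.0.1", 6379)],
        },
    },
}
```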

In addition, I began to familiarize myself with OpenCV, which is to be used in conjunction with our designed glove as a less computationally intensive alternative to OpenPose. I began experimenting with OpenCV and marker tracking, which I will continue next week. The glove is currently just a latex glove with markers drawn on it to indicate the key points. I may switch to something more permanent, like tape, in the future.

Sung’s Status Report for 03/21

This past week and over spring break, I focused on normalizing the data that we collected. I normalized the data as follows. For each hand, OpenPose returns a 63-feature list: the (x, y, confidence score) components of 21 hand points. Using the (x, y) points, I normalize each hand relative to a sample hand that we designated as our reference hand. First, I calculated the distance from the reference hand’s base (the palm) to every other reference hand point, giving me 20 reference distances. Using those, I scaled every other hand that OpenPose recognized so that its distances matched the reference hand’s, and I used some trigonometry to preserve the angles of the various points while scaling the distances.
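
One reading of that normalization, sketched below as an illustration rather than taken from the actual implementation: keep each keypoint’s angle from the hand base but rescale its base-to-point distance to the corresponding reference distance.

```python
import math

def normalize_hand(points, ref_distances):
    """points: 21 (x, y) keypoints from OpenPose, index 0 = hand base.
    ref_distances: 20 base-to-point distances measured on the reference hand."""
    base_x, base_y = points[0]
    out = [(0.0, 0.0)]  # base is the origin of the normalized hand
    for i in range(1, len(points)):
        dx, dy = points[i][0] - base_x, points[i][1] - base_y
        angle = math.atan2(dy, dx)          # preserve the point's angle
        d = ref_distances[i - 1]            # rescale to the reference distance
        out.append((d * math.cos(angle), d * math.sin(angle)))
    return out
```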

With the newly normalized data, I am trying to collect as much data as possible. I looked into some pretrained models I could use to speed up training, but I am not at all sure how to integrate a pretrained model with the specific feature set that we have, so I am still researching them. A large training set matters because neural networks need a lot of data to work well, and that is particularly hard to gather here: OpenPose takes a long time to produce the 63-feature output (around 2 minutes per image), and there is no guarantee that a given image is good enough for OpenPose’s hand tracker to use.

That being said, this week was a little bit tough for me because I had to move out and I was working on figuring out where I was going to be for the rest of the semester. However, once I move to Korea next week, I expect things to be smoother.

Claire’s Status Report for 03/21

I successfully installed the WiFi card for the Jetson Nano and tested its speed. The upload speed is around 15 Mbps, which is a little low for what we want to do. This could be challenging for the AWS interaction in the future, but we will have to see.

The AWS account is set up, but instances aren’t started yet so that we don’t waste money.

The Google Assistant SDK is now fully installed on the Jetson Nano as well. Text input and text output are both verified to work, but as separate modules. The sample code will need some tweaking in order to integrate both, but this is promising so far. I also found a way to “trigger” the Google Assistant, so we can combine that with the hand-waving motion to start the Assistant, though that might not be completely necessary. Here is the repo for the work being done on that end.

Next week, I will combine text input and output into one module. Right now, it streams from the terminal, but I will also add functionality that allows it to read from files instead (which is how we are going to feed in the deciphered outputs from the gesture recognition algorithms).
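
A sketch of that planned file-based input is below; send_text_query is a placeholder for the text-query call in the Google Assistant SDK sample, not an actual SDK function, and the file name is hypothetical:

```python
import time

def send_text_query(text):
    # Placeholder: replace with the text-query call from the SDK sample code.
    print(f"[assistant] would send: {text}")

def follow(path):
    """Yield new lines appended to `path` (the gesture pipeline's output file)."""
    with open(path) as f:
        while True:
            line = f.readline()
            if line:
                yield line.strip()
            else:
                time.sleep(0.5)  # wait for the next recognized command

for command in follow("recognized_commands.txt"):
    if command:
        send_text_query(command)
```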

Claire’s Status Report for 03/07

This week has been very unfortunate with all the midterms and projects I had due, so I wasn’t able to work on Capstone meaningfully. Aside from things that I got done last week coming to fruition this past week (e.g. requesting full access to Google Cloud platform within andrew.cmu.edu), I didn’t really start anything new. I will be traveling during spring break, but here are some deliverable tasks that I can achieve remotely:

  • Getting Google Assistant SDK sample to run on my ASUS computer
  • Altering sample code to take in text queries and pipe text outputs (and if not, at least determine how to make them into text)
  • Explore the need for other smart home SDKs (Alexa?) if the Google Assistant SDK is too difficult
  • Re-map some gestures to be more recognizable (working with Sung who will run it through OpenPose)