Team Status Report For 03/21

Hello from Team *wave* Google!

This week we focused on getting resettled and refocusing our project given the switch to remote capstone. For the most part, the project remains intact with some small changes. We cut the physical enclosure, given that TechSpark is closing, but it was not an essential part of the project. We also eliminated live testing and will instead focus solely on video streams of gestures, which we hope to gather remotely by asking friends.

To facilitate remote capstone, we worked to segment our project into stages that we can each work on remotely. We narrowed down the inputs and outputs of each stage so that one person does not rely on another. For example, we determined that the input to OpenPose would be images and that the output would be the positional distances from the wrist point to each of the other hand points, serialized as JSON, a format that OpenCV will also output in the future. We also set up the Google Assistant SDK so that its text inputs and outputs work and are well defined. These inputs and outputs will also feed our web application, which will allow us to do pipeline testing at each stage.
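To make that stage boundary concrete, here is a minimal sketch of how the OpenPose stage's output could be produced, assuming OpenPose's standard --write_json format (a "people" list containing hand keypoints as a flat [x, y, confidence] array of 21 points, point 0 being the wrist). The "distances" field name and the file path are placeholders, not a finalized interface.

```python
import json
import math

def wrist_distances(keypoint_json_path, hand="hand_right_keypoints_2d"):
    """Turn OpenPose hand keypoints into wrist-relative distances.

    Assumes OpenPose's --write_json output: a flat [x, y, confidence]
    list of 21 hand points, where point 0 is the wrist.
    """
    with open(keypoint_json_path) as f:
        data = json.load(f)

    flat = data["people"][0][hand]                       # 63 numbers
    points = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 3)]
    wx, wy = points[0]                                   # wrist point

    # 20 distances, one per non-wrist hand point; the OpenCV + glove
    # pipeline would emit the same JSON shape later on.
    return {"distances": [math.hypot(x - wx, y - wy) for x, y in points[1:]]}

if __name__ == "__main__":
    print(json.dumps(wrist_distances("sample_keypoints.json")))  # placeholder path
```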

Finally, since we have enough budget, we also decided to order another Jetson Nano, which eliminates another dependency because OpenCV can be tested directly on the new Nano.

More detail on the refocused project is on our document on Canvas.

PS: We also wish our team member Sung a good flight back to Korea, where he will be working remotely for the rest of the semester.

Jeff’s Status Report For 03/21

This week, I worked with Claire and Sung on refocusing our project so that we could work with little physical interaction. We also revised our gesture list after noticing before Spring Break that some gestures that did not show the wrist, or showed only the back of the hand, were not recognized by OpenPose.

Furthermore, I continued to work on the web application. I set up Docker to run Redis, which will back the channel layer. This will allow multiple consumers to connect to the WebSocket and send information, i.e., the command that our algorithm has recognized as well as the response from the Google Assistant SDK.
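For reference, here is a minimal sketch of the channel-layer piece, assuming the channels_redis backend with the Redis container exposed on the default port; the exact settings in our repo may differ.

```python
# settings.py (sketch): point Django Channels at the Redis container,
# which was started with something like `docker run -p 6379:6379 -d redis`.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {"hosts": [("127.0.0.1", 6379)]},
    },
}

# Inside a consumer, a message (recognized command or Assistant response)
# can then be broadcast to every consumer subscribed to a group, e.g.:
#   await self.channel_layer.group_send("gestures", {"type": "relay", "text": msg})
```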

In addition, I began familiarizing myself with OpenCV, which will be used in conjunction with our designed glove as a less computationally intensive alternative to OpenPose. I began experimenting with OpenCV and marker tracking, which I will continue next week. The glove is currently just a latex glove with marker ink indicating the key points; I may switch to something more permanent, like tape, in the future.
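The experiment has roughly the shape of the sketch below: threshold an assumed marker color in HSV and treat contour centroids as candidate key points. The color bounds, camera index, and area threshold are guesses that would need tuning to the real glove and lighting.

```python
import cv2
import numpy as np

# Assumed blue-ish marker ink; these HSV bounds are placeholders.
LOWER = np.array([100, 120, 70])
UPPER = np.array([130, 255, 255])

cap = cv2.VideoCapture(0)                   # assumed default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] > 50:                   # ignore tiny specks of noise
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
            cv2.circle(frame, (cx, cy), 5, (0, 255, 0), -1)
    cv2.imshow("glove markers", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```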

Sung’s Status Report for 03/21

This past week and over spring break, I focused on normalizing the data that we collected. The normalization works as follows. For each hand, OpenPose returns a 63-feature list: the (x, y, confidence) components of 21 hand points. Using the (x, y) points, I normalize each hand relative to a sample hand that we designated as our reference hand. I first calculated the distance from the reference hand's base (the palm) to every other reference hand point, giving me 20 reference distances. Using those, I scale every hand that OpenPose recognizes so that its distances match the reference hand's, using some trigonometry to preserve the angle of each point while scaling its distance.
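As a sketch of one plausible reading of this normalization (the actual code lives in our repo and may differ in details): each detected point keeps its angle about the hand base, but its distance from the base is rescaled to the corresponding reference distance.

```python
import math

def normalize_hand(points, ref_distances):
    """One reading of the normalization above: keep each point's angle
    about the hand base, but rescale its distance from the base to the
    corresponding reference-hand distance.

    points: 21 (x, y) tuples from OpenPose; points[0] is the base (palm).
    ref_distances: the 20 reference distances, base to every other point.
    """
    bx, by = points[0]
    normalized = [(bx, by)]
    for (x, y), ref_d in zip(points[1:], ref_distances):
        angle = math.atan2(y - by, x - bx)          # preserve the angle
        normalized.append((bx + ref_d * math.cos(angle),
                           by + ref_d * math.sin(angle)))
    return normalized
```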

With the new normalized data, I am trying to collect as much data as possible. I also looked into pretrained models that could speed up training, but I am not sure how to integrate a pretrained model with the specific feature set we have, so I am still researching them. The motivation for pretrained models is that neural networks need a very large training data set to work well, which is particularly hard for us because OpenPose takes a long time to produce the 63-feature output (about 2 minutes per image), and there is no guarantee that a given image is good enough for the hand tracker to use.

That being said, this week was a little bit tough for me because I had to move out and I was working on figuring out where I was going to be for the rest of the semester. However, once I move to Korea next week, I expect things to be smoother.

Claire’s Status Report for 03/21

I successfully installed the WiFi card for the Jetson Nano and tested its speed. The upload speed is around 15 Mbps, which is a little low for what we want to do. This could be a bit challenging for the AWS interaction in the future, but we will have to see.

The AWS account is set up, but instances aren't started yet so we don't waste money.

The Google Assistant SDK is now fully installed on the Jetson Nano as well. Text input and text output are both verified to work, but as separate modules. The sample code will need some tweaking to integrate the two, but this is promising so far. I also found a way to “trigger” the Google Assistant, so we could combine that with the hand-waving motion to start the Assistant, though that may not be strictly necessary. Here is the repo for the work being done on that end.

Next week, I will have text input and output combined into one module. Right now it streams from the terminal, but I will also add functionality that allows it to read from files instead (which is how it will receive the deciphered outputs from the gesture recognition algorithms).
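A rough sketch of that file-reading side is below. The send_text_query() wrapper is a hypothetical stand-in for the SDK's text-query sample call (the real call will come from the sample code being tweaked above); the loop simply tails a command file written by the gesture stage.

```python
import sys
import time

def send_text_query(query: str) -> str:
    """Hypothetical wrapper around the Google Assistant SDK text-query
    sample; stubbed out here so the file-reading loop can be tested."""
    return f"(assistant response to: {query})"

def follow(path):
    """Yield lines as they are appended to `path` (tail -f style), e.g.
    commands written out by the gesture-recognition stage."""
    with open(path) as f:
        f.seek(0, 2)                      # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.2)
                continue
            yield line.strip()

if __name__ == "__main__":
    command_file = sys.argv[1] if len(sys.argv) > 1 else "commands.txt"
    for command in follow(command_file):
        print("Assistant:", send_text_query(command))
```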

Claire’s Status Report for 03/07

This week has been very unfortunate with all the midterms and projects I had due, so I wasn’t able to work on Capstone meaningfully. Aside from things that I got done last week coming to fruition this past week (e.g. requesting full access to Google Cloud platform within andrew.cmu.edu), I didn’t really start anything new. I will be traveling during spring break, but here are some deliverable tasks that I can achieve remotely:

  • Getting the Google Assistant SDK sample to run on my ASUS computer
  • Altering the sample code to take in text queries and pipe out text outputs (and if not, at least determining how to convert them into text)
  • Exploring the need for other smart home SDKs (Alexa?) if the Google Assistant SDK is too difficult
  • Re-mapping some gestures to be more recognizable (working with Sung, who will run them through OpenPose)

Team Status Report for 03/07

This week, we started going deeper into the machine learning aspect of things. After some experimentation with OpenPose on the Nano, it became abundantly clear that we should not run it locally if we want to meet our speed requirements. It's good to know this early on: AWS EC2 is the only way forward if we want to keep our current design of utilizing both OpenPose and OpenCV.

We also found out that OpenPose doesn't recognize the back of hands, especially gestures where the fingers are not visible (like a closed fist with the back of the hand facing the camera). We are going to re-map some of our gestures so that we know each gesture is, at minimum, recognized by OpenPose. This greatly reduces the risk of a gesture never being recognized later on, or of needing to incorporate additional machine learning algorithms into the existing infrastructure.

(OpenPose can detect an open hand from the back, but cannot do the same with a fist from the back)

We are quickly realizing the limitations of the Nano and seriously considering switching to the Xavier. We are in contact with our advisor about this, and he is ready to order a Xavier for us if need be. Within the next two weeks, we should be able to make a firmer decision on how to proceed. So far, only the CPU has shown serious limitations (overheating while running basic install commands, running OpenPose, etc.). Once OpenCV is installed and running, we can make a more accurate judgement.

Jeff’s Status Report For 03/07

This week I continued to work on the web application, again working on setting up the channel layer and WebSocket connections. I also decided to spend more time setting up the Jetson Nano to run OpenPose and OpenCV, as finalizing the web application was less important than catching up on the gesture recognition parts of the project.

Getting OpenPose installed on the Jetson Nano was mostly smooth, with some hiccups along the way from errors in the installation guide that I was able to solve with the help of other groups that installed it on the Xavier. Installing OpenCV also went smoothly. After installing OpenPose, I tried to get video streaming working to test the FPS we would get now that our camera has finally arrived, but I had difficulty getting that set up. Instead, I experimented with running OpenPose in a similar fashion to what Sung had been doing on his laptop. Initial results are not very promising, but I am not sure OpenPose was making full use of the GPU.
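For anyone reproducing this, here is a sketch of how the OpenPose demo can be run headless on a folder of images with hand keypoints written out as JSON; the install location and folder names below are assumptions, not our exact setup.

```python
import subprocess

# --hand enables the hand keypoint detector, --write_json dumps the
# keypoints that later pipeline stages consume, and disabling the
# display/rendering avoids extra load on the Nano.
subprocess.run(
    [
        "./build/examples/openpose/openpose.bin",
        "--image_dir", "test_images/",
        "--hand",
        "--write_json", "output/",
        "--display", "0",
        "--render_pose", "0",
    ],
    cwd="/home/nano/openpose",   # assumed install location
    check=True,
)
```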

Next week is spring break, so I do not anticipate doing much, but after break I hope to continue working on the Nano and begin the OpenCV + glove part of the project.

Team Status Report For 02/29

Hello from Team *wave* Google!

This week we presented our design review (great job Claire!!) and worked on our design report. After the presentation, we received useful feedback regarding confusion matrices, something that would be useful to add to our existing design. We had already decided to individually classify the accuracy of each of our gestures, and by combining that information with a confusion matrix, we hope to achieve better results.
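As a sketch of the confusion-matrix idea (the gesture names and predictions below are placeholders; the real labels will come from held-out gesture clips):

```python
from sklearn.metrics import confusion_matrix

# Placeholder gesture labels and a toy set of true/predicted labels.
GESTURES = ["wave", "fist", "thumbs_up", "open_palm"]
y_true = ["wave", "fist", "thumbs_up", "open_palm", "wave"]
y_pred = ["wave", "open_palm", "thumbs_up", "open_palm", "wave"]

cm = confusion_matrix(y_true, y_pred, labels=GESTURES)
print(cm)  # rows = true gesture, columns = predicted gesture

# Per-gesture accuracy (recall) falls out of the diagonal.
for i, gesture in enumerate(GESTURES):
    total = cm[i].sum()
    print(gesture, cm[i, i] / total if total else float("nan"))
```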

Another important piece of feedback we received relates to one of the bigger risks of our project, hardware, which has put us a bit behind schedule. This week, all of the hardware components we ordered finally arrived, allowing us to fully access the Jetson Nano's capabilities. We had already determined that OpenPose was unlikely to run successfully on the Nano given the performance other groups have seen on the Xavier, so we have chosen to minimize the dependencies on the Nano and run OpenPose on a p2 EC2 instance instead. We should know much more confidently next week whether OpenCV will have acceptable performance on the Nano, and if not, we will strongly consider pivoting to the TK or the Xavier.

Regarding the other components of the project, the Google Assistant SDK and the web application, we have made good progress, figuring out how to link the two using simple WebSockets. We know that we can get the text response from Google Assistant and relay that information to the web application over a WebSocket connection. Further experimentation next week will determine in more detail the scope and capabilities of the Google Assistant SDK.
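A minimal sketch of the relay idea, using the third-party websockets package (handler signature per its classic asyncio API): the Assistant side connects and sends its text reply, and the server fans it out to every other connected client such as the web app front end. The port and message format are placeholders, and the real implementation will likely live in our Django Channels consumers instead.

```python
import asyncio
import websockets

CLIENTS = set()

async def relay(websocket, path):
    """Fan every incoming message (e.g. the Assistant's text reply) out
    to all other connected clients."""
    CLIENTS.add(websocket)
    try:
        async for message in websocket:
            others = [c for c in CLIENTS if c is not websocket]
            if others:
                await asyncio.gather(*(c.send(message) for c in others))
    finally:
        CLIENTS.remove(websocket)

async def main():
    async with websockets.serve(relay, "0.0.0.0", 8765):   # placeholder port
        await asyncio.Future()                              # run forever

if __name__ == "__main__":
    asyncio.run(main())
```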

All in all, we are a bit behind schedule, which is exacerbated by Spring Break approaching. However, we still have a good amount of slack, and with clear tasks for next week we hope to make good progress before break.

Claire’s Status Report for 2/29

This week, I did the Design Review presentation and worked on the report. I also spent a long time exploring the Google Assistant SDK and gRPC basics.

For the Google Assistant SDK, I got to the point where I was almost able to run the sample code on the Nano. I bumped into a lot of unforeseen permissions issues on the Nano, which took a few hours to resolve.

Now I am stuck at the point where I need to register the device with Google Assistant; despite a few hours of probing around, I cannot get a good answer on why the registration fails. It seems like there is, again, a permissions issue. There are not many online resources for debugging this because it is a little niche, and Google's tutorial for it is quite incomplete.

I have also contacted the school's IT desk so I can create a project under my school Gmail account rather than my personal one. Creating the project under the school account would make it “internal” to the CMU organization and let me skip some authentication steps later in the process (e.g., having to provide proof of owning a website for the terms and agreements). The IT desk and I are arranging additional permissions for my account so I can create actions on my Andrew email (CMU emails are normally denied that privilege).

For gRPC, I was able to run some code based on the samples. I think it has the potential to be very useful for communicating with either of the AWS servers we have. For the web app, gRPC can pass along the results of a command so that they can be displayed on screen.
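A hedged sketch of what that could look like: the service definition and generated stubs below are hypothetical placeholders (they would be produced by grpc_tools.protoc from a .proto we have not written yet), so this only shows the shape of the client call, not working code against a real server.

```python
import grpc

# Hypothetical generated modules from a .proto along the lines of:
#   service GestureService { rpc Classify (Frame) returns (Command); }
# Compiled with grpc_tools.protoc; all names are placeholders.
import gesture_pb2
import gesture_pb2_grpc

def classify_on_ec2(image_bytes: bytes, host: str = "ec2-host:50051") -> str:
    """Send one frame to the AWS gRPC server and return the recognized
    command text, which the web app can then display."""
    with grpc.insecure_channel(host) as channel:
        stub = gesture_pb2_grpc.GestureServiceStub(channel)
        reply = stub.Classify(gesture_pb2.Frame(data=image_bytes))
    return reply.text
```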

For next week's deliverables, I will complete the introduction, system specification, and project management sections of the design report. I will also continue working on the Google Assistant SDK samples on the Nano and try to get the issues resolved as soon as possible. I should also have a new project created under my school email by next week. Aside from that, I will install the WiFi card onto the Nano.