Team Status Report For 02/29

Hello from Team *wave* Google!

This week we presented our design review (great job Claire!!) and worked on our design report. After the presentation, we received useful feedback regarding confusion matrices, something that would be worthwhile to add to our existing design. We had already decided to measure the accuracy of each gesture individually, and by combining that information with a confusion matrix we can see exactly which gestures get mistaken for which and hopefully achieve better results.
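
To make this concrete, here is a minimal sketch of how per-gesture metrics and a confusion matrix could be computed together, assuming we use scikit-learn; the gesture labels and predictions below are placeholders, not real results.

```python
# Minimal sketch: per-gesture metrics plus a confusion matrix with scikit-learn.
# The gesture labels and prediction lists are placeholders, not real results.
from sklearn.metrics import classification_report, confusion_matrix

gestures = ["weather", "time", "music"]  # placeholder gesture classes
y_true = ["weather", "weather", "time", "music", "music", "time"]  # ground truth
y_pred = ["weather", "time", "time", "music", "weather", "time"]   # classifier output

# Rows are true gestures, columns are predicted gestures; off-diagonal entries
# show exactly which gestures get confused with which.
print(confusion_matrix(y_true, y_pred, labels=gestures))

# Per-gesture precision/recall complements the per-gesture accuracy we already
# planned to track.
print(classification_report(y_true, y_pred, labels=gestures))
```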

Another important piece of feedback we received relates to one of the bigger risks of our project, one that has put us a bit behind schedule: hardware. This week all of the hardware components we ordered finally arrived, allowing us to fully test the Jetson Nano’s capabilities. We had already determined that OpenPose was unlikely to run successfully on the Nano, given the performance other groups saw on the Xavier, so we have chosen to minimize our dependencies on the Nano and instead run OpenPose on a p2 EC2 instance. We should know much more confidently next week whether OpenCV will have acceptable performance on the Nano, and if not we will strongly consider pivoting to the TK or Xavier.

Regarding the other components of the project, the Google Assistant SDK and the Web Application, we have made good progress figuring out how to link the two using simple WebSockets. We know that we can get the text response from Google Assistant and, over a WebSocket connection, relay that information to the Web Application. Further experimentation next week will determine in more detail the scope and capabilities of the Google Assistant SDK.
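
As a rough proof of concept of this relay, here is a minimal sketch using Python’s websockets library; the endpoint URL and the hard-coded response text are assumptions standing in for our eventual web app route and the real Assistant SDK output.

```python
# Minimal sketch of relaying a Google Assistant text response to the web app
# over a WebSocket. The URL and the sample text are placeholders/assumptions.
import asyncio

import websockets

WEBAPP_WS_URL = "ws://localhost:8000/ws/assistant/"  # assumed web app endpoint


async def relay(text_response: str) -> None:
    # Open a WebSocket connection to the web application and push the text.
    async with websockets.connect(WEBAPP_WS_URL) as ws:
        await ws.send(text_response)


if __name__ == "__main__":
    # In the real pipeline, this string would come from the Assistant SDK.
    asyncio.run(relay("It is currently 45 degrees and sunny in Pittsburgh."))
```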

All in all, we are a bit behind schedule, which is exacerbated by Spring Break approaching. However, we still have a good amount of slack, and with clear tasks for next week we hope to make good progress before Spring Break.

Claire’s Status Report for 2/29

This week, I gave the Design Review presentation and worked on the report. I also spent a long time exploring the Google Assistant SDK and gRPC basics.

For the Google Assistant SDK, I got to the point where I was almost able to run the sample code on the Nano. I bumped into a lot of unforeseen permissions issues on the Nano, which took a few hours to resolve.

Now, I am stuck at the point where I need to register the device with Google Assistant, and despite a few hours of probing around I cannot get a good answer on why the registration fails. It seems like there is, again, a permissions issue. There are not many online resources for debugging this because it is a little niche, and Google’s tutorial for it is quite incomplete.

I have also contacted the school’s IT desk so I can create a project under my school Gmail account rather than my personal one. Creating the project under the school’s account makes it “internal” to the CMU organization, letting me skip some authentication steps later in the process (e.g., having to provide proof of owning a website for the terms and agreements). The IT desk and I are preparing additional permissions for my account so I can create Actions on my Andrew email (CMU emails are normally denied that privilege).

For gRPC, I was able to run some code based on the samples. I think it has the potential to be very useful for communicating with either of the AWS servers we have. For the WebApp, gRPC could deliver the results of a command so that they can be displayed on screen.

For the deliverables next week, I will be completing the introduction, system specification, and project management sections of the design report. I will also continue working on the Google Assistant SDK samples on the Nano and try to get the issues resolved as soon as possible. I should also have a new project created under my school email by next week. Aside from that, I will be installing the WiFi card onto the Nano.

Sung’s Status Report for 2/29

I was not able to do anything for Capstone this week. I was hit with three assignments from 15440, and even though I spread out the work to be done by Friday, the project due on Friday was too much; I am currently using my first late day and am about to use my second to finish it. As soon as I finish, I will transition to Capstone and do the things I was supposed to do this week.

Jeff’s Status Report For 02/29

This week, I worked on the Design Report. I also made more progress on the web application, finalizing some design choices and creating a rough prototype.

The key design of the web application is to emulate the Google Assistant phone app, which displays visual data for Google queries in a chat-style format. The “messages” would be the responses from the Jetson containing that information. We are still experimenting with the Google Assistant SDK to determine exactly what information is received, but at a minimum it includes the verbal content that is usually spoken aloud.

In addition, due to the nature of this application, it is important that the “messages” from the Jetson be updated in real time, i.e., eliminating the need to constantly refresh the page for new messages to appear. To do this, I decided on Django Channels, which allows asynchronous code and handles both HTTP and WebSockets. By creating a channel layer, consumer instances can then send information to one another. The basic overall structure has been written, and I am now finishing up the channel layer and experimenting with a simple Python script that sends “messages” to our web application.
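
As a rough illustration of these pieces, here is a minimal sketch of a Channels consumer and the small script that pushes a “message” to it. It assumes a Django project with channels installed and a channel layer configured in settings; all names (the "assistant" group, AssistantConsumer, the event type) are placeholders rather than our final code.

```python
# consumers.py -- minimal sketch of a Channels WebSocket consumer.
import json

from channels.generic.websocket import AsyncWebsocketConsumer


class AssistantConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        # Every open browser tab joins the same group, so new "messages" can be
        # pushed to all of them without a page refresh.
        await self.channel_layer.group_add("assistant", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("assistant", self.channel_name)

    async def assistant_message(self, event):
        # Handles events of type "assistant.message" sent to the group and
        # forwards the text to the browser over the WebSocket.
        await self.send(text_data=json.dumps({"text": event["text"]}))
```

A simple test script (again, only a sketch) could then push a message through the channel layer, which Channels routes to the `assistant_message` handler above:

```python
# Push one test "message" into the "assistant" group from outside a consumer.
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

channel_layer = get_channel_layer()
async_to_sync(channel_layer.group_send)(
    "assistant",
    {"type": "assistant.message", "text": "It is 45 degrees and sunny."},
)
```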

Sung’s Status Report for 02/22

This week, I was working on the design review in preparation for the design presentation. As such, a lot of time was devoted to thinking about our design decisions and whether or not they were the best way to approach our problem/project.

I was hesitant about whether OpenCV alone could be accurate enough, so we recognized it as a risk factor and added a backup: Jeff and I decided to run OpenPose alongside OpenCV. We realized, however, that OpenPose takes a lot of GPU power and would not work well on the Nano, given that the Jetson Xavier (which has roughly 8 times the GPU capability) only achieved 17 fps with OpenPose video capture. As such, we decided to run OpenPose on AWS, and I am in the process of setting that up. We have received AWS credit, and we just need to see if AWS can meet our timing and GPU requirements.

Our initial idea revolves around a glove that tracks joints. We were originally thinking of a latex glove with the joint locations marked in marker, but we worried that the glove might interfere with OpenPose tracking. We tested this out and found that OpenPose is not hindered by the glove, as shown in the picture below.

This week, I have to make a glove joint tracker with OpenCV. I’ve installed OpenCV and have been messing around with it, but now I will have to implement a tracker that gives me a list of joint locations. This will probably be a really challenging part of the project, so stay tuned for next week’s update!
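
As a starting point, here is a rough sketch of one way the tracker could work, assuming dark marker dots on a light latex glove: threshold the marks and return the centroid of each blob as a joint location. The threshold and area values are guesses that would need tuning on real frames.

```python
# Rough sketch of a glove joint tracker: find dark marker dots and return
# their centroids as (x, y) joint locations. Thresholds are assumptions.
import cv2


def find_joint_locations(frame, min_area=20):
    """Return a list of (x, y) pixel coordinates for candidate joint marks."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dark marks on a light glove -> inverted threshold so the marks are white.
    _, mask = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY_INV)
    # OpenCV 4 return convention: (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    joints = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue  # ignore specks of noise
        m = cv2.moments(c)
        if m["m00"] > 0:
            joints.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
    return joints


if __name__ == "__main__":
    cap = cv2.VideoCapture(0)  # webcam, just for quick experimentation
    ok, frame = cap.read()
    if ok:
        print(find_joint_locations(frame))
    cap.release()
```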

Team Status Report for 02/22

Hello from Team *wave* Google!

This week, we worked a lot on finalizing our ideas and concepts in preparation for the design review. For the gesture recognition software, we decided to use both OpenPose and OpenCV to mitigate the risk of misclassifying gestures or not classifying them at all. Initially, we were planning on using only OpenCV, with users wearing a glove to make the joints trackable. However, we weren’t sure how reliable this would be, so we added a backup: running OpenPose to get joint locations and using that data to classify gestures. With this approach, OpenCV runs on the Nano and OpenPose runs on AWS. Gestures captured on video will be split into frames, and those frames will be processed by the glove tracker on the Nano and by OpenPose on AWS. If the Nano and OpenCV do not produce a classification, we will use the result from AWS and OpenPose instead.
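
Here is a tiny sketch of the intended fallback logic (the two classifier functions below are stand-ins; the real OpenCV-on-Nano and OpenPose-on-AWS paths do not exist yet):

```python
# Placeholder classifiers: None means "could not classify".
def classify_with_opencv(frames):
    return None  # stand-in for the fast, local glove-tracking path on the Nano


def classify_with_openpose_on_aws(frames):
    return "weather"  # stand-in for the slower but more robust AWS/OpenPose path


def classify_gesture(frames):
    result = classify_with_opencv(frames)
    if result is None:  # fall back only when the local path fails
        result = classify_with_openpose_on_aws(frames)
    return result


print(classify_gesture([]))  # -> "weather"
```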

We wanted to see how fast OpenPose could run, since how quickly we can classify gestures is one of our requirements. On a MacBook Pro, we achieved 0.5 fps on video, and tracking one still image took around 25-30 seconds. OpenPose on the MacBook Pro was running on a CPU, whereas on the Nvidia Jetson Nano it would run on a GPU. Even so, the fact that a single image took 25-30 seconds on a CPU suggested that OpenPose might not meet our timing requirements. As such, we decided to run OpenPose on AWS instead, which should mitigate the risk of OpenPose-based classification being too slow.

Another challenge we ran into was processing dynamic gestures. Handling dynamic gestures would mean doing video recognition for our gesture recognition. We researched online and found that most video recognition algorithms rely on 3D CNNs for training and testing because of the higher accuracy a 3D CNN provides compared to a 2D CNN. However, given that we need fast response and classification times, we decided not to support dynamic gestures, as we thought they would be hard to implement within our time constraints. Instead, we decided on a set of static gestures and will only do recognition on those.

We’ve also modified our Gantt Chart to reflect the changes in design choices, especially for the gesture recognition aspect of our project.

Next week, we are going to run OpenPose on AWS and start the feature extraction with the results we get from OpenPose tracking so that we can start training our model soon.

 

Claire’s Status Report for 2/22

This week, a lot of my work was focused on developing our design review decks, as I am the one presenting. I think the most important thing I did was fully flesh out our set of static gestures for the MVP.

We derived these gestures from ASL fingerspelling. We had to make sure the gestures were unique from each other (the original set had some overlapping gestures) and, if unique, distinct enough for the camera. One example of a similar pair is K and V: while they look distinct from each other in the reference image, we felt they would not look different enough from one person to another, given differences in finger length and hand shape.

Aside from the decks, I also worked on getting the Nvidia Jetson running. I successfully booted from the disk, but because the board lacks WiFi, I wasn’t able to get it to run anything too useful. I started a demo and tried my hand at some basic machine learning setup to prep the Nano for image recognition. I am now learning how to train networks on my personal computer using its GPU.

This was surprisingly difficult on my machine, due to some secure boot issues and missing dependencies. After a few hours of installing Python libraries, I got to a point where I was not confident in how to fix the error messages.

Aside from that, because our designated camera hasn’t arrived yet, I tried borrowing some webcams from the ECE inventory. Neither worked. One connected through GPIO and the other through the camera connector, and neither was detected by the Nano despite a few hours of tinkering and searching online. This could be troublesome, especially if the camera connector is broken. For now, though, it is most likely a compatibility issue, as neither webcam was meant for this particular device. We just have to wait for our camera to arrive and see.

My progress is still behind schedule, but I feel fairly confident that it will work out. I can start looking into the Google Assistant SDK while waiting for the parts to arrive, as those two tasks do not depend on each other.

As I look into other tasks to do while waiting for hardware, I think the best use of my time right now would be to start thinking about the Python scripts for automated testing and to start testing the Google Assistant SDK, possibly making my first query.

Thus, my deliverables next week are the design review report and a small Google Assistant SDK Python program. I am thinking that something which simply takes a line of input from the command line and outputs the response as text would be a good enough proof of concept.

Jeff’s Status Report of 02/22

This week in preparation for our Design Review Presentation, I worked with Sung and Claire on preparing our slides and finalizing design decisions.

As a part of that, I worked with Claire on choosing the static gestures we would use for our MVP. Following the advice from our Project Proposal presentation, we chose to modify the fingerspelling gestures, since we noticed many of them were very similar or identical. We remapped the similar gestures to something more distinguishable and turned the dynamic gestures (e.g., Z) into unique static gestures. In addition, we made unique gestures for popular commands (e.g., “what’s the weather”).

Furthermore, I worked with Sung on finalizing the machine learning implementation we would use. Based on research into other groups’ performance running OpenPose on the Xavier (about 17 fps), we found it unlikely that OpenPose would run well on the Nano, so we pivoted to using AWS combined with running OpenCV on the Nano. In addition, we prototyped the glove we would use for our OpenCV implementation: simply a latex glove with the joints (i.e., knuckles) marked in marker. We then did some testing with OpenPose to ensure that the glove would not hinder it, and also checked the effect of distance and other factors (e.g., whether the face is required). We found that the glove did not interfere with OpenPose, that the face is needed for OpenPose to work, and that at roughly 3 meters OpenPose has difficulty detecting hands (using the 1.2-megapixel camera on a MacBook). We also set up OpenCV and began basic experimentation.

Finally, I continued to make some progress on the Web App, but given the Design Review, I chose to focus more on finalizing design decisions this week. This resulted in less progress on the Web App, but with the slack we allocated it should be okay.

Jeff’s Status Report For 02/15

This week, I worked on designing and planning our approach to testing and validation for our design review, something we were told we needed to add more detail on. We decided on an automated approach in which we would play video of various gestures at different intervals, distances, and lighting conditions. This would test the rate at which we can correctly identify gestures (our requirement is at least 1 gesture per second), as well as performance at different distances and lighting conditions. The video would be synced to the Jetson, so the final output and timing could be checked against the gesture shown, allowing us to also track the accuracy and latency of individual gestures.
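
As a sketch of the scoring half of this automated test, assume we log the classifier’s output as (timestamp, gesture) pairs and keep a ground-truth schedule of which gesture was played when; the CSV format and the 1-second budget below are assumptions for illustration.

```python
# Compare the Jetson's recognition log against the ground-truth video schedule.
import csv


def load_pairs(path):
    """Read (time_in_seconds, gesture) rows from a CSV file."""
    with open(path, newline="") as f:
        return [(float(t), g) for t, g in csv.reader(f)]


def score(truth_path, output_path, max_latency=1.0):
    truth = load_pairs(truth_path)    # when each gesture was shown
    output = load_pairs(output_path)  # when each gesture was recognized
    correct = 0
    for (shown_t, shown_g), (seen_t, seen_g) in zip(truth, output):
        if seen_g == shown_g and (seen_t - shown_t) <= max_latency:
            correct += 1
    return correct / len(truth)


if __name__ == "__main__":
    print("accuracy within 1 s:", score("ground_truth.csv", "jetson_output.csv"))
```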

In addition, I researched different web frameworks and the best way to implement our Web Application. After doing some research, I decided to go with Django because of its many useful features, such as built-in authentication and integration with SQLite (a database included with Python), as well as the fact that it uses Python, a language we are all very familiar with. After choosing Django, I began learning it and setting up a simple Web Application.

Finally, I set up the WordPress site and researched the display we would need, one that not only works with the Jetson Nano but can also be used in our final project to display visual information back to the user.