Claire’s Status Report for 04/18

I finished generating some additional random inputs using mostly our makeshift ASL. I compiled the most popular search queries from the last 12 months based on Google Trends, using the keywords who, what, when, where, why, and how, and then removed duplicate queries. These queries can now be included in the list of randomized normal inputs. Here is a gif showing the question “what is a vsco girl” with all the appropriate spaces.
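
For reference, here is a minimal sketch of how the deduplicated Trends queries could be folded into the randomized input pool. The file names are hypothetical placeholders, not the actual files in our repo.

```python
import random

# Hypothetical file names: stand-ins for wherever the query lists actually live.
TRENDS_FILE = "google_trends_queries.txt"   # one query per line, e.g. "what is a vsco girl"
COMMANDS_FILE = "base_commands.txt"         # the existing makeshift-ASL test commands

def load_queries(path):
    """Read queries, lowercase them, and drop duplicates while keeping order."""
    seen = set()
    queries = []
    with open(path) as f:
        for line in f:
            q = line.strip().lower()
            if q and q not in seen:
                seen.add(q)
                queries.append(q)
    return queries

def build_input_pool():
    """Combine the Trends questions with the normal command list and shuffle."""
    pool = load_queries(TRENDS_FILE) + load_queries(COMMANDS_FILE)
    random.shuffle(pool)
    return pool

if __name__ == "__main__":
    for query in build_input_pool()[:5]:
        print(query)
```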

(Never mind, it doesn’t seem like gifs work. Try this link.)

Aside from that, I was also able to get the camera working and set up, as per the Wednesday check-in.

Next week, I want to make sure that we have all the AWS instances set up and running in conjunction with the Nano.

Claire’s Status Report for 4/11

This week, I got AWS set up, and I think we finalized how we want to have it running in conjunction with the Jetson device. We have scripts ready on both sides, and we now have a way of SCP-ing images from the device to the server. The server then deletes the images as they are processed. The FPS situation didn’t really end up getting resolved; it is still sampling at a much lower rate than I would like (roughly 10 fps at best), but it can now be adjusted.
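
Here is a rough sketch of the device-side push, assuming key-based SSH access to the instance; the host name and directory paths are placeholders, and the server-side script is what actually deletes frames once they are processed.

```python
import subprocess
from pathlib import Path

# Placeholder values: the real host, key, and directories live in our actual scripts.
REMOTE = "ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com"
REMOTE_DIR = "/home/ubuntu/incoming_frames/"
LOCAL_DIR = Path("captured_frames")

def push_frames():
    """SCP every captured frame to the server, removing the local copy on success."""
    for frame in sorted(LOCAL_DIR.glob("*.jpg")):
        result = subprocess.run(["scp", str(frame), f"{REMOTE}:{REMOTE_DIR}"])
        if result.returncode == 0:
            frame.unlink()  # keep the Nano's SD card from filling up

if __name__ == "__main__":
    push_frames()
```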

Next week, I will be working further on OpenCV and adding some more commands to the list of possible testing commands. I want to ask some basic who/what/when/where/why/how questions, but I need to think about how to generate them. I also want to be able to adjust the FPS for testing on the fly while maintaining roughly the same “ratio” of signal and noise. The biggest hurdle right now is really the sampling, which I think would be difficult on a device like the Jetson Nano. I will try some more things and will have scripts running with Sung’s OpenPose software by the end of next week for sure.

Claire’s Status Report for 4/4

Hello! This week, I did some tweaking with the testing. To start, I am now doing around 20 fps for the testing video. I am still changing some parameters, but I think I am going to go with this particular setup.

I want to switch away from the one-gesture-straight-to-the-next approach I was using before, to better simulate the live testing that I talked about with Marios. If we go for 20 frames per second (which we might tweak depending on how good the results are, but we can go up to 28 fps if necessary), I want at least 3/4 of the frames in each second (15 out of 20) to be the correct gesture, held consecutively. The remaining five frames before or after can be either blank or another gesture. That way, no single gesture should ever run for more than 15 consecutive frames at any point. Obviously, real-life testing would have more variables, like the gesture might not be held consecutively, but I think this is a good metric to start with.
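
Below is a small sketch of how that per-second frame schedule could be generated. The gesture labels and the blank/noise split are illustrative assumptions, not the final test set.

```python
import random

FPS = 20                              # frames per second of the test video
GESTURE_FRAMES = 15                   # 3/4 of each second holds the target gesture
NOISE_FRAMES = FPS - GESTURE_FRAMES   # the rest is blank or some other gesture

GESTURES = list("abcdefghijklmnopqrstuvwxyz")  # placeholder gesture labels

def schedule_second(target):
    """Return FPS frame labels: a few noise frames, then the held target gesture."""
    noise_pool = ["blank"] + [g for g in GESTURES if g != target]
    noise = [random.choice(noise_pool) for _ in range(NOISE_FRAMES)]
    return noise + [target] * GESTURE_FRAMES

def schedule_command(letters):
    """Chain one second of frames per letter in the fingerspelled command."""
    frames = []
    for letter in letters:
        frames.extend(schedule_second(letter))
    return frames

if __name__ == "__main__":
    print(schedule_command("hi"))  # 40 labels: noise + 15 of 'h', noise + 15 of 'i'
```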

Here is a clip of a video that incorporates the “empty space” between gestures.

From this video you can see the small gaps of blank space between each gesture. At this point I haven’t incorporated other error gestures into it yet (and I want to think more about that). I think this is pretty much how we would do live testing in a normal scenario: the user holds up the gesture for around a second, quickly switches to the next one, and so on.

Next week, I plan on getting the AWS situation set up. I need some help with learning how to communicate with it from the Jetson. As long as I am able to send and receive things from AWS, I would consider it a success (OpenPose goes on it, but that is within Sung’s realm). I also want to test out the sampling for the camera and see if I can adjust it on the fly (e.g. some command with a -fps 20 flag to get it sampling at 20 fps and sending those frames directly to AWS).
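
One way that on-the-fly adjustment could look, assuming the camera is exposed through OpenCV; the flag name, default value, and output directory are just placeholders.

```python
import argparse
import os
import time

import cv2

def capture(fps, out_dir="captured_frames"):
    """Grab frames from the default camera at roughly the requested rate."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise RuntimeError("camera not detected")
    interval = 1.0 / fps
    frame_id = 0
    try:
        while True:
            start = time.time()
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(f"{out_dir}/frame_{frame_id:06d}.jpg", frame)
            frame_id += 1
            # Sleep off whatever time is left in this frame's slot.
            time.sleep(max(0.0, interval - (time.time() - start)))
    finally:
        cap.release()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-fps", type=int, default=20, help="sampling rate")
    args = parser.parse_args()
    capture(args.fps)
```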


Claire’s Status Report for 03/28

This week, I managed to get the basic functionality of the randomly generated testing working. I have a Python program that can read images from a directory and create videos from them through OpenCV. I knew it was very likely that this functionality was possible in Python, but it was good to see it working.
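
For reference, a stripped-down sketch of that image-directory-to-video step; the directory layout, codec, and frame rate here are assumptions rather than what the final script necessarily uses.

```python
from pathlib import Path

import cv2

def images_to_video(image_dir, out_path="test_video.avi", fps=20):
    """Stitch every image in image_dir into a single video with OpenCV."""
    frames = sorted(Path(image_dir).glob("*.jpg"))
    if not frames:
        raise ValueError(f"no images found in {image_dir}")
    first = cv2.imread(str(frames[0]))
    height, width = first.shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"XVID")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for frame_path in frames:
        frame = cv2.imread(str(frame_path))
        writer.write(cv2.resize(frame, (width, height)))  # keep a consistent size
    writer.release()

if __name__ == "__main__":
    images_to_video("gesture_images")
```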

Over the next two days, I want to refine the script in terms of the range of possible inputs for the test. For example, I want to see what range of words I will deem “acceptable” as part of a command for Google. In particular, I think the “what is?” command (not its own designated command, but a feature we plan on fully implementing) would be the hardest to execute correctly with good inputs. For example, we want to eliminate words like “it”, “she”, and “her”: special words that hold no meaning in a “what is” context. It would be nice to include some proper nouns too. These are all just for interest and proof of concept (since no one will actually be using our product, we want to show that it is as functional as a normal Google Home would be).
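
A tiny sketch of that filtering step; the stop-word list here is purely illustrative, and the real list of “acceptable” words still needs more thought.

```python
# Words that carry no meaning on their own in a "what is" query.
# This set is illustrative only, not the final list we would use.
STOP_WORDS = {"it", "she", "he", "her", "him", "they", "them", "this", "that"}

def acceptable(word):
    """Keep a candidate word only if it could stand on its own in a "what is" query."""
    return word.lower() not in STOP_WORDS

def filter_candidates(words):
    return [w for w in words if acceptable(w)]

if __name__ == "__main__":
    print(filter_candidates(["Pittsburgh", "it", "gravity", "her"]))
    # -> ['Pittsburgh', 'gravity']
```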

Another concern that came up is spacing. After some thought, I think it would make sense to put some “white space” between gestures, the way a person actually signing would. Someone who signs in real life will probably do one letter and then switch to the next one in less than a second. This spacing could be important: it will help us distinguish repeating letters. I didn’t think of this before, but now that I have, I think I will put in some randomly timed breaks where the video is just blank (I need to explore that in the next few days as well) to imitate that. This could greatly improve our accuracy and how well we simulate a real-world situation.

Claire’s Status Report for 03/21

Successfully installed the WiFi card for the Jetson Nano and tested its speed. The upload speed is around 15 Mbps, which is a little low for what we want to do. This could be a little challenging for the AWS interaction in the future, but we will have to see.
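
As a rough sanity check on that number (the per-frame size here is an assumption, since we haven’t measured our actual compressed frames yet):

```python
# Back-of-the-envelope check on the 15 Mbps upload link.
UPLOAD_MBPS = 15
ASSUMED_FRAME_KB = 50                        # assumed size of one compressed JPEG frame
frame_megabits = ASSUMED_FRAME_KB * 8 / 1000
max_fps = UPLOAD_MBPS / frame_megabits
print(f"~{max_fps:.0f} frames/s ceiling at {ASSUMED_FRAME_KB} KB per frame")
# ~38 frames/s: enough for 20 fps, but without much headroom for protocol overhead.
```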

AWS account is set up but instances aren’t started yet so we don’t waste money.

The Google Assistant SDK is now fully installed on the Jetson Nano as well. Text input and text output are both verified to work, but as separate modules. The sample code will need some tweaking in order to integrate both, but this is promising so far. I also found a way to “trigger” the Google Assistant, so we can combine that with the hand-waving motion to start the Google Assistant, though that might not be completely necessary. Here is the repo for the work being done on that end.

Next week, I will have text input and output combined into one module. Right now, it is streaming from the terminal, but I will also add functionality that allows it to read from files instead (which is how we are going to feed it the deciphered outputs from the gesture recognition algorithms).
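
A minimal sketch of that file-reading wrapper, where send_text_query stands in for whatever function the integrated Assistant module ends up exposing; both the function and the file name are placeholders, not actual SDK calls or paths.

```python
import time
from pathlib import Path

QUERY_FILE = Path("deciphered_queries.txt")  # hypothetical drop-off file from gesture recognition

def send_text_query(text):
    """Placeholder for the integrated Google Assistant text-in/text-out call."""
    print(f"[assistant] would send: {text!r}")

def watch_query_file(poll_seconds=0.5):
    """Poll the file and forward any new lines as Assistant queries."""
    seen = 0
    while True:
        if QUERY_FILE.exists():
            lines = QUERY_FILE.read_text().splitlines()
            for line in lines[seen:]:
                if line.strip():
                    send_text_query(line.strip())
            seen = len(lines)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch_query_file()
```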

Claire’s Status Report for 03/07

This week has been very unfortunate with all the midterms and projects I had due, so I wasn’t able to work on Capstone meaningfully. Aside from work from last week coming to fruition this past week (e.g. requesting full access to the Google Cloud Platform within andrew.cmu.edu), I didn’t really start anything new. I will be traveling during spring break, but here are some deliverable tasks that I can achieve remotely:

  • Getting the Google Assistant SDK sample to run on my ASUS computer
  • Altering the sample code to take in text queries and pipe out text outputs (and if not, at least determining how to convert them into text)
  • Exploring the need for other smart home SDKs (Alexa?) if the Google Assistant SDK is too difficult
  • Re-mapping some gestures to be more recognizable (working with Sung, who will run them through OpenPose)


Claire’s Status Report for 2/29

This week, I did the Design Review presentation and worked on the report. I also spent a long time exploring the Google Assistant SDK and gRPC basics.

For the Google Assistant SDK, I got to the point where I was almost able to run the sample code on the Nano. I bumped into a lot of unforeseen permissions issues on the Nano, which took a few hours to resolve.

Now, I am stuck at a point where I need to register the device with Google Assistant, but despite a few hours of probing around, I cannot get a good answer on why this step is failing. It seems like there is, again, a permissions issue. There are not too many online resources for debugging this because it is a little niche, and Google’s tutorial for it is quite incomplete.

I have also contacted the school’s IT desk so I can create the project under my school Gmail account rather than my personal one. Creating the project under the school’s account would make it “internal” within the CMU organization and let me skip some authentication steps later in the process (i.e. having to provide proof of owning a website for the terms and agreements). The IT desk and I are arranging additional permissions for my account so I can create actions on my Andrew email (CMU emails are normally denied that privilege).

For gRPC, I was able to run some code based on the samples. I think it has the potential to be very useful for communicating with either of the AWS servers we have. For the WebApp, it could pass along the results of a command so they can be displayed on screen.
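
To illustrate that idea, here is a hedged sketch of what a result-forwarding RPC could look like. The service, messages, and generated modules (results_pb2, results_pb2_grpc) are hypothetical; they would come from compiling a proto like the one shown in the comment with grpcio-tools, and the server address is a placeholder.

```python
import grpc

# Hypothetical proto this client assumes, compiled with:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. results.proto
#
#   syntax = "proto3";
#   service ResultRelay {
#     rpc PushResult (CommandResult) returns (Ack);
#   }
#   message CommandResult { string query = 1; string response_text = 2; }
#   message Ack { bool ok = 1; }
import results_pb2
import results_pb2_grpc

def push_result(query, response_text, server="webapp.example.com:50051"):
    """Send an Assistant result to the WebApp server over gRPC."""
    with grpc.insecure_channel(server) as channel:
        stub = results_pb2_grpc.ResultRelayStub(channel)
        ack = stub.PushResult(
            results_pb2.CommandResult(query=query, response_text=response_text)
        )
        return ack.ok

if __name__ == "__main__":
    print(push_result("what is a vsco girl", "A VSCO girl is ..."))
```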

For the deliverables next week, I will be completing the introduction, system specification, and project management sections of the design report. I will also continue working on the Google Assistant SDK samples on the Nano and try to get the issues resolved as soon as possible. I should also have a new project created on my school email instead by next week. Aside from that, I will be installing the WiFi card on to the Nano.

Claire’s Status Report for 2/22

This week, a lot of my work was focused on developing our design review decks, as I am presenting. I think the most important thing I did was fully fleshing out our set of static gestures for the MVP.

We derived these gestures from ASL fingerspelling. We had to make sure that the gestures were distinct from each other (the original set had some overlapping gestures) and, even if distinct, different enough for the camera to tell apart. One example of a similar pair is K and V. While they look distinct from each other in the reference image, we felt that they would not differ enough from one signer to another, given differences in finger length and hand shape.

Aside from the decks, I also worked on getting the Nvidia Jetson running. I successfully booted the disk, but because it lacks WiFi capability, I wasn’t able to get it to run anything too useful. I started a demo and tried my hand at some basic machine learning setup to prep the Nano for image recognition. I am now learning how to train networks on my personal computer using its GPU.

This was surprisingly difficult on my machine, due to some secure boot issues and missing dependencies. After a few hours of installing Python libraries, I got to a point where I was not confident in how to fix the error messages.

Aside from that, because our designated camera hasn’t arrived yet, I tried to borrow some webcams from the ECE inventory. Neither worked. One connected through GPIO, and the other through the camera connector. Neither was detected by the Nano despite a few hours of tinkering and searching online. This could be troublesome, especially if the camera connector is broken. However, for now, it is most likely a compatibility issue with the Nano, as neither of the webcams was meant for this particular device. For now, we just have to wait for the camera and see.
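
For the next attempt, a quick probe like the one below (a generic OpenCV/V4L2 check, nothing Nano-specific) should at least tell us whether the OS sees a camera at all; a CSI camera may still need a dedicated pipeline even if nothing shows up here.

```python
import glob

import cv2

def probe_cameras(max_index=4):
    """List /dev/video* nodes and try to open the first few indices with OpenCV."""
    print("video devices:", glob.glob("/dev/video*") or "none found")
    for index in range(max_index):
        cap = cv2.VideoCapture(index)
        ok = cap.isOpened() and cap.read()[0]
        print(f"index {index}: {'frame read OK' if ok else 'not usable'}")
        cap.release()

if __name__ == "__main__":
    probe_cameras()
```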

Progress is still behind schedule, but I feel fairly confident that it will work out. I can start looking into the Google Assistant SDK while waiting for the parts to arrive, since those two tasks do not depend on each other.

As I am looking into other tasks to do while waiting for hardware, I think the best use of my time right now would be to start thinking about the Python scripts for automated testing, and to start testing out the Google Assistant SDK and possibly make my first query.

Thus, my deliverables next week are the design review report and a small Google Assistant SDK Python program. I am thinking that even something that just takes in a line of input from the command line and outputs a response in text form would be a good enough proof of concept.

Claire’s Status Report for 02/15

This week, I worked on getting parts for our Jetson Nano. The most important piece of hardware for meeting our requirements is the camera to go with our board. After some research, I decided to go with a camera board by e-Con Systems made specifically for the Jetson Nano. I researched and compared factors such as the connector (and thus the communication protocol, which affects latency), the size (is it appropriate for an embedded system? does it look intrusive?), the resolution (how much resolution do we need per image for accurate feature extraction?), and finally, the frames per second (how much information do we need to recognize dynamic gestures?). Unfortunately, the camera won’t be arriving for another two weeks at least, so some parts of the testing may be delayed for now. I hope to continue trying out some Jetson demos by borrowing a webcam from the inventory and working with that for now. Luckily, familiarizing myself with the Nano is not a super pressing task; the next task that depends on it isn’t due for a few weeks.

Aside from learning camera lingo, I also made a rough draft of our block diagram for the hardware specifications. We have shopped for and submitted purchase forms for most of the hardware listed in this image. It took some time for us to find hardware that specifically works with an embedded system and looks sleek. In terms of purchasing parts, we are on time. We started our Gantt chart a little earlier than we should have (we didn’t realize purchase forms didn’t open until this week), but otherwise we are on schedule.

I also worked on collecting some data on each piece of hardware and putting it into slides for our design review in a week.

Another holdup is the missing microSD card. We just placed the order for it, and we can’t set up the Nano without it.

By next week, I hope to have the microSD card and to start setting up the Jetson. I will talk to a TA about possibly borrowing a webcam for now to start setting up some demos on the Nano. I will also be working on the design review slides and presentation next week, which will be another deliverable.