[Jing] Final Status Report

This week, I focused on preparing for the final presentation.

I also worked on the circuitry for controlling the door. The solenoid circuit I originally had was not compatible with the Jetson: it required 4V to activate the solenoid, but the Jetson's GPIO pins output at most 3.3V. To solve this, I added a 22Ω resistor and a second transistor with a 5V source, so that the Jetson's 3.3V signal lets 4V reach the gate of the original transistor.

Additionally, I breadboarded an H-bridge to control the motor, but have yet to test it with actual code.
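
Since the H-bridge still needs to be exercised from software, here is a minimal sketch of how the solenoid and motor could be driven from the Jetson's GPIO pins using the Jetson.GPIO library. The pin numbers and timings are placeholders, not our final wiring:

```python
import time
import Jetson.GPIO as GPIO

SOLENOID_PIN = 12  # hypothetical pin driving the solenoid's gate circuit
MOTOR_IN1 = 16     # hypothetical H-bridge direction inputs
MOTOR_IN2 = 18

GPIO.setmode(GPIO.BOARD)
GPIO.setup([SOLENOID_PIN, MOTOR_IN1, MOTOR_IN2], GPIO.OUT, initial=GPIO.LOW)

def unlock_door(seconds=2.0):
    """Energize the solenoid long enough for the cat to push through."""
    GPIO.output(SOLENOID_PIN, GPIO.HIGH)
    time.sleep(seconds)
    GPIO.output(SOLENOID_PIN, GPIO.LOW)

def run_motor(forward=True, seconds=1.0):
    """Drive the door motor in one direction through the H-bridge."""
    GPIO.output(MOTOR_IN1, GPIO.HIGH if forward else GPIO.LOW)
    GPIO.output(MOTOR_IN2, GPIO.LOW if forward else GPIO.HIGH)
    time.sleep(seconds)
    GPIO.output(MOTOR_IN1, GPIO.LOW)  # stop the motor
    GPIO.output(MOTOR_IN2, GPIO.LOW)

unlock_door()
run_motor(forward=True)
GPIO.cleanup()
```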

[Jing] Jetson Integration

This week I downloaded the final version of the trained neural network to use for our project. We ended up achieving 82% accuracy on the Cat vs. Non-cat inference. I breadboarded the final version of our solenoid circuit using the 12V DC wall adapter. The circuit failed at first because the transistor was blown, but after I replaced the transistor everything worked well. Finally, I worked with Philip to debug some issues with getting the machine vision code running on the Jetson. It turned out to be mostly version incompatibilities between the Jetson and Tensorflow.
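
For context, the inference path on the Jetson looks roughly like the sketch below. The model file name, the 64x64 input size, and the class ordering are assumptions for illustration, not our exact setup:

```python
import numpy as np
import tensorflow as tf

# hypothetical path to the downloaded model
model = tf.keras.models.load_model('cat_door_model.h5')

def is_cat(image_path, threshold=0.5):
    """Return True if the frame is classified as Cat."""
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(64, 64))
    x = tf.keras.preprocessing.image.img_to_array(img) / 255.0
    probs = model.predict(np.expand_dims(x, axis=0))[0]
    return probs[0] >= threshold  # index 0 assumed to be the Cat class

print(is_cat('frame.jpg'))
```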

I was in Boston for the end of the week (Fri – Sun) and so I wasn’t able to accomplish much else.

Next week, I will work with the team to finish integration and testing and get TensorRT running on the Jetson to improve performance. I will also begin preparing visuals (charts, etc.) for the final presentation and paper.
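
The TensorRT work will most likely go through TF-TRT rather than raw TensorRT. A hedged sketch using the TF 1.x converter API, with placeholder paths:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the SavedModel into a TensorRT-optimized SavedModel.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='saved_model',  # placeholder path
    precision_mode='FP16')                # half precision suits the Jetson GPU
converter.convert()
converter.save('saved_model_trt')
```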

[Jing] Integration and Refinement

Since we ran out of AWS credits, I turned off the EC2 instance last week. However, this week we received more, so I started a new EC2 instance. It turns out that when you start an AWS instance, it keeps the settings from whoever last used it. My instance had a lot of problems because its software image was out of date and its library versions differed from the ones I had used, so my old code couldn't run on the EC2. After updating the entire instance and uninstalling and reinstalling various versions of the libraries, I was able to get my network to train, but I could not download the model itself. Instead, I had to save the model as a checkpoint file and then convert the checkpoint into a saved_model object on my own computer.
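
For anyone hitting the same problem, the workaround looked roughly like the sketch below (TF 1.x API). The tensor names are assumptions about the graph; yours will differ:

```python
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # Load the checkpoint downloaded from the EC2 instance.
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    # Re-export the restored graph as a SavedModel.
    tf.saved_model.simple_save(
        sess, 'saved_model',
        inputs={'image': sess.graph.get_tensor_by_name('input:0')},
        outputs={'probs': sess.graph.get_tensor_by_name('output:0')})
```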

I retrained the ML model with improvements to the convolutional neural network and the larger data set. I added several convolution layers and a batch normalization layer, as well as 600 more images (doubled after flipping them). I ended up with 84% accuracy on the validation set.
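
The added block looks roughly like this in Keras. The filter counts, input size, and class count are illustrative rather than the exact values used:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3)),     # one of the added convolution layers
    layers.BatchNormalization(),   # the added batch normalization layer
    layers.Activation('relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(4, activation='softmax'),  # class count is an assumption
])
```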

I also helped Philip get libraries set up on the Jetson and we were able to successfully run our computer vision + machine learning code on it.

Unfortunately, I fell sick during the latter half of this week.

[Jing] More AWS credits, and Solenoid

This week I tested the solenoid and ordered a 12V DC adapter and a breadboard to help power it. I built the same circuit as in the solenoid circuit diagram I drew last time.

I also scraped a few hundred more images off of Google to grow our Machine Learning data set (300 more images of raccoons and 200 more of lower bodies, which took about 3 hours to collect). I'll continue to find more images so that we have at least 600 or 700 per class. I am also considering dropping the "squirrel" class: the chance that we see squirrels is low, and recognizing them isn't imperative to our primary goal of differentiating between cats, raccoons, and humans.

Unfortunately, the $100 of AWS credits we had ran out, so I requested more (and hopefully will receive them). We initially requested $150 of AWS credits several weeks ago, but one of the codes didn't work, so we only had $100 to work with. I won't train on the EC2 for now until we get more credits. However, as soon as we have more AWS credits (or, if we don't get any, money from our budget), I will resume training.

Finally, I tested the Computer Vision + ML inference on the cat we had at home, as well as on myself. I've recognized a few patterns in the algorithm. If there are any long, leg-like objects of a solid color in the image, it will for the most part recognize them as the lower body of a human. If there are any gray animals, or too much gray in the image overall, it detects a raccoon, since the raccoon images I have are mostly gray. As of now, the predictions depend mostly on the colors in the image, as stated above. Additionally, the ML inference returns classification probabilities of 97% or higher. In other words, the inference gives us extreme results, whereas uncertain results (such as a classification probability of 60%) would actually be more helpful. From my research online, a larger data set will help alleviate this (as a larger data set helps alleviate most problems), as will implementing k-fold (usually 10-fold) cross validation: shuffling the data set and retraining on random partitions of it, used as the training set and validation set. This is what I plan to do by the next time I train (which should be by Wednesday, April 10).
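
To make that concrete, here is a rough sketch of 10-fold cross validation using scikit-learn for the splits. build_model, X, and y are hypothetical stand-ins for our CNN definition and image data:

```python
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in kf.split(X):   # X, y: hypothetical image/label arrays
    model = build_model()                # fresh model per fold (stand-in)
    model.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
    _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    scores.append(acc)
print('mean validation accuracy:', np.mean(scores))
```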

Next week I will focus on implementing k-fold cross validation, finding more images, and getting the CV and ML code to run using TensorRT.

[Jing] Refining Tensorflow

This past week I retrained the model many times (15+) to optimize our neural network and improve the accuracy of the inference. I initially hit an accuracy of 65% and tried several things to improve it.

First, I added a second convolution layer, which improved the accuracy of our model to 75%. However, it also made our model overfit: the model hit 90% training accuracy but remained at 75% validation accuracy. Adding a third convolution layer actually lowered the overall accuracy, likely because our data set is not large enough.

Afterwards, I looked into adding regularization functions to improve our validation accuracy. Regularization functions are supposed to improve overfit models by making them less responsive to noise. They are already built into the Tensorflow library, so using them was not difficult. I tried three different regularization functions (the sum of absolute values, the sum of squared values, and the sum of both) and applied them to various parameters of the neural network: the bias, kernel, and activity variables. After training several times, I found that accuracy was significantly worse for some models and approximately the same for others.
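
In Keras these show up as layer arguments, roughly as below. The 0.01 weights are illustrative, and which regularizer went on which parameter varied between runs:

```python
from tensorflow.keras import layers, regularizers

layer = layers.Dense(
    128, activation='relu',
    kernel_regularizer=regularizers.l2(0.01),             # sum of squared values
    bias_regularizer=regularizers.l1(0.01),               # sum of absolute values
    activity_regularizer=regularizers.l1_l2(0.01, 0.01))  # sum of both
```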

The only other solution I could think of was to enlarge our data set. From what I've read online, accuracy peaks with a data set of 1000-2000 images per class, and ours currently has only 500 images each for raccoons, squirrels, and the lower body of humans. Over the next week I'll scrape more images off of Google. Enlarging the data set also means we can increase the number of parameters our neural network can learn, which means we can expand the network with more layers and possibly improve accuracy.

Before getting to that, I evened out the number of images per class to see if that would change the results. Because I have over 12000 images each of dogs and cats, I removed 11500 from each set so that every class would have 500 images. After training again, the results were approximately the same (75% validation accuracy).

Finally, I played around with the batch size of our neural network, the parameter that tells the network how many images to process in one step. Initially this value was set to 32, but the smaller I made it, the more accurate our inference was. When I tuned the batch size down to 8, I got a validation accuracy of 78%. This was the best I could achieve.
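
The sweep itself was simple, roughly like this sketch, where build_model and the data arrays are hypothetical stand-ins:

```python
for batch_size in (32, 16, 8):
    model = build_model()  # stand-in for our CNN definition
    model.fit(x_train, y_train, epochs=10, batch_size=batch_size, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    print(batch_size, acc)
```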

[Jing] Running Tensorflow on AWS

This week I began running my machine learning code on AWS. I first requested an EC2 instance with a GPU, which took about two days for AWS to process. Then on Wednesday I uploaded my data set and code to the instance. Fortunately, the instance I requested was built for deep learning libraries such as Tensorflow, so running the code on the GPU was a piece of cake: I simply had to change some settings, and the code ran on the GPU (or so the terminal output said).
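
For anyone curious, confirming that Tensorflow actually sees the GPU takes only a couple of lines (TF 1.x API):

```python
import tensorflow as tf

print(tf.test.is_gpu_available())   # True if a CUDA device was found
print(tf.test.gpu_device_name())    # e.g. '/device:GPU:0'
```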

My first training run resulted in a training accuracy of 76% and a validation accuracy of 74%, and took around 6 hours. This was surprisingly good. Although it doesn't meet our goal of 95%, it seems reliable enough for demoing purposes. To bump up the accuracy, I added another convolution and activation (ReLU) layer. After training a second time, I achieved a training accuracy of 95% but a validation accuracy of only 76%, which means the model is overfit and needs a larger data set and a regularization function. I will enlarge the data set by flipping all of the images over the y-axis, add a regularization function, and retrain tonight.
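
The flipping step should be straightforward. A rough sketch using Pillow, with placeholder paths:

```python
import os
from PIL import Image, ImageOps

src = 'data/train/cat'  # placeholder; repeated for each class directory
for name in os.listdir(src):
    img = Image.open(os.path.join(src, name))
    flipped = ImageOps.mirror(img)  # flip over the y-axis
    flipped.save(os.path.join(src, 'flipped_' + name))
```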

I will also draw a diagram for the solenoid sometime today. Once I finish the diagram, and once I retrain, I will update this post with the results.

[Jing] Setting up Tensorflow

This week I began learning how to use Tensorflow to build a Convolutional Neural Network and train a machine learning model. I did a tutorial online and found that it was surprisingly easy and intuitive.

Originally, the convolutional neural network was supposed to classify between six different types of objects: Cat, Dog, Squirrel, Raccoon, Legs, and Shoes. After taking sample images with a camera set up at cat-door height from one meter away, I realized that almost all of the time the camera sees a human's entire lower body, from the hips to the feet. Therefore, instead of classifying Legs and Shoes, we will drop both and classify Lower Body. I had previously found a large image data set of humans with annotated body parts, so I wrote a Python script to parse through some 10 gigabytes of images and keep only the ones containing a human's lower body. Then I arranged each set of images into a per-class directory for the Tensorflow script to read.
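
The sorting script looked roughly like the sketch below. has_lower_body is a hypothetical stand-in for parsing the data set's annotation files, but the per-class directory layout is what the Tensorflow script reads:

```python
import os
import shutil

SRC = 'raw_images'               # placeholder paths
DST = 'data/train/lower_body'
os.makedirs(DST, exist_ok=True)

for name in os.listdir(SRC):
    if has_lower_body(name):     # hypothetical annotation check
        shutil.copy(os.path.join(SRC, name), DST)
```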

Now that the Tensorflow script has been written, the convolutional neural network built, and the data set prepared, I will test on a small sub-data set to ensure that the script is correct and then deploy the entire thing to AWS. I will slowly get to these two tasks over Spring Break and have them done before school starts again.

[Jing] Finding our path

I spent the majority of the week finalizing our design document. I LaTeX'd a template and shared it with the team on Overleaf, completed my assigned sections of the paper (project management, computer vision, and machine learning), and created a revised Gantt chart for our team. I also went with Irene to pick up plywood and other hardware parts at Home Depot.

For the upcoming week, I will begin implementing my Machine Learning model in Python, and setting up a server on AWS to run the code.

[Jing] What is Deep Learning?

This week, I continued to flesh out the details of our machine learning algorithm. After looking at several other convolutional neural network architectures, I discovered that all of them contained at least one of each of these layers, in an order like this: convolution -> ReLU -> max pooling -> dense -> softmax inference. Additionally, adding extra layers often increases recognition accuracy, but also increases the time it takes to process one image. For our project, the sample space of possible images is small because our camera is stationary and focused on an area close to the ground. Inference will therefore naturally be more accurate and will not require a complicated neural network. To account for this, the network we build will have only one of each required layer of a convolutional neural network. If the accuracy of our deep learning algorithm does not hold up, we will add an extra convolution layer and ReLU layer until it is accurate enough.
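
In Keras, that one-of-each architecture looks roughly like this. The filter counts and the 64x64 input size are assumptions; the six outputs match our planned classes:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), input_shape=(64, 64, 3)),  # convolution
    layers.Activation('relu'),                           # ReLU
    layers.MaxPooling2D((2, 2)),                         # max pooling
    layers.Flatten(),
    layers.Dense(64, activation='relu'),                 # dense
    layers.Dense(6, activation='softmax'),               # softmax inference
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```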

I spent the rest of my time determining a valid success-rate goal for our deep learning recognition. Our team had to set goals for false positives (a raccoon is let in) and false negatives (your cat is locked out), but couldn't justify our numbers. After much deliberation, we decided to post polls on Reddit with questions such as "How frequently do unwanted animals invade your home every year?" or "How much money would you pay for a smart cat door with a recognition rate of 95%?" Unfortunately, we received very few responses, which makes the poll results not very useful. In the end, we decided that 95% would be a reasonable goal for our project: we will aim for at most 5% false negatives and 5% false positives.

Many research papers on animal recognition that we read reached on average a 95%-97% recognition rate (95% of the time, the algorithm correctly recognizes the animal), but were deployed in volatile environments. Because our environment is mostly static, our algorithm does not need to deal with edge cases, varying backgrounds, etc. Therefore, a 95% recognition rate is almost certainly achievable; it is essentially the baseline that other deep learning algorithms have reached. Something higher, such as 99%, could be done, but would require either algorithms much more advanced than we could implement, or additional methods of detection like sound, heat, or weight.

Another controversial design decision we made was to classify cats by breed. Originally, we planned on doing something similar to cat facial detection, but we decided that was outside our skill set. Cat facial detection is certainly possible (there are data sets online with labeled cat features such as ears, eyes, and fur), but given the time frame of the project, the door will not recognize your cat by its face; it will recognize your cat by its breed. The user will add his or her cat to the system and choose its breed from a list.

Lastly, I worked on finding data sets to train on. I found several data sets including images of dogs, cats, squirrels, raccoons, and human legs. These will be the primary objects that the camera detects. Adding more classes of objects later will be easy, since data sets are widely available online.