[Irene] Building the Pet Door

I laser cut all the door parts with the assistance of my friends, Alfred Chang and Christian Manaog. Two identical rectangles were laser cut from the 2ft x 2ft plywood; these are the two sides of the hollow door. Professor Nace helped me cut 1-inch cube spacers with the band saw and taught me how to use the power drill. We used bolts to hold the two plywood sides in alignment and used the cubes to maintain spacing between them. I installed the pet door, and here is the result:

Here is the installation guide for the drawer slides: https://images.homedepot-static.com/catalog/pdfImages/d5/d57bf3c8-71fe-4ed0-af53-a427049d4421.pdf. We will adapt it to fit our needs. A 23cm x 21cm plywood panel has already been cut. #8-32 machine screws will secure the panel to the drawer members. Two 12in x 1in x 1in rectangular prisms will be cut using a band saw. Wood screws will attach the cabinet members to the rectangular prisms, and #8-32 machine screws will attach the prisms to the plywood. The solenoid will be secured with #4-40 machine screws, and the servo with #6-32 machine screws.

Since the panel is 21cm high and the servo rotates 180 degrees, the diameter of the wheel needs to be about 13cm, as shown in the quick check below. The bottom of the wheel also needs to be 21cm higher than the top of the panel. There is not enough space above the pet door, so I will need to elevate the servo. I plan to laser cut and band saw a few more parts to create an extension mount. We don't have time to order more parts, re-cut the plywood, and rebuild the door. Since this is a prototype, this is okay, and it is convenient for storage because we can take the servo extension mount off.
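A quick check of that sizing, assuming the panel hangs from a line wound around the wheel so that a 180-degree turn pays out half the circumference:

```latex
\frac{\pi d}{2} = 21\,\mathrm{cm}
\quad\Longrightarrow\quad
d = \frac{2 \times 21}{\pi} \approx 13.4\,\mathrm{cm}
```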

Parts have arrived, and progress is on schedule: the door should be complete by March 23 and the computer vision script by March 30. I am on track to be ready for integration on April 1.

I will be able to use the power drill myself to install the drawer slides and panel, and will only need additional assistance with the band saw. The deliverable for March 23 is the completed door.

[Jing] Finding our path

I spent the majority of the week finalizing our design document. I LaTeX'd a template and shared it with the team on Overleaf, completed my assigned sections of the paper covering project management, computer vision, and machine learning, and created a revised Gantt chart for our team. I also went with Irene to pick up plywood and other hardware parts at Home Depot.

For the upcoming week, I will begin implementing my machine learning model in Python and setting up a server on AWS to run the code.

March 2: Team Status Update

This week the team focused on finalizing the design presentation as well as the design document. We spent a significant amount of time narrowing down our requirements, then finding solutions that accomplish them. We initially struggled to choose values for the false positive and false negative rates for the cat door opening. We were unable to find statistics on raccoon behavior, or a figure for how much damage raccoons can cause. We then realized that our goal simply needs to be better than the current design of a regular cat door: a regular cat door will always let a raccoon in because there is no locking mechanism, so any false positive rate below 100% would be an improvement. We decided to challenge ourselves to achieve a 5% false positive rate, as this is the rate achieved by competent facial recognition algorithms. We also chose 5% as our false negative rate: if we assume a cat uses the door four times a day, that is 20 uses over five days, and 5% of 20 is one miss, so the user would be alerted about once every five days that their cat may be stuck outside, which is reasonable.

On the project management side, we decided that in addition to our two meetings a week during class time and our Saturday meeting, we should meet most days for a "stand-up." These meetings will be held over Zoom and will allow us to communicate what we accomplished over the past 24 hours and what we wish to accomplish in the next 24 hours. We believe this will help us work better as a team, as we will be staying in touch on a daily basis. This is especially important as the semester goes on and we start implementing our designs.

Our team is currently on track!

[Philip] Final Design Decisions

This week my focus has been on finalizing the design decisions for the design presentation and report. I wrote the system description for the iPhone application, which included a more detailed wireframe. In this report, I specified the requirements for the app and how we would accomplish them.

Note: The cities shown will be replaced with "Today," "Yesterday," "Past Week," "Past Month," and "Past Year."

I also wrote the system description for the System Hub. Last week, I explained how we chose the Jetson TX2 for CV and ML acceleration. In this report, I explained how the developer kit will suit all our needs for the system hub: it will need to communicate over WiFi with a phone, receive camera footage, apply our computer vision and ML algorithms, control the servos for the door, turn an LED on and off, and receive PIR data. It replaces the original plan of using a Raspberry Pi for communication in conjunction with the Jetson GPU. In addition, I discussed our camera choice.

Finally, I researched related cat doors for the research paper, discussing the benefits and downsides of the traditional cat door as well as an RFID-activated cat door.

My progress is on schedule.

In the upcoming week, I will get started on writing code for the iPhone app.

[Irene] Let’s Laser Cut

In preparation for the design review, I did a dry run for Sam. Jing and I went to Home Depot to pick up the plywood and a few other small hardware parts. I wrote the abstract, intro, requirements, architecture overview, and future work sections of the project paper. I ordered the remaining parts.

I responded to the design review feedback in the design document. After consulting a few instructors and people more knowledgeable than us in the machine vision field, I learned the following. In order of preference, our testing options are: live animals, taxidermy, video feeds, printed high-resolution pictures, and stuffed animals. My friend has a cat and Jing's friend has a cat, but animals aren't allowed on campus, so we will record footage of those cats interacting with the system and of the system responding appropriately. For raccoons, we won't be able to find live raccoons to test our system on. A taxidermy raccoon is as close as we could get to a live one, but they are expensive. We have therefore decided to look for videos similar to the footage the camera would actually capture of a raccoon. This is better than printed high-resolution photos because footage of a real animal replicates reality more closely than a video of a printed picture does. Stuffed animals are not a good test of our machine learning algorithm because they don't represent actual animals; a model that classifies stuffed cats as real cats would be considered a poorly trained classifier.

There is no power metric because the device will be plugged into a wall outlet.

The next large milestone is to have version 1 of the door finished by March 22. I am a little anxious about finding power tools and small metal parts to construct the door. To alleviate that, I am proactively obtaining the right hardware from Home Depot and getting the plywood laser cut first. Whenever construction of the door is delayed waiting on parts, I can context switch to writing the computer vision Python program for motion detection and tracking.

I will have the door parts laser cut by the end of next week.

[Jing] What is Deep Learning?

This week, I continued to flesh out the details of our machine learning algorithm. After looking at several other convolutional neural network architectures, I discovered that all of them consist of at least one of each of these layers, in an order like this: convolution -> ReLU -> max pooling -> dense -> softmax inference. Adding extra layers often increases recognition accuracy, but also increases the time it takes to process one image. For our project, the sample space of possible images is small because our camera is stationary and focused on an area close to the ground. Therefore, inference will naturally be more accurate and will not require a complicated neural network. To account for this, the network we build will have only one of each required layer of a convolutional neural network, as sketched below. If the accuracy of our deep learning algorithm does not hold up, we will add an extra convolution layer and ReLU layer until it is accurate enough.
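A minimal sketch of that one-of-each-layer network, written with Keras; the input size, filter count, and class list are placeholder assumptions rather than final design values:

```python
# One-of-each-layer CNN: convolution -> ReLU -> max pooling ->
# dense -> softmax. Input shape, filter count, and number of
# classes are illustrative assumptions, not final design values.
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # e.g., cat, dog, raccoon, squirrel, human legs (assumed)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), input_shape=(128, 128, 3)),  # convolution
    layers.ReLU(),                                         # ReLU
    layers.MaxPooling2D((2, 2)),                           # max pooling
    layers.Flatten(),
    layers.Dense(NUM_CLASSES),                             # dense
    layers.Softmax(),                                      # softmax inference
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

If accuracy falls short, the plan above amounts to inserting another Conv2D/ReLU pair before the pooling layer and retraining.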

I spent the rest of my time determining a valid success-rate goal for our deep learning recognition. Our team had to set goals for false positives (a raccoon is let in) and false negatives (your cat is locked out), but we couldn't justify our numbers. After much deliberation, we decided to post polls on Reddit with questions such as "How frequently do unwanted animals invade your home every year?" and "How much money would you pay for a smart cat door with a recognition rate of 95%?" Unfortunately, we received very few responses, so the polls were not very useful. In the end, we decided that 95% would be a reasonable goal for our project: we will aim for at most a 5% false negative rate and a 5% false positive rate.

Many research papers on animal recognition that we read reached a 95%-97% recognition rate on average (95% of the time, the algorithm correctly recognizes the animal), but were deployed in volatile environments. Because our environment is mostly static, our algorithm does not need to deal with edge cases, varying backgrounds, etc. Therefore, a 95% recognition rate is almost certainly achievable and is essentially the baseline that other deep learning algorithms have reached. Achieving something higher, such as 99%, could be done, but would require either algorithms much more advanced than we could implement, or additional methods of detection like sound, heat, or weight.

Another controversial design decision we made was to classify cats by breed. Originally, we planned on doing something similar to cat facial detection, but we decided that was outside our skill set. It is certainly possible to do cat facial detection: there are data sets online with labeled cat features (such as ears, eyes, and fur). However, given the time frame of the project, we decided that the door will not recognize your cat by its face, but by its breed. The user will add his or her cat to the system and choose its breed from a list of breeds.

Lastly, I worked on finding data sets to train on. I found several, including images of dogs, cats, squirrels, raccoons, and human legs. These will be the primary objects that the camera will detect. Adding more classes of objects will be easy, since data sets are widely available online.

[Philip] Choosing Hardware

This week I focused on finalizing what hardware we want to use. To do this, I came up with several requirements for our system. Based on the average speed of a cat, I determined that our computer vision and machine learning algorithms will have approximately 1.2 seconds of cat visuals to work with, and we want to maximize the number of images we can process during that window. Based on my research, a Raspberry Pi can compute at a rate of about 1 frame per second, so in most cases we would capture only one image of the cat; that is a great risk, because the cat could be looking away or there could be a light glare in that single image. I also looked into an Odroid, which is essentially a more powerful Raspberry Pi, but even this would yield only 2-3 frames per second, so we would still be banking on receiving a stable image. Based on this research, our team decided that a GPU was the best course of action.

I focused my GPU research on Nvidia GPUs, as I have experience writing parallel code in CUDA on Nvidia hardware. Nvidia has a family of GPUs called Jetson whose applications are in embedded systems; the TX2 has 256 CUDA cores. In addition, the development kit has a quad-core ARM CPU, WiFi capabilities, and many I/O ports. The Jetson TX2 was therefore a solution not only for our image processing but also for our system communication. I added this information, along with more details, to our design paper.

I also made progress with the app design, starting with a simple wireframe in Xcode:

My progress is on schedule.

In the upcoming week I will be working on finishing up the design presentation, in addition to figuring out more details about the app and the camera-to-Jetson communication mechanism, which I will report in the design paper.

Feb 23: Team Status Update

We want to minimize the latency of our computer vision and ML algorithms so that we can open the door for a valid cat as it walks up, without the cat needing to wait. We estimate that the cat will be within range of the camera for a total of 1.2 seconds.

Through our research we determined that a Raspberry Pi would allow us to compute around 1 frame per second, which is too slow: we could potentially receive only one image during the 1.2-second span, and that image might not give a good indication of whether the animal is valid. Similarly, we looked into the Odroid, a board similar to the Raspberry Pi but much more powerful, which would likely yield 2-3 frames per second. Still, we are unsure whether that frame rate is fast enough, and we want to be sure we will get at least one good image for our algorithms.

We then looked into GPUs, which are processing units well suited to image processing. Nvidia makes the most commonly used and best-documented GPUs, and one of our group members has experience with them. We found the Jetson family, GPUs created for the embedded systems world. Specifically, we chose the Jetson TX2, which has 256 CUDA cores, because based on our research we will be able to process 15 frames per second. Furthermore, Nvidia has a library called TensorRT, which complements TensorFlow: it can be used in conjunction with TensorFlow to optimize the ML computation for Nvidia GPUs. We will use this to improve the latency of our algorithm.
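As a rough sketch of what that optimization step might look like (this assumes a TensorFlow build with TensorRT support, as shipped on JetPack; the model paths are placeholders):

```python
# Hedged sketch: optimizing a trained TensorFlow SavedModel with TF-TRT.
# Assumes TensorFlow was built with TensorRT support (as on JetPack);
# "saved_model" and "saved_model_trt" are placeholder paths.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")
converter.convert()                # rewrite supported subgraphs as TRT ops
converter.save("saved_model_trt")  # reload this SavedModel for inference
```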

To implement motion detection, we first store a weighted average of previous frames and call this our "background frame." With the weighted average, the script can dynamically adjust to the background, even as the time of day and the lighting conditions change. We then compare the background frame to the current frame by subtracting; if the delta is above a certain threshold, we have detected motion as a substantial difference in the image. Since we know where in the frame the motion occurred, we can crop that part of the image out (see the sketch below). Tracking, on the other hand, involves comparing adjacent frames to figure out what moved where, so even if there are two moving objects in a frame, we can tell which object went where.
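A minimal sketch of the weighted-average detection step (not the tracking) in Python with OpenCV; the camera index, blur kernel, update rate, and thresholds are illustrative assumptions, not tuned values:

```python
# Hedged sketch of the weighted-average motion detector described above,
# using OpenCV (assumes OpenCV 4.x). Camera index, blur kernel, update
# rate, and thresholds are illustrative, not tuned values.
import cv2

cap = cv2.VideoCapture(0)   # assumed camera index
background = None           # running weighted average of past frames

while True:
    ok, frame = cap.read()
    if not ok:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)

    if background is None:
        background = gray.astype("float")  # initialize the background
        continue

    # Blend the current frame into the background so it slowly adapts
    # to lighting changes over the course of the day
    cv2.accumulateWeighted(gray, background, 0.05)

    # Subtract the background from the current frame, then threshold
    delta = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    thresh = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
    thresh = cv2.dilate(thresh, None, iterations=2)

    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500:      # ignore tiny changes
            continue
        x, y, w, h = cv2.boundingRect(c)
        crop = frame[y:y + h, x:x + w]    # region where motion occurred
        # crop would be handed to the classifier here

cap.release()
```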

Instead of using an ultrasonic sensor, a PIR sensor will be mounted to the bottom of the door on the indoor side to detect when the door needs to open for a cat wanting to exit the house. The camera will be mounted on the top of the outdoor side and angled downwards; it will be used to determine when the door needs to open for a cat wanting to enter the house. A door switch will indicate when the servo needs to lock after a cat has finished entering or exiting.

Passive infrared (PIR) sensors detect changes in infrared radiation. All objects with a temperature above absolute zero emit heat energy in the form of radiation, so a PIR sensor can be used to sense movement of people, animals, or other objects.
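On the software side, reading the PIR should be simple. Here is a hedged sketch of polling it from the Jetson's GPIO header using the Jetson.GPIO library; the pin number and poll rate are assumptions:

```python
# Hedged sketch: poll a PIR sensor wired to the Jetson's GPIO header.
# Uses the Jetson.GPIO library; PIR_PIN is an assumed board pin.
import time
import Jetson.GPIO as GPIO

PIR_PIN = 18  # assumed pin carrying the PIR's digital output

GPIO.setmode(GPIO.BOARD)
GPIO.setup(PIR_PIN, GPIO.IN)

try:
    while True:
        if GPIO.input(PIR_PIN):
            # A cat on the indoor side wants out: unlock/open here
            print("PIR triggered: motion on indoor side")
        time.sleep(0.1)
finally:
    GPIO.cleanup()
```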

[Irene] So Many Diagrams

This week, I put together diagrams to organize our thoughts and make sure we’re all on the same page about interconnections. First, I drew the door design. In a moment of panic, I thought that the servo would not be strong enough to lift a wooden panel, so I changed it to a flappy door design:

Then a mechanical engineering friend told me to redo the calculations, and I realized that kg-cm means kilograms of force times centimeters of distance. So our 10kg-cm servo can lift 2kg at a 5cm radius, and we're back to the lifting door design.
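Written out as a quick check (force = torque / radius):

```latex
F = \frac{\tau}{r} = \frac{10\,\mathrm{kg{\cdot}cm}}{5\,\mathrm{cm}} = 2\,\mathrm{kg}\ \text{of force}
```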

I learned that it is important to have consistent lighting for machine vision. I selected an LED that content creators on YouTube often use for filming; it will be connected to a power relay controlled by the Jetson. An alternative lighting mechanism would be IR lighting. IR wavelengths of 850nm and 940nm (also called NIR, Near InfraRed) are commonly used in machine vision. IR reduces the effect of object color, glare, and reflections. IR has a longer wavelength than visible light, which usually results in greater transmission into materials like paper, cloth, and plastic. IR wavelengths also react differently to materials and coatings than visible light does, so certain defects and flaws can be detected with IR where visible light fails. One drawback is that IR lighting changes the color of a cat's fur in the image, so our machine learning model would have to be trained on IR images, and such a dataset is hard to find.

Here are the interconnections between all the parts, along with the software we need to write:

And an events diagram for what happens after what, which will be converted into a flowchart of images:

I did some reading on computer vision, specifically motion detection and tracking. Basically, we can detect motion by taking the average of the past ten frames and comparing it to the current frame. Read the team status update for more details!

Onwards to design review!

Irene