Krish’s Status Update for 5/12

Last week, I finished building the model using my custom dataset. I came across a major problem when working with the YOLO model, which is what we had previously intended to use. Initially, I had thought that an object detection model like YOLO was ideal, since it is trained to pick out different objects in a scene. However, one shortcoming I did not foresee was that the images we collect in our system are not like natural images, because they are taken from an aerial view. The following two pictures display this disparity.

[Image: example image used to train YOLO]
[Image: image captured for Smart Library]

In the first image, the picture is taken from the side; the second is taken from the top. Humans can easily identify the presence or absence of a person from either angle, but a machine learning model trained only on side views cannot abstract away the viewpoint.

In order to fix this problem, I decided to train my own machine learning model from scratch, without any pretrained weights. The advantage is that it learns solely from the data we feed it, so it does not depend on natural images. The disadvantage is that I need a simpler model, since we have far less data available. For this reason I adapted the approach: instead of having the model find the locations of the seats, I specify them ahead of time. The algorithm then crops out each seat, resizes the crop and classifies whether or not there is a person in that seat. I thought this was a fair compromise, given that the seats in Pablo's dining room are in relatively fixed positions, and so is the camera we set up. If we were to take this project further, there would be an extra cost in specifying seat positions for each new installation, but this cost would be negligible compared to the effort of mounting the camera and central nodes at the location.
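The sketch below illustrates this crop-and-classify loop. It is a minimal example rather than the exact implementation: the seat coordinates and crop size are placeholder values, and `classifier` stands in for the small model trained on our dataset.

```python
import cv2
import numpy as np

# Hypothetical seat bounding boxes (x, y, width, height) in the camera
# frame; the real coordinates are measured once for Pablo's dining room.
SEAT_BOXES = [
    (120, 80, 200, 200),
    (400, 80, 200, 200),
    (120, 350, 200, 200),
    (400, 350, 200, 200),
]

CROP_SIZE = (64, 64)  # placeholder input size for the small classifier


def occupancy_bits(frame, classifier):
    """Crop each fixed seat region, resize it and classify occupied/empty.

    `classifier` is any callable mapping a normalized crop to the
    probability that the seat is occupied.
    """
    bits = []
    for (x, y, w, h) in SEAT_BOXES:
        crop = frame[y:y + h, x:x + w]          # cut out one seat region
        crop = cv2.resize(crop, CROP_SIZE)      # bring it to a fixed size
        prob = classifier(crop.astype(np.float32) / 255.0)
        bits.append(1 if prob > 0.5 else 0)
    return bits
```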

Additionally, I worked on the frontend of the website. When we had planned on setting up in Sorrells Library, I did not know the layout of the seats, especially since we were not sure where we could mount the cameras. Once I got a few pictures of Pablo's dining room, I could understand the layout and set up the website to mirror it. Right now, the website can read four occupancy bits and display an appropriate HTML page, with red and green markers at positions representing the seats at the dining table.
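As a rough illustration of how the page is generated, here is a minimal sketch of a Django view, assuming the occupancy bits are stored in a module-level list that the central-node handler updates; all names here are hypothetical.

```python
# views.py -- minimal sketch; `latest_bits` would be updated by whatever
# handler receives data from the central node.
from django.shortcuts import render

latest_bits = [0, 0, 0, 0]  # one occupancy bit per seat

def seat_map(request):
    # Pair each seat with a display color: red = occupied, green = free.
    seats = [
        {"id": i, "color": "red" if bit else "green"}
        for i, bit in enumerate(latest_bits)
    ]
    return render(request, "seat_map.html", {"seats": seats})
```

The template would then position each seat's marker so the page mirrors the dining room layout.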

Krish’s Status Update for 21/11

This week, I worked a bit more on the machine learning model. At the review, Professor Yu had suggested I make a small dataset from a few images of my desk. While this is not ideal, it does help me start some of the work. I have faced some issues with the model, which I plan to debug on Monday and after Thanksgiving break. There are two main issues right now: overfitting and sensitivity to hyperparameters.

Overfitting. With a small dataset, it is easy to end up with a model that memorizes the specific outputs expected on the training data. Such a model does not learn general patterns and therefore does not generalize well. Due to the delays in acquiring data, I have not been able to collect a dataset of the right size, so the model is not as strong as I would like it to be.

Sensitivity to hyperparameters. Despite having only a small dataset, there are issues even with learning the training data. Small changes in the training hyperparameters cause large changes in inference accuracy on the training set, and this will get worse as the dataset grows. This is a more fundamental issue that I need to research further before I attempt a fix. My guess is that there is an issue with running darknet in a Jupyter notebook (darknet is the framework in which YOLO is implemented). One option I want to explore is switching to a version of YOLO implemented in PyTorch (i.e. in Python), which might resolve the issue.

I also wrote a short Python script that solves the problem I described in the last status update. One issue Pablo was facing is that some of the images lose all their data and the bottom part of the image becomes a set of vertical lines. I had described the approach in the last status update but had not implemented it then; since then, there have been no problems implementing it.

Krish’s Status Update for 14/11

There was still not much I could do this week before we get the data from Sorrells. However, I did manage to find some useful resources. Specifically, I found a website called Roboflow, which will allow me to take my labelled training data and run some preprocessing on it. This is different from the preprocessing we plan to run on the central node, as it specifically pertains to the machine learning model.

The main advantage Roboflow offers is that it will help me convert annotations from PASCAL VOC XML to the darknet format in bulk. For the initial picture of my workspace that I used to test the pipeline, I did this manually. Now that I have found Roboflow, I can do it automatically, saving a lot of time when processing thousands of images.
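For reference, the conversion itself is straightforward; below is a minimal sketch of what Roboflow automates, assuming a single "person" class. Darknet expects one line per object of the form `class x_center y_center width height`, with all coordinates normalized to [0, 1].

```python
# Convert a PASCAL VOC XML annotation file to darknet-format lines.
import xml.etree.ElementTree as ET

CLASS_IDS = {"person": 0}  # assumed label set

def voc_to_darknet(xml_path):
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        cls = CLASS_IDS[obj.find("name").text]
        box = obj.find("bndbox")
        xmin, ymin = (float(box.find(t).text) for t in ("xmin", "ymin"))
        xmax, ymax = (float(box.find(t).text) for t in ("xmax", "ymax"))
        x_c = (xmin + xmax) / 2 / img_w   # normalized box center
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w         # normalized box size
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return "\n".join(lines)
```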

Another advantage Roboflow offers is data augmentation, which increases the size of my dataset. It lets me perform transformations like rotation, scaling and blurring on duplicates of the images in the dataset. With combinations of these transformations I could increase the size of my dataset by a factor of 3-10. One trade-off to keep in mind is that this lowers the effective quality of the dataset, since the augmented images are correlated with the originals.
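Here is a minimal sketch of the kinds of transformations involved, using Pillow; the rotation angles, scale factor and blur radius are arbitrary illustrative choices. (For detection data, the bounding boxes must also be transformed consistently with the images, which a tool like Roboflow handles automatically.)

```python
from PIL import Image, ImageFilter

def augment(image):
    """Return several transformed copies of a PIL image."""
    variants = []
    for angle in (-10, 10):  # small rotations
        variants.append(image.rotate(angle))
    w, h = image.size
    variants.append(image.resize((int(w * 0.9), int(h * 0.9))))  # scaling
    variants.append(image.filter(ImageFilter.GaussianBlur(radius=2)))  # blur
    return variants

# original = Image.open("seat_001.jpg")
# copies = augment(original)  # 4 extra images per original
```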

On a different note, one issue Pablo brought to my attention is that some of the captured images are distorted. The bottom part of the image is cut off and replaced with vertical lines, as shown in this picture. Pablo mentioned this was due to a wiring issue, but I am also planning on detecting this problem in software.

[Image: bad-quality capture with vertical lines at the bottom]

One thing to note is that the lines causing the distortion are all perfectly vertical and always at the bottom of the picture. This can be detected using a vertical Sobel filter, which acts as a high-pass filter along one dimension of the image. Since the pixel values in the corrupted region do not change vertically, that region carries only a DC bias; the high-pass filter removes this bias and leaves the corrupted part of the image all zeros. After that, we simply need to check whether the bottom rows of the filtered image are (near) zero in order to detect this kind of error.
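Here is a minimal sketch of that check using OpenCV; the number of rows inspected and the threshold are assumptions to be tuned on real frames.

```python
import cv2
import numpy as np

def is_corrupted(image, rows=40, threshold=1.0):
    """Detect the vertical-line corruption at the bottom of a frame."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Vertical Sobel: responds to changes along the y axis, so the
    # corrupted region (constant in y) produces near-zero output.
    grad_y = cv2.Sobel(gray, cv2.CV_64F, dx=0, dy=1, ksize=3)
    bottom = np.abs(grad_y[-rows:])
    return bottom.mean() < threshold
```

A frame whose bottom rows are genuinely uniform (e.g. a blank wall) could also trigger this check, so the threshold would need tuning against normal captures.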

Krish’s Status Update for 7/11

I was not able to do much work this week on the project, due to my other commitments. Next week, we should have some data available, so that I can start training the machine learning model.

Krish’s Status Update for 31/10

There was not much work for me to do this week, since the machine learning data is not yet available. I spent most of my time getting familiar with AWS and reflecting on the ethics readings.

For AWS, I read up on the S3 product. S3 stands for Simple Storage Service, and we may use it to securely maintain and access our data on the cloud. It has multiple tiers like Standard, Infrequent Access and Glacier for varying access frequencies and latency requirements. After reading up on all of them, I decided to use a Standard S3 bucket. It offers 99.999999999% durability and 99.99% availability. The high durability ensures the data will not be lost, and the high availability ensures the data will be accessible with good enough latency for training a machine learning model. Other than S3, I also plan to use EC2 for this project, but I had researched EC2 before this week, and my research this week did not reveal anything that changed my plans.
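In practice, moving data in and out of the bucket is a few lines with boto3; below is a minimal sketch, with hypothetical bucket and key names.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "smart-library-data"  # hypothetical bucket name

def upload_image(local_path, key):
    # Push a captured frame into the Standard-tier bucket.
    s3.upload_file(local_path, BUCKET, key)

def download_image(key, local_path):
    # Pull a frame back down, e.g. onto a training instance.
    s3.download_file(BUCKET, key, local_path)

# upload_image("frames/img_0001.jpg", "raw/img_0001.jpg")
```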

I also spent a good amount of time this week on the ethics readings. I found Langdon Winner’s paper particularly interesting. I didn’t know the extent to which simple design choices made by engineers affected society and culture. It has made me more mindful of my project and gotten me to think about unintended consequences that may arise. While working with data containing images of real people, I must be very careful that the data is secure and used only for the purposes of this project. Otherwise, it could be misused and violate people’s privacy.

Krish’s Status Update for 24/10

This week, I tested the pipeline for the machine learning model using some pictures taken with my phone. Until the real data is available, the model cannot be trained; in the meantime, however, we can ensure that the code will run smoothly once the data arrives.

When I had last created the pipeline, I did not have any data to run the code on. Running it this week on the images from my phone revealed some bugs. None of the bugs were individually worth mentioning, but I spent a large amount of time debugging. Once the dataset from the cameras is available, we should be able to run it through the model and start on transfer learning.

I also read up on web sockets. Last week, I spoke to Arjun about how the server would communicate with the central node. We had two options: POST requests and web sockets. POST requests come with complications around CSRF tokens; on the other hand, I personally did not know much about sockets. This week, after reading up on sockets and Django Channels, I can implement web sockets for communication with the central node. This will be faster than POST requests, since the socket stays connected, avoiding the overhead of setting up a new connection for every message. We still have the option of implementing POST requests, but for now I will experiment with a web socket connection.
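Here is a minimal sketch of what the Channels side might look like; the consumer class and the message format (a string of occupancy bits such as "0110") are assumptions, not the final design.

```python
# consumers.py -- sketch of a Django Channels consumer for the central node.
from channels.generic.websocket import WebsocketConsumer

latest_bits = [0, 0, 0, 0]  # shared occupancy state read by the website views

class CentralNodeConsumer(WebsocketConsumer):
    def connect(self):
        # Accept the persistent connection from the central node.
        self.accept()

    def receive(self, text_data=None, bytes_data=None):
        # e.g. text_data == "0110": one occupancy bit per seat.
        global latest_bits
        latest_bits = [int(c) for c in text_data.strip()]
```

The consumer would be wired up through a routing entry such as `path("ws/central/", CentralNodeConsumer.as_asgi())`.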

Krish’s Status Update for 17/10

According to the schedule, there is not much to be done on the machine learning side of the project until we are able to collect data. In the meantime, I worked on the website interface. I set up the Django server, which communicates with both the central node and the users. As of now, there is no real information to transfer, so I used dummy data.

I spent a majority of my time this week researching the CSRF token. CSRF stands for cross-site request forgery. It occurs when a malicious website sends an HTTP POST request from an unsuspecting user's browser to another website. Since it is a POST request, it can change state on the other website's server, with consequences ranging from posting on social media on the user's behalf without their knowledge to something potentially more dangerous. As a result, it is common practice for the server to issue the user a CSRF token; only the user's browser has access to this token, and it is required on every POST request, so a malicious website cannot post on the user's behalf.

In our project, the only entity making POST requests to the server is the central node, which will not interact with other websites, so it may not require a CSRF token. Additionally, if we required a CSRF token, we would need a two-way communication channel between the central node and the server, since the server would have to send the token. Without the token, we could simplify our design so that the only communication flows from the central node to the server. At the time of writing, I am still researching; in the next few days, we should make a decision about the CSRF token.
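If we do drop the token, Django makes the exemption explicit. Here is a minimal sketch using the `csrf_exempt` decorator; the endpoint name and payload format are hypothetical.

```python
# views.py -- sketch of a CSRF-exempt endpoint for the central node.
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt  # the central node is the only client and visits no other sites
def central_node_update(request):
    if request.method != "POST":
        return JsonResponse({"error": "POST required"}, status=405)
    bits = json.loads(request.body)["bits"]  # e.g. {"bits": [0, 1, 1, 0]}
    # ... store the occupancy bits for the website to display ...
    return JsonResponse({"ok": True})
```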

Next week, I will continue to work on the website. Additionally, if we are able to generate data, I can also start working on the machine learning model.

Krish’s Status Update for 10/10

In the proposal presentation, we had mentioned that all the code would be written on AWS. However, I realised that Google Colab is better for development, because Colab gives us free access to a GPU. This allows us to spend a potentially unbounded amount of time developing the model without straining our budget. When we are ready to deploy the code, we can export it to AWS, which is more robust.

This week, I set up the pipeline to train the ML model for the project. Usually this is done when the dataset is available, so that the data can be preprocessed and tested. Since in this case the data is not yet available, I built the pipeline to the best of my ability without it.

I also spent a good amount of time researching tools for when the data is available. One of the tools required is an annotation tool, which will let me draw bounding boxes over the data images and set labels using a GUI. Since we hope to collect around 10k images, an annotation tool can significantly speed up the labelling process, which could otherwise become a bottleneck. In my research, I found LabelImg (https://github.com/tzutalin/labelImg), which seems to be the best annotation software because it supports the PASCAL VOC format, which can then be converted to the format YOLO requires.

With reference to the machine learning, we are on schedule. I can start working on the next big steps once I have access to the data.

References:
Pipeline:
https://medium.com/oracledevs/final-layers-and-loss-functions-of-single-stage-detectors-part-1-4abbfa9aa71c
https://www.curiousily.com/posts/object-detection-on-custom-dataset-with-yolo-v5-using-pytorch-and-python/

Annotations:
https://github.com/tzutalin/labelImg
https://github.com/ujsyehao/COCO-annotations-darknet-format
https://github.com/wkentaro/labelme