Mehar’s Status Update for 12/10

Most of my work this week went into testing further model training configurations to account for some false positives we discovered in our design.

On Sunday, I worked on writing up details of our old and restructured approaches, our computer vision training accuracy, the complete solution, and the final Gantt chart covering up to the last week. From Monday to Wednesday I was busy with exams and deadlines for other classes, so during that time I instead worked on figuring out what tasks we had left as a group and assigning overall priorities to them. Compiling everything across these days took about 4-5 hours.

From Thursday onward, we noted that the model was giving some high-confidence false positives that the sample thresholding couldn’t account for. Specifically, some backpacks were detected as people with >60% confidence, and the TV in the room came up as a dining table with >60% confidence as well.

This was when I realized our custom dataset didn’t have enough background information, i.e. samples of objects not to look for in the images, such as TVs or phones. Aditi suggested adding the Pascal VOC dataset to our training to supply some of that needed background information. So from Thursday onward, I worked on retraining the model with various combinations of the VOC dataset and our custom dataset. To prep, I wrote a script to eliminate images without tables/chairs/people from the VOC dataset, drop instances of classes we weren’t looking for, and switch the class mapping to our custom class mapping (sketched below). With the larger dataset, I decided to switch over to a larger AWS training instance (p2.xlarge -> p3.2xlarge). I encountered problems with training because my allocated vCPU limit on AWS didn’t account for the larger instance. Unfortunately, I found this out in the middle of training when my connection closed out, and I was unable to restart any instances to save data. I immediately sent a new vCPU limit increase request to AWS, but most of my time during Thursday/Friday was thus spent recovering the training data that was stored across my two instances (solved using an S3 bucket). In total, I spent about 8-10 hours between cleaning the Pascal VOC dataset, running training on it, and dealing with the AWS limit issue.
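
To give a sense of the prep step, here is a minimal sketch of that kind of VOC filtering script; the directory layout, output locations, and the VOC-to-custom name mapping are illustrative, not our exact values:

    # sketch: filter Pascal VOC annotations down to our target classes.
    # paths and the class mapping below are illustrative placeholders.
    import xml.etree.ElementTree as ET
    from pathlib import Path
    import shutil

    KEEP = {"person": "person", "chair": "chair", "diningtable": "table"}  # VOC name -> our name

    ann_dir = Path("VOCdevkit/VOC2012/Annotations")
    img_dir = Path("VOCdevkit/VOC2012/JPEGImages")
    out_dir = Path("voc_filtered")
    (out_dir / "annotations").mkdir(parents=True, exist_ok=True)
    (out_dir / "images").mkdir(parents=True, exist_ok=True)

    for ann_path in ann_dir.glob("*.xml"):
        tree = ET.parse(ann_path)
        root = tree.getroot()
        objects = root.findall("object")
        if not any(o.find("name").text in KEEP for o in objects):
            continue  # image contains none of our target classes -> skip it
        for o in objects:
            name = o.find("name")
            if name.text not in KEEP:
                root.remove(o)               # drop instances of classes we ignore
            else:
                name.text = KEEP[name.text]  # remap to our class names
        tree.write(out_dir / "annotations" / ann_path.name)
        jpg = img_dir / (ann_path.stem + ".jpg")
        shutil.copy(jpg, out_dir / "images" / jpg.name)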

Another thing we noted with regard to training was that switching class mappings and the overall output layer requires some retraining for the model to remap to the new mapping scheme while also training for higher accuracy on the target classes. Because the classes we are targeting are already part of YOLO’s pretrained class mappings (COCO labeling), training with those existing mappings and our dataset should help the overall accuracy of the model (specifically lowering the rate of false positives, since those items now have a specified class). I spent 7-10 hours on Saturday reprepping our data to work with the COCO labeling scheme. This involved redownloading our relabelled custom dataset, running my random data augmentation script to increase the dataset size, writing a new script to switch to the COCO mappings, and running the dataset through the remapping script. On the VOC side as well, I rewrote the VOC remapping script to remap to COCO labels. From there, I backed up the datasets to an S3 bucket (a sketch of the custom-side remapping is included below).

Remapping scripts for VOC and Custom Dataset to COCO mapping
S3 bucket files
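
As a rough sketch of the custom-side remapping: our labels are YOLO-format txt files (class id followed by a normalized box per line), so the script only has to rewrite the leading id. The custom id order shown here is a simplifying assumption; 0/56/60 are person/chair/dining table in the standard 80-class COCO ordering that YOLOv5 uses.

    # sketch: rewrite YOLO-format label files from our 3-class ids to COCO ids.
    from pathlib import Path

    CUSTOM_TO_COCO = {0: 0, 1: 56, 2: 60}  # person, chair, dining table

    for label_file in Path("custom_dataset/labels").rglob("*.txt"):
        remapped = []
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue
            parts[0] = str(CUSTOM_TO_COCO[int(parts[0])])  # swap the class id
            remapped.append(" ".join(parts))
        label_file.write_text("\n".join(remapped) + "\n")

    # backup afterwards can be as simple as:
    #   aws s3 sync custom_dataset s3://<bucket>/datasets/custom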

On Saturday there was also an issue using my spot instances to train/move data. Spot instances run on leftover EC2 capacity, and no spot capacity was available in the US regions where my instances were. I thus spent part of Saturday setting up a Google Colab notebook to move our training to. By that night, however, spot capacity had become available, so I was able to start training.

To start off, I tested four training configurations to see how the model would train in terms of accuracy improvement (the corresponding training commands are sketched after the list):

  • Custom for 50 Epochs w/Backbone (Layers 1-10) Frozen
  • Custom for 50 Epochs w/Backbone (Layers 1-10) Frozen + Custom for 10 Epochs w/Layer 1-22 Frozen
  • Pascal VOC for 50 Epochs w/Backbone (Layers 1-10) Frozen
  • Pascal VOC for 50 Epochs w/Backbone (Layers 1-10) Frozen + Custom for 10 Epochs w/Layer 1-22 Frozen
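
For reference, a sketch of how these four runs map onto YOLOv5’s training entry point. The dataset yamls, weights paths, and run folder names are illustrative, and I’m assuming YOLOv5’s freeze argument, which takes the number of leading layers to freeze (so 10 covers the backbone).

    # sketch: the four runs above expressed through yolov5's train.run() API
    # (run from inside a clone of the ultralytics/yolov5 repo).
    # dataset yamls and weights paths below are illustrative names.
    import train

    # 1) custom data, 50 epochs, backbone (first 10 layers) frozen
    train.run(data="custom_coco.yaml", weights="yolov5s.pt", epochs=50, freeze=[10])

    # 2) continue (1) for 10 more epochs with the first 22 layers frozen
    train.run(data="custom_coco.yaml", weights="runs/train/exp/weights/best.pt",
              epochs=10, freeze=[22])

    # 3) filtered Pascal VOC, 50 epochs, backbone frozen
    train.run(data="voc_filtered.yaml", weights="yolov5s.pt", epochs=50, freeze=[10])

    # 4) continue (3) on custom data for 10 epochs with the first 22 layers frozen
    train.run(data="custom_coco.yaml", weights="runs/train/exp3/weights/best.pt",
              epochs=10, freeze=[22])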

Training is still ongoing, but so far the custom training using COCO labeling for 50 epochs with just the backbone frozen seems to perform well.

Moving into the final demo and onward, I will be training the rest of these configurations and compiling my training data for the final report. In addition, I will be working with my team to test the remaining metrics and to finish the final report/video.

Mehar’s Status Update for 12/3 (11/12, 11/19, 11/26)

This past week, the majority of my time was spent cleaning up our custom dataset, further training the model, and tying together the computer vision pipeline as an independently running module.

In the few weeks before, I had tried to retrain the YOLO model using transfer learning with various existing image datasets. Looking between ImageNet and OpenImages and running a few test runs on them, I found that existing datasets actually didn’t prove to be very helpful for training.

Both datasets have many more classes than we need to train for. While it is possible to run a simple script to relabel the training data for only our target classes, the image sets themselves are also too varied to help with training for our specific use case. As the models are already pretrained, training on a custom dataset for our use case is what will help further increase the accuracy and confidence of the system’s detections.

One other thing of note was that YOLO was initially pretrained with 80 classes instead of our target 3, so the change in output dimensions is also something extra training will need to account for. One consideration was to continue training with the 80-class scheme and add an extra output layer that considers only the results of the three target classes. However, I noted that this introduces more overhead in creating the custom dataset, as instances of all 80 classes would need to be labeled for training. So I determined it was best to use transfer learning with only our custom dataset and just the 3 classes.

After collecting the data as a group, I went through and labeled it for training using Roboflow, a CV platform with functionality for labeling data. Our initial dataset was only ~40 images, so per a TA suggestion I looked into data augmentation to introduce noise, contrast changes, etc. and artificially produce more data. One issue that came up was how the bounding-box detection txt files in the training set could be augmented alongside the images. I found a GitHub codebase, https://github.com/Paperspace/DataAugmentationForObjectDetection, with functionality to augment both the images and the txt detections. Writing a script around the codebase’s augmentation features, I artificially increased the dataset to 264 images (a sketch of that script is below).

Custom Data Augmentation Script to Work With Our Custom Data
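
A rough sketch of how a script can drive that codebase; the image path, box values, and transform magnitudes here are illustrative. The library takes bounding boxes as rows of [x1, y1, x2, y2, class] and returns them transformed alongside the image.

    # sketch: one augmentation pass using the Paperspace library
    import cv2
    import numpy as np
    from data_aug.data_aug import Sequence, RandomHSV, RandomHorizontalFlip, RandomScale

    img = cv2.cvtColor(cv2.imread("images/room_001.jpg"), cv2.COLOR_BGR2RGB)
    bboxes = np.array([[220.0, 140.0, 470.0, 380.0, 0.0]])  # one example person box

    transforms = Sequence([RandomHSV(40, 40, 30),          # color/brightness jitter
                           RandomHorizontalFlip(0.5),      # flip half the time
                           RandomScale(0.2, diff=True)])   # scale change (dropped later, see below)
    aug_img, aug_bboxes = transforms(img.copy(), bboxes.copy())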

For the training itself, I noted that the YOLO architecture has a 10-layer base as a backbone for feature detection and an additional 13-layer head for object detection and classification (in the same step). So for training, I tested freezing just the backbone, and also leaving the last 1/2/3 layers unfrozen (the freezing mechanism itself is sketched below). I found that training largely stalled at a precision (the portion of detections that were true positives) of about 0.5 on most rounds.

Wandb logger results from all runs
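
Mechanically, freezing comes down to turning off gradients for the chosen layers. A minimal sketch with a hub-loaded YOLOv5s, assuming YOLOv5’s convention of naming its submodules model.0 through model.23 with the first 10 forming the backbone:

    # sketch: freeze the 10 backbone layers of a hub-loaded yolov5s by
    # disabling gradients on any parameter under model.0. through model.9.
    import torch

    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

    frozen_prefixes = [f"model.{i}." for i in range(10)]
    for name, param in model.named_parameters():
        if any(prefix in name for prefix in frozen_prefixes):
            param.requires_grad = False  # excluded from weight updates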

This past week specifically, I addressed this by removing the extraneous backpack class we had initially put in to account for cases of people temporarily leaving a room (a backpack would then indicate that the seat was still occupied). The backpacks were easily conflated with some of the chairs, and we had chosen to let go of this extra case when rescoping, so I removed the class. One other change to the dataset was limiting the data augmentation to in-place changes (removing any transforms that scaled/sheared/translated the image data). With that, I removed the backpack class and ran the data augmentation script again to get back to 264 training samples.

Training from there, I was able to achieve >0.90 precision with just the backbone frozen. This was all I had worked on through Wednesday. From there, I was mainly writing code to tie the CV module components into a standalone module that could run on its own. This took about another 3-4 hours to write out and debug fully.
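
The standalone module is roughly the following shape. This is a runnable sketch rather than our exact code: the camera index, sampling period, and the print at the end (which stands in for handing results to the web app) are all placeholders.

    # runnable sketch of the standalone CV module:
    # capture -> preprocess -> detect -> report, sampled every few seconds.
    import time
    import cv2
    import torch

    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

    cap = cv2.VideoCapture(0)  # placeholder camera index
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pre = cv2.cvtColor(clahe.apply(gray), cv2.COLOR_GRAY2RGB)  # model wants 3 channels
        results = model(pre).pandas().xyxy[0]
        print(results[results["name"].isin(["person", "chair", "dining table"])])
        time.sleep(5)  # occupancy changes slowly; no need for full frame rate
    cap.release()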

Mehar’s Status Update for 11/5

This past week most of my work has been in integration and environment setup for training. I caught the flu earlier in the week and wasn’t able to work until about Thursday, so training has been pushed back by about a week. But I am working on finishing environment setup, and training will be done this week. In terms of integration, I spent time writing code to connect the camera, preprocessing, and model together. I spent another 5-7 hours on my own rewriting and tweaking the code to integrate seamlessly with the web app. This next week, I will need to work more to get us back on track for model training.

Team Status Update for 10/29

Overall, as a team this week, we each mainly worked individually on our assigned sections. We also resplit the computer vision workload to make sure everyone had an equal contribution: all members will now work on data collection and labeling; Aditi will additionally work on post-processing (sample combination/averaging) and be in charge of the overall code structure that integrates all portions; and Mehar will work on preprocessing and model training. This week, we also went through the design review feedback and made notes to test the post-processing plan we described in the paper, and to expand our testing to cover unintended factors such as lighting changes.

Mehar’s Status Update for 10/29

This week, I continued to implement the preprocessing section of the computer vision pipeline. The three main goals for preprocessing are to denoise the input images and to increase their contrast and brightness. Initially, I used the built-in denoising methods included in the OpenCV library and a simple scaling function for contrast and brightness. I found that simply scaling up the image RGB values was very rudimentary: many areas in the image were at risk of overexposure, and the method leaves little room for the preprocessing to adapt to each image. Researching further, another contrast/brightness adjustment method I found was CLAHE (Contrast Limited Adaptive Histogram Equalization), which involves normalization of the image values. This sort of system adapts to each image, and I found it was much better at increasing contrast without overexposing portions of the image.
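
A minimal sketch of the two steps with OpenCV; the clip limit, tile size, and denoising strength are illustrative starting values, not tuned ones.

    # sketch: non-local-means denoising followed by CLAHE contrast adjustment
    import cv2

    img = cv2.imread("frames/room_001.jpg")  # illustrative path

    # denoise the color image (h=10 for luminance and color components)
    denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)

    # CLAHE operates on a single channel, hence the grayscale conversion
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(gray)

    cv2.imwrite("frames/room_001_pre.jpg", equalized)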

Overall, I wasn’t able to work as much this week due to assignments in other classes. So the goals for next week include finishing up preprocessing and working on model training.


Denoising + scaled brightness/contrast increase. The image is brightened sufficiently, but already-bright areas such as the table are likely close to clipping here. (before = left, after = right)
CLAHE method for contrast and brightness normalization (before = left, after = right). (This method requires converting the image to grayscale; I will be looking into whether it can be done in color next week.)

Mehar’s Status Update for 10/22

The week before break, most of my time was spent working on the design report: I researched further into feature-descriptor-based object detection and compiled my benchmark results for the report. In the report, I was responsible for the computer vision trade studies and design implementation sections, plus the introduction, schedule, and team member responsibility sections. This was about 9+ hours of work overall.

Beyond this, during the break I started implementing the preprocessing and the overall code structure of our project to integrate the preprocessing/model/postprocessing sections. This took about 3-4 hours to research and formulate overall.

Mehar’s Status Report for 10/8

This week, my main goals were to look into preprocessing for the model, decide the type of data to collect, and start data collection. This changed a bit after Monday, once the question of image subtraction as an alternative to deep-learning-based object detection came up. I didn’t have much knowledge of non-deep-learning methods, so I switched gears a bit to research them. I found that what we are looking for is background subtraction, and I looked into different feature detection methods and classification models we could use in tandem with it.

I went through a textbook to learn more about non-deep-learning object detection, along with some comparative studies. I mainly found HOG, SIFT, SURF, ORB, and BRISK. So far I am finding that ORB and BRISK seem to have a good tradeoff between computational complexity and the ability to pick up features in an image. Of the classification models to run on the feature extraction output, Naive Bayes and SVM, along with a few others, are popular. As for data collection, I found a dataset we can use preliminarily to train the model before moving on to footage of the study space itself.
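
As a quick sketch of what the ORB path looks like in OpenCV (BRISK is a one-line swap to cv2.BRISK_create(); the image path and keypoint cap are illustrative):

    # sketch: extract ORB keypoints/descriptors from a grayscale frame
    import cv2

    gray = cv2.imread("samples/study_room.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=500)  # cap the keypoint count
    keypoints, descriptors = orb.detectAndCompute(gray, None)

    # descriptors is an N x 32 array of binary descriptors, one row per keypoint
    print(len(keypoints), None if descriptors is None else descriptors.shape)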

With some midterms coming up, I didn’t get to work as much on getting one of these systems up and running, so I’ll get to that next week along with starting data collection in the study space itself.

Mehar’s Status Report for 10/1

This past week I focused on implementing and testing various neural-network-based object detection architectures and working on the design documentation. My goals this week were to pull up a fully functional Faster R-CNN model to test with, lay out the rough CV pipeline, and study OpenCV further.

The bulk of my time was spent pulling up the models and performing preliminary testing. In my research, I found a promising object detection library, Facebook’s Detectron2, with support for various Faster R-CNN architectures. Briefly, Mask R-CNN became a consideration, since the object masks could help with object occlusion in our use case (e.g., a table covering chairs), but I ultimately decided against it: Faster R-CNN would work sufficiently well, and the object masks would add significant overhead in labeling training data.

I tested a number of Faster R-CNN architectures on some test images we took after class Monday. Ultimately, I found the larger Faster R-CNN ResNet-101 architectures had higher accuracy and were able to detect more objects. During this testing, I also tried Ultralytics YOLOv5. YOLOv5 surprisingly performed similarly to the larger Faster R-CNN architectures despite its smaller model size and faster computation time. For this reason I decided to work with YOLOv5 instead of Faster R-CNN (a sketch of this test harness follows the images below).

Ultralytics YOLOv5 on iPhone Image
Detectron’s Faster R-CNN w/ResNet-50 on iPhone Image
Detectron’s Faster R-CNN w/ResNet-101 on iPhone Image
Detectron’s Faster R-CNN w/ResNeXt-101-32x8d on iPhone Image
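
A quick sketch of the YOLOv5 side of that harness, loading the pretrained model through torch hub (the image path is illustrative):

    # sketch: run pretrained yolov5s on a test image and list detections
    import torch

    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    results = model("test_images/lounge_monday.jpg")

    results.print()                   # summary of detections per class
    boxes = results.pandas().xyxy[0]  # xmin/ymin/xmax/ymax, confidence, name
    print(boxes[boxes["name"].isin(["person", "chair", "dining table"])])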

From there, I spent some time determining the rough overall CV pipeline, discussing with Chen how to translate the object detection output into seat occupancy data (the basic box-to-seat idea is sketched below). I added this final pipeline to the Design Review powerpoint slides.

Design Review Presentation Computer Vision
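
The rough idea from that discussion, sketched: treat each seat as a fixed chair region in the camera view and mark it occupied when a detected person box overlaps it enough. The overlap threshold and the boxes in the example are illustrative values.

    # sketch: map person detections onto known seat regions
    def overlap_ratio(a, b):
        """Intersection area over the seat box area; boxes are (x1, y1, x2, y2)."""
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        seat_area = (b[2] - b[0]) * (b[3] - b[1])
        return (ix * iy) / seat_area if seat_area else 0.0

    def seat_states(person_boxes, seat_boxes, thresh=0.3):
        return [any(overlap_ratio(p, s) > thresh for p in person_boxes)
                for s in seat_boxes]

    # e.g. one person box against two seats -> [True, False]
    print(seat_states([(100, 80, 180, 220)],
                      [(90, 150, 200, 260), (300, 150, 410, 260)]))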

I didn’t research OpenCV as much: I fell sick during the week, so I lost some time I might have used for it. Based on what I was finding with model testing, though, I mainly need to look into noise reduction, contrast increase, and some potential image segmentation for preprocessing.

Next week, I’ll research the OpenCV functionality needed for the image preprocessing layers to catch up in that area, and will start putting the image preprocessing together with the model. Beyond that, next week’s goals include deciding what training data to collect and starting to collect it using the camera setup.

Mehar’s Status Report for 9/24

My work this week was focused on pulling together the final presentation components and preparing for the proposal presentation, along with researching and planning the computer vision component of our project.

My biggest accomplishment this week was the presentation: I did some reformatting through Sunday night, along with rehearsing through Monday and Tuesday. I specifically added some logos (the Scotty dog and ‘ScottySeat’) to our slides and worked some more on our implementation slides.

In terms of the overall project itself, my tasks included finding a study room to test in, deciding on the final model for the computer vision component, and studying OpenCV. As far as the study room, there was no need to look for one, as we decided together on the ECE lounge study rooms.

I spent a bit of time going over old OpenCV notes and looking through the GitHub repository for the object detection model we are currently considering, Faster R-CNN. The bulk of my time, however, was spent trying to rescope our computer vision segment and my portion of the Gantt chart.

With the feedback from the proposal, there were many new specifications to be met, so I spent quite a bit of time figuring out what steps I need to take over the next few months to reach MVP.

With the rescoping process, I am slightly behind on what I had planned to study of OpenCV this week, but I now have a better idea of what to focus on learning next week.

In the next week I hope to continue researching OpenCV to have the initial design of the computer vision backend, along with an initial version of the computer vision model up and running. This initial version would implement the existing pretrained model.

Introduction and Project Summary

Study spots on campus are often hard to come by: you may search all of Tepper only to find there’s no space, and by the time you’ve found a spot, 30 minutes may have passed that you could have spent working.

ScottySeat seeks to solve this issue by providing real-time updates on available study spots and their locations. Building off past study-spot-tracking projects, our project will run FPGA-accelerated computer vision algorithms on camera feed data to detect available study spots and communicate this data to the user via a web interface. The web interface will include a map of occupied/unoccupied spots along with the overall occupancy of a given study space.

This project covers the areas of signal processing, hardware systems, and software systems.