Ethan’s Status Report for 3/1

The majority of my effort this week was spent finding the bugs in our YOLO codebase that made last week's results look so poor. I discovered the culprit was the loss function. Previously I was using a more naive approach that combined a weighted mean-squared-error loss for the bounding boxes with a weighted cross-entropy loss for classification. After reading a couple of Medium articles about YOLO, I realized that I was implementing an entirely different loss function than the one YOLO actually uses. Once I fixed that, I also added a little more training infrastructure that should make training analysis easier: curve plotting for each loss component. By monitoring each loss, I can identify which parts of the model need tuning in future runs.
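For the loss-curve plotting, a minimal sketch of what I mean (the history dict and the individual loss names here are illustrative placeholders, not our exact code):

import matplotlib.pyplot as plt

def plot_loss_curves(history, out_path="loss_curves.png"):
    # Plot each loss component over epochs so it is easy to see which part
    # of the model (box regression vs. classification vs. angle) stalls or
    # diverges during training.
    epochs = range(1, len(next(iter(history.values()))) + 1)
    for name, values in history.items():
        plt.plot(epochs, values, label=name)
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(out_path)
    plt.close()

# history is a dict of per-epoch averages collected in the training loop,
# e.g. {"box_loss": [...], "cls_loss": [...], "angle_loss": [...]}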

Next week, I plan on getting detection working on toy images.

Currently, I am on schedule.

Ethan’s Status Report for 2/22

This week, I was able to finish the training infrastructure to train the YOLOv8 OBB model. Now that I am able to train the model, I need to employ some sort of verification strategy to determine that the model was implemented correctly before I train on the full dataset. I decided on training the model on one single basic image (I am defining basic as an image where the object is close up and on top of a distinct background). After training on this image for a significant number of epochs, I found that the detected bounding box was completely off. Currently, I believe that something went wrong in the model's OBB detection head, and I spent the majority of my time this week trying to verify this assumption.
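The verification strategy is essentially an overfit test: a correctly implemented model should be able to drive the loss near zero and reproduce the ground-truth box on one easy image. A rough sketch of that check, where the model, loss call, and data are placeholders for our implementation:

import torch

def overfit_single_image(model, image, targets, steps=500, lr=1e-3):
    # If the predicted OBB is still far off after many steps on a single
    # basic image, the bug is likely in the detection head or the loss,
    # not in the amount of training data.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        optimizer.zero_grad()
        loss = model.compute_loss(model(image), targets)  # placeholder API
        loss.backward()
        optimizer.step()
        if step % 50 == 0:
            print(f"step {step}: loss {loss.item():.4f}")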

Next week, I plan on getting detection working on this toy image, then hopefully training on the entire dataset and analyzing the results from there.

Currently, I am on schedule.

Ethan’s Status Report for 2/15

This week, I was able to finish the initial implementation of the YOLOv8 OBB model. Unfortunately, I found that the dataset I selected on Roboflow last week is not in the format that YOLO OBB models expect. Instead of having a (x, y, w, h, theta, class label) ground truth for each object in an image, the dataset actually has the ground truth (bbox corner 1, bbox corner 2, bbox corner 3, bbox corner 4, class label), i.e., the four corner points of the box. To move forward, I need to re-annotate the dataset, and I plan on finishing a script by the end of tonight to fix this problem.
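A sketch of what that conversion script needs to do, assuming the four corners come in as pixel coordinates (cv2.minAreaRect recovers the center, size, and rotation of the box):

import numpy as np
import cv2

def corners_to_obb(corners):
    # Convert a 4-corner ground truth [(x1, y1), ..., (x4, y4)] into the
    # (cx, cy, w, h, theta) form the YOLO OBB model expects; theta is
    # returned in radians.
    pts = np.asarray(corners, dtype=np.float32)
    (cx, cy), (w, h), angle_deg = cv2.minAreaRect(pts)
    return cx, cy, w, h, np.deg2rad(angle_deg)

# e.g. corners_to_obb([(10, 10), (50, 12), (48, 40), (8, 38)])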

Currently, I am a little behind schedule. To remedy this, I plan on continuing to work on implementing the training infrastructure tomorrow and Monday. My goal is to start training the model by Tuesday. This way I will have sufficient time to (i) debug my model implementation and (ii) write data augmentations to artificially increase the amount of data.

Next week, I plan to have the model trained for a reasonable number of epochs so I can determine what optimizations it needs to perform best on the training dataset.

Team Status Report for 2/8

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

The most significant risk we could face right now is that our expectations of our software and hardware do not match reality. To mitigate this, we have employed a lot of unit testing to verify our assumptions. For example, for the machine learning pipeline, we are paying close attention to the dataset we are training on (watching out for class imbalance, lighting, resolution, etc.) to ensure that what we train on will be indicative of reality. The current contingency plan is to keep looking for data and potentially aggregate multiple datasets together. Another risk is that the end effector we choose may not be compatible with a considerable number of the objects we intend to work with. In that case, we intend to purchase another end effector, such as a gripper, with the remaining funds, in addition to the suction-type end effector.
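As one concrete example of those dataset checks, a short script along these lines (the directory layout and YOLO-style label format, with the class index in the first column, are assumptions about the Roboflow export) lets us spot class imbalance before committing to a dataset:

from collections import Counter
from pathlib import Path

def class_counts(label_dir):
    # Count annotations per class across a label directory with one
    # .txt file per image and the class index in column 0.
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

print(class_counts("dataset/train/labels"))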

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

No major changes have been made yet; however, we still need to decide on the final specs for the camera (720p vs. 1080p).

Provide an updated schedule if changes have occurred. 

No schedule changes have been made.

Ethan’s Status Report for 2/8

The majority of my time this week was spent on two efforts: verifying that the ±5 pixel expectation for the machine learning model is neither too strict nor too lenient, and determining whether the Jetson Orin Nano has enough compute for our needs. While evaluating the ±5 pixel expectation, I searched for trash datasets on both Roboflow and Kaggle and eventually settled on one from Roboflow that I really liked. After visualizing images from the dataset with their oriented bounding boxes, their centroids, and a 5 pixel circle around each centroid, I found that 5 pixels is a robust expectation to have. Regarding the compute of the Jetson Orin Nano, the specifications say that it delivers roughly 1.28 TFLOPS, and a medium-sized YOLOv8-OBB model needs about 208.6 GFLOPs per inference. Even with a FLOP efficiency of 20%, the Jetson Orin Nano should have more than enough compute to run the model, and potentially any other assistive processes that strengthen the centroid calculation.
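The visualization was roughly the following sketch (OpenCV drawing calls; the corner array and image come from the dataset loader, which is omitted here):

import cv2
import numpy as np

def draw_obb_with_tolerance(image, corners, radius_px=5):
    # Draw the oriented bounding box, its centroid, and a 5 pixel circle
    # around the centroid so we can eyeball whether 5 pixels is a
    # reasonable accuracy target at this image resolution.
    pts = np.asarray(corners, dtype=np.int32)
    cv2.polylines(image, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
    cx, cy = pts.mean(axis=0)
    cv2.circle(image, (int(cx), int(cy)), 2, (0, 0, 255), -1)         # centroid
    cv2.circle(image, (int(cx), int(cy)), radius_px, (255, 0, 0), 1)  # tolerance
    return image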

Next week, for the first part of the week, I want to spend a little more time figuring out whether fine-tuning existing YOLOv8-OBB models would be better for our use case than training one from scratch. Moreover, I want to finish preparing the dataset for our use case (e.g., making the background of the images white, applying transformations that affect the lighting of the images, etc.).
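For the lighting transformations, I have something like this torchvision sketch in mind (the jitter ranges are placeholder values to be tuned against the real dataset, and the white-background step will need its own masking logic):

from torchvision import transforms

# Illustrative lighting augmentations; brightness/contrast/saturation
# ranges are placeholders, not tuned values.
lighting_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.3, saturation=0.2),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=3)], p=0.2),
])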

Currently, everything is on schedule.