Josh’s Status Report for 3/16/2024

Accomplishment:

This week, I trained Yolov9 on our own indoor object dataset and analyzed the training results. I also started implementing the DE feature. Along the way, I ran into several points worth considering. First, the pre-trained Yolov9 model already identifies indoor objects very well, even better than the model trained on our own dataset. 

This image shows all the objects identified by Yolov9-e.pt, the weights with the highest precision among the available Yolov9 weights. The image was picked at random from the Internet as a simple test, and the OR model successfully identifies most of the indoor objects in it. 

The following screenshot shows the output of Yolov9 running on a laptop camera. As seen in the screenshot, the model again recognizes most indoor objects. 

On the other hand, the training method for our own model needs some adjustment: when the model is trained for too many epochs, precision decreases and validation loss increases. An ideal set of training curves should look like the following:

The six plots on the left show the losses decreasing, and the four plots on the right show the precision metrics increasing. However, our training result looks like the following:

Evidently, the box_loss and class_loss on the validation dataset increase, and the precision metrics in the four plots on the right decrease, after around 22 epochs. 
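One common way to handle this kind of curve is early stopping: keep the epoch with the lowest validation loss and stop once it has not improved for a fixed number of epochs. The snippet below is only a minimal sketch of that idea, not what our training run does. The function name `best_epoch`, the `patience` parameter, and the loss values are all made up for illustration and are not taken from our results.

```python
def best_epoch(val_losses, patience=5):
    """Return (index of the best epoch, index at which training would stop).

    val_losses: validation loss recorded after each epoch.
    patience:   how many epochs without improvement to tolerate (hypothetical setting).
    """
    best_idx = 0
    best_loss = float("inf")
    stop_idx = len(val_losses) - 1  # default: training ran to the end
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New lowest validation loss: remember this epoch.
            best_loss = loss
            best_idx = i
        elif i - best_idx >= patience:
            # No improvement for `patience` epochs: stop here.
            stop_idx = i
            break
    return best_idx, stop_idx


# Hypothetical validation losses that fall and then rise again.
losses = [1.00, 0.80, 0.65, 0.60, 0.62, 0.66, 0.70, 0.75, 0.81]
best, stop = best_epoch(losses, patience=3)
print(f"keep weights from epoch index {best}, stop at epoch index {stop}")
```

With a rule like this, the weights saved at the best epoch would be used instead of the weights from the final epoch.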

The image above is the confusion matrix for the model trained on our dataset. Based on the matrix, chair, keyboard, table, trash bin, and tv monitor show relatively high precision, while book, bottle, cup, laptop, and window show relatively low precision. Because the most likely obstacles are chairs, tables, and trash bins, the trained model performs well on the classes that matter most for our use case. 

While implementing the DE feature, I found that the open-source Yolov4 + DE project I had planned to use as a reference estimates the distance to an object from a reference image taken at a known distance. In that project, pictures of a human and a cell phone taken at known distances are used to estimate the distance to a human or a cell phone. I will adapt this method to common indoor objects, such as a chair, table, door, TV, etc. However, one risk is that because the model estimates distance relative to the reference image, the result may differ from the actual distance. Furthermore, the sizes of indoor objects usually vary between indoor settings, so a single reference size may yield inaccurate estimates. 
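To make the reference-image idea concrete, here is a minimal sketch that assumes the usual similar-triangles formulation: a focal length in pixels is calibrated once from the reference image, then reused to convert a detected bounding-box width into a distance. I have not confirmed that the reference project uses exactly this formula. The function names and all numbers are hypothetical.

```python
def focal_length_px(ref_distance, real_width, ref_width_px):
    """Calibrate a focal length (in pixels) from one reference image.

    ref_distance: known camera-to-object distance in the reference image (meters)
    real_width:   assumed real-world width of the object (meters)
    ref_width_px: width of the object's bounding box in the reference image (pixels)
    """
    return (ref_width_px * ref_distance) / real_width


def estimate_distance(focal_px, real_width, width_px):
    """Estimate distance (meters) from the bounding-box width in a new frame."""
    if width_px <= 0:
        raise ValueError("bounding-box width must be positive")
    return (real_width * focal_px) / width_px


# Hypothetical reference: a 0.5 m wide chair, 2.0 m away, appears 200 px wide.
focal = focal_length_px(ref_distance=2.0, real_width=0.5, ref_width_px=200)

# In a new frame the chair's box is 100 px wide.
print(estimate_distance(focal, real_width=0.5, width_px=100))
```

The sketch also shows where the risk described above comes from. The estimate scales linearly with the assumed `real_width`, so if the chair in the room is actually 0.6 m wide rather than the assumed 0.5 m, the estimated distance is too short by the same ratio.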

Progress

I have reached the milestone of training Yolov9 on our own dataset, although some adjustments are still needed to raise the accuracy. However, I was not able to integrate the DE feature into Yolov9 this week. The implementation is expected to take longer because I need to collect reference images and determine reference distances. 

Projected Deliverables

By next week, I expect to finish re-training the model on our dataset and to test the OR model with both the pre-trained and the trained weights to decide which model to go with. I will also finish implementing the DE feature so that we can start integrating the components on the Nvidia Jetson. 
