mhtang – Page 2 – Team D3: The Self-driving human

March 9, 2025

Max Tang’s Status Report for 3/8/2025

This week I worked on training and tuning the walk sign image classification model. I made a major design change for this part of the system: instead of using a YOLO model trained for object detection, I switched to an off-the-shelf ResNet model that I fine-tuned on our own custom dataset. I initially thought that a YOLO model would be best, since the system needs to find the walk sign signal box in an image and draw a bounding box around it, but the issue is that this alone wouldn’t classify the image as either WALK or DON’T WALK. ResNet is a convolutional neural network that outputs class labels, so as long as it is trained on enough high-quality data, it should still be able to pick out the walk sign in an image. The training and evaluation are easily done in Google Colab.
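To illustrate the classification framing (this is not the actual Colab notebook), here is a minimal sketch of how two raw output scores from a classifier could be turned into a WALK or DON’T WALK label. The label order, the `classify` helper, and the 0.8 threshold are all assumptions made for the example, not values from the project.

```python
import math

# Hypothetical label order; the real model's class indices may differ.
LABELS = ["DONT_WALK", "WALK"]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, walk_threshold=0.8):
    """Return (label, probabilities) for one image's output scores.

    WALK is reported only when its probability reaches walk_threshold
    (an assumed value); otherwise the result falls back to DONT_WALK.
    """
    probs = softmax(logits)
    walk_prob = probs[LABELS.index("WALK")]
    label = "WALK" if walk_prob >= walk_threshold else "DONT_WALK"
    return label, probs


if __name__ == "__main__":
    print(classify([0.0, 3.0]))  # strong WALK score
    print(classify([0.0, 0.5]))  # weak WALK score
```

Falling back to DON’T WALK when the model is unsure is one possible design choice for a safety-related output; the right threshold would have to come from validation data.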

More data needs to be collected to improve the model and increase its ability to generalize, as the current model is overfitting to the small dataset. Finding high-quality images of the WALK sign has been the main issue, as Google Maps tends to only have pictures of the DON’T WALK sign, and I can only take so many pictures of different WALK signs throughout the day. The good news is that retraining is very quick, since the model is kept small enough to fit on the microcontroller. Now that I finally have the model working, I can focus my time next week on further data collection. Progress is still roughly on schedule, but I will need to start moving the model from my local machine onto the board soon.
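Since WALK images are harder to find than DON’T WALK images, the dataset is likely to be imbalanced. One common mitigation, which the report does not say was used, is to weight each class inversely to its frequency in the training loss. The sketch below computes such weights; the function name and the image counts are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    labels = ["DONT_WALK"] * 300 + ["WALK"] * 100
    print(inverse_frequency_weights(labels))
```

These weights could then be passed to a weighted loss function during training, although collecting more WALK images remains the more direct fix.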

February 23, 2025

Team Status Report for 2/22/25

The performance of the image classification and object detection models remains the most significant risk, but any problems will only be revealed once we start testing the models with data collected from our camera, which has not arrived yet. For now, the contingency plan is to switch models or to narrow the scope of the input images that we want to classify so that the models have an easier time with recognition. One change we made to the existing design was the camera we plan to use. We initially wanted a camera with a large field of view to capture as much of the environment as possible, but we realized that this would make the image size too large and make recognition harder.

With regard to the object detection model development, we plan to continue developing fine-tuned YOLO models. Initial testing of pre-trained models on out-of-distribution data (BDD100k validation dataset) yielded reasonable results, but we might want to lean more heavily on fine-tuned models for testing so that we have models trained on a wider variety of data. There is a significant risk, however, that fine-tuning the existing models will not be enough to produce accurate models when we integrate and test, so our contingency plan is to continue collecting and processing more diverse datasets in an effort to boost performance.

In terms of hardware, we chose to delay ordering a sound card as we are considering using bone-conduction earphones for safety. They block less ambient noise and can be connected via Bluetooth. Testing for audio can be done through the DisplayPort connector, as the audio drivers should be identical regardless of which headphones we end up choosing. For power, we have ordered a USB-C PD to 15V 5A DC Barrel Jack converter. This fits into the power requirements while allowing us to use a PD Powerbank instead of a more esoteric Powerbank with a DC output.

February 22, 2025

Max Tang’s Status Report for 2/22/2025

This week I finished collecting all of the pedestrian traffic light data and also began training the YOLOv8 image classification model. I explored several ways of collecting data but ultimately gathered most of my images from Google Earth. I took screenshots at various intersections in Pittsburgh and varied the zoom distance and angle of each traffic light to get a diverse dataset. I also made sure to find different environmental conditions, such as sunny intersections versus shadier intersections. Initially I tried taking pictures with my phone, but this proved too inefficient: it was difficult to capture different weather conditions and to travel to intersections with different background settings (buildings vs. nature). I also explored using generative AI to produce images, but the models I tried were unable to create realistic images. I’m sure there are models capable of doing so, but I decided against this route. Finally, I added a few images from existing datasets to my dataset.

The next step was to label and process my data. This involved manually categorizing each image as either “stop” or “go”. I then prepared the data for the YOLOv8 model, which involved putting a bounding box around each pedestrian traffic light box in each image. I did this using Roboflow, a web application that let me easily add bounding boxes and export them in a format that can be fed directly into YOLOv8. Then it was simply a matter of installing YOLOv8 and running it in a Jupyter Notebook.
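For reference, YOLO-style label files store one line per box: a class index followed by the box center, width, and height, all normalized by the image dimensions. Roboflow produces these files automatically; the sketch below only illustrates the conversion from pixel corners. The function name and the class index mapping (0 for “stop”, 1 for “go”) are assumptions for the example.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-corner box into a normalized YOLO label line.

    Output format: "<class> <x_center> <y_center> <width> <height>",
    with all four box values scaled to the range [0, 1].
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # Hypothetical "go" box in a 640x480 screenshot.
    print(to_yolo_line(1, 100, 50, 200, 150, 640, 480))
    # prints: 1 0.234375 0.208333 0.156250 0.208333
```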

Progress was slightly behind due to the initial difficulties with data collection, but I have updated my Gantt chart to reflect this and am on schedule now. Next week I plan on tuning the YOLOv8 model to try to increase its accuracy on my validation dataset, which so far needs improvement.

February 16, 2025

Max Tang’s Status Report for 2/15/2025

This week I worked on compiling data for training the walk sign detection model. The model’s performance is only as good as the data that it is trained on, so I felt that it was important to get this step right. I spent a lot of time searching online for datasets of pedestrian traffic lights. However, I had significant trouble finding datasets specific to American pedestrian traffic signals, which typically use a white pedestrian symbol for “Walk” and a red hand for “Don’t Walk.” The majority of publicly available datasets featured Chinese pedestrian signals, which use red and green pedestrian symbols and are not suitable for this model. I decided instead to compile my own dataset by scraping images from Google as well as Google Maps. I will also augment this dataset with real-world images, which I will begin collecting next week. Progress so far is on schedule, or perhaps slightly behind. The lack of existing American datasets set me back a little, so I will need to expedite the data collection. Next week I hope to have a fully labeled dataset with multiple angles and lighting situations. This should be ready for model training, which will be the next step in the walk sign detection section.

February 8, 2025 (updated February 16, 2025)

Max Tang’s Status Report for 2/8/2025

This week I presented our group’s initial proposal presentation. The presentation went well, and I received many thought-provoking questions that helped me realize that there are some aspects of our design that we have not considered, such as intersections that have multiple sidewalks. I began searching for suitable models that we can use to create our walk sign image classification model. One of these is an off-the-shelf YOLOv8 model that we can simply fine-tune on walk sign images. Another potential solution I found is to gather as many images of walk signs as possible, combining existing online datasets and self-taken images, and upload them to Edge Impulse. Then I can use Edge Impulse’s image classification model, which would be great for our project since Edge Impulse has a feature that lets you create quantized models, which use smaller data types for storing parameters and reduce the total memory required.
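To make the memory saving from quantization concrete, here is a minimal sketch of symmetric 8-bit quantization of a list of weights. It illustrates the general idea only and is not necessarily the scheme Edge Impulse applies; the function names and the sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]  # made-up values
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    # A 32-bit float takes 4 bytes per weight; an 8-bit integer takes 1.
    print("float32 bytes:", 4 * len(weights), "int8 bytes:", len(q))
    print("max error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each stored weight shrinks from 4 bytes to 1 at the cost of a rounding error of at most half the scale per weight, which is why quantized models fit more easily on a microcontroller.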

Progress is still on schedule. We allocated ourselves a large chunk of time for researching and making the model, and I believe that picking a suitable model at the beginning will help save time tuning and testing later. Next week I hope to be able to start the training and initial testing against validation datasets. This will give ample time for iteration if further improvements are required, which is very likely.