Here is our design review report (also viewable from this page):
Andrew Wang’s Status Report – 02/22/2025
This week, I began evaluating and fine-tuning a few out-of-the-box YOLO object detection models. More specifically, I used YOLOv8x, a large, high-performance model trained on the COCO dataset.
For evaluation, we were advised to be wary of the robustness of the object detection models on out-of-distribution data, as previous teams have run into difficulty when trying to use such models in a real-world setting. Since the validation metrics for the model on the COCO dataset are already available online, I decided to use the validation set of the BDD100k dataset to measure how much performance decays on an out-of-distribution dataset, mimicking real-world conditions.
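As a rough sketch of how this evaluation can be run with the ultralytics API (the dataset YAML path is a placeholder for wherever our local copy of the BDD100k validation split is described):

```python
from ultralytics import YOLO

# Load the pre-trained COCO model we are evaluating.
model = YOLO("yolov8x.pt")

# Run validation on the BDD100k validation split; "bdd100k.yaml" is a
# placeholder dataset config pointing at our local images/labels.
metrics = model.val(data="bdd100k.yaml", split="val")

# Summary metrics for comparison against the published COCO numbers.
print("mAP50-95:", metrics.box.map)
print("mAP50:   ", metrics.box.map50)
print("per-class mAP50-95:", metrics.box.maps)
```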
So far, it appears that the out-of-the-box model does reasonably well on the new out-of-distribution dataset. I first generated a confusion matrix to examine how well the model does on each class. Note that our evaluation dataset only contains the first 10 labels of the YOLO model, so only the top-left block of the matrix should be considered in our evaluation:
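For reference, here is a minimal sketch of how that block can be pulled out of the validation run above; it assumes the returned metrics object exposes the confusion matrix (as recent ultralytics versions do), and the dataset YAML is again a placeholder:

```python
import numpy as np
from ultralytics import YOLO

# Same validation run as above; the detection confusion matrix is
# (nc + 1) x (nc + 1), where the extra row/column is the "background" class.
metrics = YOLO("yolov8x.pt").val(data="bdd100k.yaml")  # placeholder dataset config
cm = metrics.confusion_matrix.matrix

# Our BDD100k labels map onto the first 10 COCO classes, so only the
# top-left 10x10 block is meaningful for this evaluation.
top_left = cm[:10, :10]
print(np.round(top_left).astype(int))
```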
It appears that the model mistakenly assigns a “background” label to some objects that should have been classified as another item on the road, which is especially troublesome for our use case. Besides this, the accuracy appears somewhat reasonable, with some notable off-target predictions. I also generated a precision-recall curve across the different classes:
It appears that the model struggles most with identifying traffic lights and trains. However, in our use case of crossing the road, these two objects are less important to detect than the other categories, so I’m not personally too worried about this. As a whole, the mAP metrics across the other labels seem reasonable compared to the reported mAP metrics of the same model on the COCO dataset. Considering that these models weren’t trained on the BDD100k dataset, I’m cautiously optimistic that they could perform well in our testing as is, even without extensive fine-tuning.
Finally, I generated a few example images with the model predictions overlaid to visually depict what the model is doing. Here is an example:
The top picture shows the images with the reference labels, and the bottom picture shows the same images with our model predictions overlaid. On the top row, the second image from the left stood out to me, since our model detected trains where there weren’t any. This might be an interesting point to dive into to understand why our model does so poorly at detecting trains, although given that we have established that trains aren’t as important in our use case, we might not need a detailed analysis if time is tight.
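For completeness, the overlays themselves can be produced with something along these lines (the image path is illustrative):

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8x.pt")

# Run inference on a sample BDD100k image; "sample.jpg" is a placeholder path.
results = model("sample.jpg")

# plot() returns a BGR numpy array with the predicted boxes and labels drawn.
annotated = results[0].plot()
cv2.imwrite("sample_predictions.jpg", annotated)
```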
With regards to progress, I believe that I am about on track per our Gantt chart; I have completed a preliminary evaluation of the object detection models, and I have also started implementing a fine-tuning pipeline in order to incorporate more datasets into the out-of-the-box models we are currently using.
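At its core, the fine-tuning step can be a thin wrapper around the ultralytics training entry point; here is a minimal sketch under that assumption (the dataset config and hyperparameters are placeholders, not final choices):

```python
from ultralytics import YOLO

# Start from the pre-trained COCO weights rather than training from scratch.
model = YOLO("yolov8x.pt")

# Fine-tune on an additional dataset; "bdd100k.yaml" and the settings below
# are placeholders that the pipeline will make configurable.
model.train(
    data="bdd100k.yaml",
    epochs=20,
    imgsz=640,
    batch=16,
    freeze=10,  # optionally freeze the first layers (backbone) to speed up fine-tuning
)

# Re-validate the fine-tuned weights on the same split used for the baseline.
metrics = model.val(data="bdd100k.yaml")
print("fine-tuned mAP50-95:", metrics.box.map)
```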
Next week, I plan on moving into the second part of my deliverables: writing a pipeline to handle the outputs from our model for navigation. I plan on brainstorming how to make proper use of the detection model outputs, as well as how they should be integrated into the larger navigation module that we have planned. I also plan on gathering more datasets so that I can use the fine-tuning pipeline I have already implemented to develop even better object detection models, giving us a wider array of options when we are ready to integrate the project.
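To make the brainstorming concrete, here is a rough, entirely hypothetical sketch of the kind of post-processing I have in mind: bucketing detections into left/center/right regions of the frame and emitting a simple cue for the navigation module. The class list, confidence threshold, and zone logic are placeholders, not a settled design:

```python
from ultralytics import YOLO

RELEVANT = {"person", "car", "truck", "bus", "bicycle", "motorcycle"}  # placeholder set
CONF_THRESHOLD = 0.5  # placeholder confidence cutoff


def detections_to_cues(result):
    """Map one ultralytics Results object to coarse left/ahead/right obstacle cues."""
    cues = []
    width = result.orig_shape[1]
    for box in result.boxes:
        name = result.names[int(box.cls)]
        if name not in RELEVANT or float(box.conf) < CONF_THRESHOLD:
            continue
        x_center = float(box.xywh[0][0])
        if x_center < width / 3:
            zone = "left"
        elif x_center > 2 * width / 3:
            zone = "right"
        else:
            zone = "ahead"
        cues.append(f"{name} {zone}")
    return cues


model = YOLO("yolov8x.pt")
result = model("sample.jpg")[0]  # placeholder image path
print(detections_to_cues(result))
```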
Team Status Report for 2/22/25
The performance of the image classification and object detection models remains the most significant risk, but this will only be revealed once we start actually testing them with data collected from our camera, which has not arrived yet. For now, the contingency plan would be to switch models, or perhaps narrow the scope of the input data or images that we want to classify so that the models have an easier time with recognition. One change we made to the existing design was the camera we planned on using: we initially wanted a camera with a large field of view to try to capture as much of the environment as possible, but we realized that this would make the image size too large and make recognition harder.
With regards to the object detection model development, we plan to continue developing fine-tuned YOLO models. Initial testing of pre-trained models on out-of-distribution data (the BDD100k validation set) yielded reasonable results, but we may want to lean more heavily on fine-tuned models for testing so that we have models trained on a wider variety of data. There is still a significant risk that fine-tuning the existing models will not be sufficient for accurate detection once we integrate and test, so our contingency plan is to continue collecting and processing more diverse datasets in an effort to boost performance.
In terms of hardware, we chose to delay ordering a sound card as we are considering using bone-conduction earphones for safety. They block less ambient noise and can be connected via Bluetooth. Testing for audio can be done through the DisplayPort connector, as the audio drivers should be identical regardless of which headphones we end up choosing. For power, we have ordered a USB-C PD to 15V 5A DC Barrel Jack converter. This fits into the power requirements while allowing us to use a PD Powerbank instead of a more esoteric Powerbank with a DC output.
William Shaw’s Status Report – 02/22/2025
This week, I ordered the other essential parts, including the GPS module and a USB-C to DC barrel jack adaptor. Neither of these is strictly system-critical right now, but ordering them together saved on shipping costs from Adafruit. I opted to wait on ordering the USB sound card, as the driver used for audio output should not change, and we may opt to use wireless bone-conduction earphones instead of the on-ear headphones. These could be a safer alternative, as they block the least ambient noise.
Regarding the Jetson Orin Nano, I spent this week setting it up in preparation for future tasks. This included updating the board’s firmware, loading a new boot image with JetPack SDK, and setting up Ubuntu. After completing these preliminary steps, I moved on to installing the dependencies we would need for future tasks. Many of these were included in the JetPack SDK, so it took less effort than expected. I also began trying to run a few demos like Ollama on the Jetson. Ideally, this makes me more familiar with the platform, which should make later work smoother.
In terms of schedule, I am right on track. The parts should arrive in a few days, which is on schedule for me to begin testing. Next week, I plan to complete much of the testing for interfacing the hardware to the Jetson. I will focus on the camera and the IMU first, as these are our most system-critical components. I also want to begin drafting our overall mounting mechanism.
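As a first smoke test once the camera arrives, something like the following should confirm the CSI capture path end to end; this is a sketch that assumes the Arducam IMX219 is connected over CSI and that we use the GStreamer-enabled OpenCV build bundled with JetPack (resolution and frame rate are placeholders):

```python
import cv2

# Standard Jetson CSI capture pipeline: nvarguscamerasrc grabs frames from the
# IMX219, nvvidconv moves them out of NVMM memory, and appsink hands BGR frames
# to OpenCV. Width/height/framerate below are placeholder values.
GST_PIPELINE = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(GST_PIPELINE, cv2.CAP_GSTREAMER)
ok, frame = cap.read()
print("camera frame captured:", ok, frame.shape if ok else None)
cap.release()
```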
Max Tang’s Status Report for 2/22/25
This week I finished collecting all of the pedestrian traffic light data and began training the YOLOv8 image classification model. I explored collecting data in different ways but ultimately gathered most of my images from Google Earth. I took screenshots at various intersections in Pittsburgh, varying the zoom distance and angle of each traffic light to get a diverse dataset. I also made sure to capture different environmental conditions, such as sunny intersections versus shadier ones. Initially I explored other ways of collecting data, such as taking pictures with my phone, but this proved too inefficient: it was hard to capture different weather conditions, and visiting intersections with different background settings (buildings vs. nature) was too time-consuming. I also explored using generative AI to produce images, but the models I tried were unable to create realistic images. I’m sure there are models capable of doing so, but I decided against this route. Finally, I added a few images from existing datasets to my dataset.
The next step was to label and process my data. This involved manually categorizing each image as either “stop” or “go”. I then prepared the data for the YOLOv8 model, which involved drawing bounding boxes around the pedestrian traffic light box in each image. I did this using Roboflow, a web application that let me easily add bounding boxes and export the dataset in a format that can be fed directly into YOLOv8. Then it was simply a matter of installing YOLOv8 and running it in a Jupyter Notebook.
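In the notebook, the training flow looks roughly like the following; the Roboflow API key, workspace/project identifiers, and training settings are placeholders for my actual values:

```python
from roboflow import Roboflow
from ultralytics import YOLO

# Pull the annotated dataset in YOLOv8 format; the API key, workspace,
# project, and version below are placeholders.
rf = Roboflow(api_key="MY_API_KEY")
project = rf.workspace("my-workspace").project("pedestrian-signals")
dataset = project.version(1).download("yolov8")

# Train a small YOLOv8 model on the exported dataset; the Roboflow download
# writes a data.yaml describing the train/val splits and the stop/go classes.
model = YOLO("yolov8n.pt")
model.train(data=f"{dataset.location}/data.yaml", epochs=50, imgsz=640)

# Check performance on the held-out validation split.
metrics = model.val()
print("mAP50:", metrics.box.map50)
```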
Progress was slightly behind due to the initial difficulties with data collection, but I have updated my Gantt chart to reflect this and am on schedule now. Next week I plan on tuning the YOLOv8 model to try to increase the accuracy on my validation dataset, which so far needs improvement.
Team Status Report for 2/15/2025
Currently, the most significant risk to the project is obtaining high-quality data to use for training our models. This is crucial, as no amount of hyperparameter optimization and tuning will overcome a lack of high-quality and well-labeled data. The images we require are rather specific, such as obstacles in a crosswalk from a pedestrian’s perspective and images of the pedestrian traffic light taken from the sidewalk. We are managing this risk by obtaining data from a variety of sources, such as online datasets, Google Images and Google Maps, and also real-world images. If this does not work, our contingency plan is to perhaps adjust the purpose of our model so that it does not require such specific data.
As outlined in William’s status report for this week, a few updates have been made to the hardware components. First, an additional IMU module is needed for accurate user heading. The FOV of the camera ordered was reduced from 175º (D) to 105º (D), as we were concerned about image distortion and extraneous data from such a wide FOV. We chose 105º after comparisons made using an actual camera to better visualize each FOV’s effective viewport. Having the Jetson Orin Nano on hand also allowed us to realize that additional components were needed for audio output (no 3.5mm jack is present) and to make the power supply portable (the USB-C port does not supply power to the board). These changes did not incur any additional cost from incompatible parts, as we have been careful to verify compatibility before ordering.
Our schedule remains essentially the same as before. On the hardware side, all of the system’s critical components will arrive in time to stay on schedule. On the software side, our object detection model development is slightly behind schedule, as mentioned in Andrew’s status report for 2/15. We anticipate having several versions of the models ready for testing by the end of next week, and hopefully will be able to implement code to integrate them into our broader setup.
We will now go over the week 2 specific status report questions. A was written by William, B was written by Max and C was written by Andrew.
Part A. The Self-Driving Human is a project that is designed to address the safety and well-being of visually impaired pedestrians, both in a physiological and psychological sense. Crossing the street as a visually impaired person is both scary and dangerous. Traditional aids can be absent or inconsistent. Our project provides real-time audio guidance that helps the user cross the road safely, detect walk signals, avoid obstacles, and stay on the crosswalk. Because it is an independent navigation aid, it provides the user with self-sufficiency, as they are not reliant on crosswalk aids being maintained to cross the road. This self-sufficiency is an aspect of welfare, as the ability to move freely and confidently is a basic need. Ideally, our project works to create a more accessible and inclusive environment.
Part B. From a social perspective, the helmet will improve accessibility and inclusivity for visually impaired people and allow them to participate more fully in public life. There are some cities where pedestrian infrastructure is less friendly and accommodating, so this helmet would enable users to still cross streets safely. Economically, this helmet could reduce the need for expensive public infrastructure changes. Politically, solutions like this for the visually impaired can help increase awareness of the need for accessible infrastructure.
Part C. The traditional method of assisted street crossing/pedestrian navigation for the visually impaired involves expensive solutions such as guide dogs. While such assistance exists in significant supply, it may not be economically accessible to many of the consumers who need it. As such, we envision our project as a first step toward an economically viable solution that can be engineered within a concrete budget. Because all of the navigation and feedback capabilities will be built directly into our device, and will have been appropriately developed before being ported to the hardware, we anticipate that our (relatively) lightweight technology can make navigation assistance for the visually impaired more accessible on a budget, since the development and distribution of our project can be scaled with the availability of hardware, helping address consumption patterns.
Andrew Wang’s Status Report for 2/15/2025
This week, I gained access to a new computing cluster with more storage and GPU availability late in the week. As such, I began downloading an open-source object detection dataset, BDD100K, from Kaggle onto the cluster for evaluation/fine-tuning. After all of the images were downloaded (the version I downloaded had 120,000+ images), I was able to start working on the implementation of the evaluation/fine-tuning pipeline, although this is still a work in progress.
With regards to schedule, I believe that I am slightly behind. Due to some issues with gaining access to the cluster and the download time required to fetch such a large dataset, I was not able to work on this until the latter half of the week. I would have liked to have finished the evaluation/fine-tuning implementation by this week, so I anticipate having to put in a bit of extra work this week to catch up and have a few different versions of the model ready to export to our Jetson Orin Nano.
By the end of this week, I hope to have completed the evaluation/fine-tuning pipelines. More specifically, I would like to have concrete results for evaluating a few out-of-the-box YOLO models with regards to accuracy and other metrics, in addition to hopefully having fine-tuned a few models for evaluation.
Max Tang’s Status Report for 2/15/2025
This week I worked on compiling data for training the walk sign detection model. The model’s performance is only as good as the data that it is trained on, so I felt that it was important to get this step right. I spent a lot of time searching online for datasets of pedestrian traffic lights. However, I encountered significant challenges in finding datasets specific to American pedestrian traffic signals, which typically use a white pedestrian symbol for “Walk” and a red hand for “Don’t Walk.” The majority of publicly available datasets featured Chinese pedestrian signals, which use red and green pedestrian symbols and are not suitable for this model. I decided to instead compile my own dataset by scraping images from Google as well as Google Maps. I will also augment this dataset with real-world images, which I will begin collecting next week. My progress is roughly on schedule, perhaps a little behind; the lack of existing American datasets set me back a little, so I will need to expedite the data collection. Next week I hope to have a fully labeled dataset with multiple angles and lighting situations. This should be ready for model training, which will be the next step in the walk sign detection section.
William Shaw’s Status Report for 02/15/2025
This week, I primarily focused on ensuring that all our parts were in order. This was a much deeper dive into each hardware component than before, and I checked for things like compatibility, interfacing, and outputs. This caused me to revise a few of the prior hardware choices and to realize that more parts needed to be added. In particular, the GPS module we planned to use does not give accurate heading data for stationary subjects. As such, I added a new IMU module to act as a compass, the Adafruit 9-DOF IMU Fusion Breakout BNO055. The module gives us user heading accurate to within ±2° without requiring movement, while automatically compensating for head tilt and sensor noise.
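Once the module arrives, reading the heading over I2C should be straightforward; here is a minimal sketch assuming we wire it to the Jetson’s default I2C pins and use Adafruit’s CircuitPython driver (adafruit-circuitpython-bno055) via Blinka:

```python
import board
import busio
import adafruit_bno055

# The BNO055 sits on the Jetson's default I2C bus; the driver handles the
# sensor-fusion configuration for us.
i2c = busio.I2C(board.SCL, board.SDA)
sensor = adafruit_bno055.BNO055_I2C(i2c)

# euler returns (heading, roll, pitch) in degrees; heading is what the
# navigation module would use as the user's compass direction. Values can be
# None for a moment right after power-up.
heading, roll, pitch = sensor.euler
print("heading:", heading, "roll:", roll, "pitch:", pitch)
```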
Another update concerns the audio output and power supply. I had previously thought a USB-C Power Delivery power bank could power the Jetson board. However, on the Jetson Orin Nano Developer Kit, the USB-C port is data only and does not power the board. As such, I am looking into alternative power options for when we make the system portable. Additionally, the board does not come with a 3.5mm audio jack. While there is audio over the DisplayPort connector, that is not a viable solution on its own, since we will not connect the board to a display. As such, I need to find a compatible USB sound card for the board.
So far, I have the Jetson Orin Nano on hand. The Arducam IMX219 (camera) and BNO055 IMU (Compass) are being shipped. These are necessary for us to begin testing the navigation system of our project, so we should be able to start testing actual inputs when they arrive (assuming that interfacing goes smoothly). There are a few remaining components to order (speakers, soundcard, portable power supply), but they are not system-critical for the work that needs to be done so far. I plan to order these components by the following weekly report. I am on schedule so far. By next week, I hope to have placed orders for all the components. I also aim to successfully interface the IMU and camera to the Jetson Orin Nano board.
Team Status Report for 2/8/2025
The most significant risks to the success of our project are the performance of the image classification and object detection models and the integration of the hardware components. The accuracy of the image classification models needs to be consistently high enough during real-world testing in order for the helmet to be able to transition between the two image classification and object detection states. The other issue is whether the sensors we use will be compatible with our chosen compute platform, the Jetson Orin Nano. If, for example, the output of the camera is too high resolution and takes up too much memory, then this could be a problem given the limited memory on the board. These issues are still unclear since the ordered parts have not arrived yet, but the contingency plan is to simply try other parts, such as lower-resolution cameras that are still clear enough to be used for accurate image classification. No changes have been made to the existing design yet, as we have only just begun the implementation process and no issues have been discovered so far.