Team Status Reports – Team D3: The Self-driving human

April 27, 2025April 27, 2025

Weekly Status Report for 4/26/25

The only remaining work to be done is to add the audio component, which simply involves connecting the earbuds via bluetooth and then calling some Python library like “subprocess” to connect the earbuds, and then “pygame” to play an audio file. We have the logic for when each audio file should be played already. There is also the compass component, which should also be quick.
The final design for the chest mount has been completed. The final revision adds a front plate, neoprene padding, and uses black and smoke grey acrylic.

Unit tests:
YOLO model (90 ms inference)

The performance was good enough and the speed was fast enough to not have to make any design changes. We also observe that in practice, the model does well enough to identify obstacles in the field of view:

For the initial verson of the navigation, we provided feedback to the user based on how close a detected obstacle was to the camera. However, in common scenarios where a person is just overtaking the user, this logic is insufficient, such as in the example image below during our testing at a crosswalk on campus:

In order to address this, we needed to update our navigation submodule slightly, allowing for a slightly more nuanced logic with regards to whether or not to alert the user to specific obstacles. We will need to include in our system wide testing as well, to make sure that the navigation behaves as anticipated as part of a larger system.

ResNet model (30 ms inference)

Again, the performance was good enough and the speed was fast enough to not have to change the model or size.

For the overall system test, no numbers have been collected as of yet. However, we plan to run it as follows:
First, we wear the device (without being blindfolded). Next, the we wait at a crosswalk, ensuring that the “Walk Sign” cue is only played when there is a valid walk signal. Once the light displays “WALK”, we begin crossing the road. We will ensure that the device functions properly for both a clear crossing with no deviation, a crossing with objects in the path, and a crossing where we deviate from the path. At the end of the cross road, we also ensure that it resets back to the original Walk Sign Detection model (our idle state).

April 20, 2025April 20, 2025

Team Status Report for 4/19/2025

Last week, we managed to get both the object detection model and walk sign model onto the board. Due to make sure that the object detection model could use the GPU and decrease the inference time, we had to create a docker container for it to run in. However, due to Python dependency issues with tensorflow and pytorch, we are currently trying to change the walk sign image classification model to only use pytorch. We have tried a variety of methods such as converting a .h5 tensorflow model to a .pth pyotrch model, and also just rewriting everything using pytorch, but both have had issues. We are still currently exploring solutions to this problem.

Regarding hardware, we have finally mounted all our components to the chest mount. The power bank also arrived, which fits nicely into a fanny pack. We have tested running our system fully attached to a person, and it works as expected. The device is comfortable, and does not impede user motion or weigh too much. We are still improving the mount, but it’s in a very good state right now.

Update on 4/20/2025: Regarding the issue with converting the model from tensorflow to pytorch, we have successfully recreated the model in pytorch. We now have 3 different models trained on the much larger dataset collected from last week, using resnet 34, 101, and 152. The performance and confusion matrix for each are pretty similar. This is the confusion matrix for the resnet 152 model:

The test accuracy is 90.54%, but it’s possible that the class imbalance is skewing this accuracy. In any case, it’s better for the model to be more cautious when predicting “walk” than “don’t walk”, and we see that the error rate for the “don’t walk” class is very low.

April 13, 2025

Team Status Report for 4/12/2025

We conducted a live test of the walk sign image classifier using the actual camera and with the model running on the Jetson. The performance did not meet our specifications, so we are now working on improving that model, as detailed in Max’s status report for this week. The good news is that the camera quality is decent, and despite some initial issues with linking the video stream to the model, everything seems to work smoothly.

The object detection models all seem to work reasonably well on the images sampled from the live video feed. Not only do the bounding boxes accurately center on the relevant objects, but the video feed is clearly high-resolution enough for the model to work correctly. As such, it has been straightforward to visually confirm that the object detection component is working as well as it did during our extensive testing phase, and we can focus our efforts on refining the navigation submodule logic.

Concerning hardware, finishing touches for the chest mount are arriving this weekend. This includes standoff components and the padding. The primary material of the chest mount has been changed from 6mm plywood to 3mm acrylic. This decision was made for weight, flexibility, compatibility (with mounting hardware), and aesthetics. We also have obtained our power supply, which reads an estimated 11-hour battery life. This should be more than enough to cover the day-to-day usage of most users.

March 30, 2025

Team Status Report for 3/29/2025

This week we prepared for our interim demo by working on integrating each independent submodule onto the Jetson Orin Nano. This involved working on uploading the models and necessary Python libraries, ensuring that the camera interfaces with the board, and checking that the video streaming works. The walk sign model is in a near finalized state, and the final step for integrating it is being able to take in a video frame as input from the camera. This will likely require a new submodule that solely handles sending the camera feed to the two vision models. This would make our code more modular, but the challenge is making sure that it works and can interface with the two models correctly. For example, the walk sign model can take in image inputs either from the file system given a path to the image, or it might be possible to use some Python interface with the camera to get the data instead.

WIth regards to the object detection model, we spent this week working on quantizing the model to determine the smallest size possible for the models to fit onto the Jetson, while also maintaining performance. This tradeoff will likely need to be explored in a bit more detail, but this allows for small models and quicker inference, which may be a factor in deciding which versions of the object detectors will be in the final product.

Concerning the hardware, work is continuing on the chest mount. We are now on the second revision, which finalized some dimensions with accurate (caliper) measurements and fixed some glaring comfort issues with strap angles. The design is not yet finalized, but prototypes are being made and laser cut. We are also working on calibrating / fine tuning the camera, to resolve some issues with red tint on the picture.

March 23, 2025March 23, 2025

Team Status Report for 3/22/2025

Enough progress has been made on each member’s respective subsystems for us to begin considering how to integrate all of the submodules onto the board together. The challenge is making sure that we can get all of the subsystems such as the walk sign classifier, obstacle detector, crosswalk navigator, speech module, etc. to work together. This will involve looking into how we can program the Jetson Orin Nano microcontroller and upload the machine learning models as well. The model performance is at a point where it is usable but could still use some finetuning, but it is more important at this point in time to make sure that we can actually run them on the board alongside the other modules.

With regards to the crosswalk navigation, we’ve begun implementing a basic feedback pipeline using text to speech libraries in python. No changes will be needed for right now, and we’ll likely need to wait until integration testing to determine if any further adjustments are needed in this submodule.

Concerning the hardware, work has begun on the design of the chest mount. Some images of the current design are included below. It will have straps at each of the four corners and will be worn like a chest harness. We plan to laser-cut the initial design out of wood as a proof of concept. The final version will be either ABS or wood, depending on which is more durable and easy to mount the devices onto. We will also likely add either a foam or TPU pad to the underside of the mount, as having a hard chestpiece would be uncomfortable for the user. With regards to the peripherals, the camera may have broken. This is system-critical, so will be the primary focus until it is resolved.

March 16, 2025

Team Status Report for 3/15/2025

One issue that we have not really spent too much thought on is whether or not the machine learning models will fit on the microcontroller. The microcontroller itself already has a decent amount of memory, but it is possible that we can add external storage as well. The models themselves, from initial research, should definitely fit. Quantizing the models will make them smaller and possibly improve inference speed, but this can sacrifice performance and might not be necessary from a model size standpoint. No changes have been made to the design recently, and next week we should begin to explore how we can upload the models to the microcontroller and being developing the main program that is going to run the models and manage the control flow.

With regards to hardware, testing for integrating the peripherals is still ongoing. We forecast that it will be complete by the end of next week. One piece of good news is that we have confirmed that the USB C PD to DC cable can successfully power the Jetson Orin Nano, which will allow us to make it portable. For some reason, our existing power bank (which was not purchased but had on hand) cannot supply the correct wattage, which is probably due to it being old. We will now need to find a suitable power bank to buy. Initial designs have also started for mounting the Jetson Orin Nano. Based on our last meeting, we have decided to create both a helmet mount and a chest mount. Once complete, we can test and compare both designs for user comfort and camera shake. The testing will determine which mounting system is used in the final design.

With regards to the object detection models, we continue to have trouble with fine-tuning out of the box models. To give a greater range of models to evaluate, we have decided to pause the fine-tuning implementation, and work on evaluating different YOLO models available, such as YOLOv12. No changes otherwise have been made to the software implementation

March 9, 2025

Team Status Report for 3/8/2025

A change was made to the existing design – specifically, the machine learning model used in the walk sign subsystem was changed from a YOLO object detection model to a ResNet image classification model. This is because the subsystem needs be able to actually classify images as either containing a WALK sign or DON’T WALK sign, so an object detection model would not suffice. No costs were incurred by this change other than the time spent adding bounding boxes to the collected dataset. One risk is the performance of the walk sign image classification model when evaluated in the real world. It is possible that images captured by the camera when mounted on the helmet are different (blurrier, taller angle, etc.) than the images the model is trained on. This can definitely affect its performance, but now that the camera has arrived, we can begin testing this and adjust our dataset accordingly.

Part A (written by Max): The target demographic of our product is the visually impaired pedestrian population, but the accessibility of pedestrian crosswalks around the world varies greatly across countries, cities, and even neighborhoods within a single city. It is common to see sidewalks with tactile bumps, pedestrian signals that announce the WALK sign and the name of the street, and other accessibility features in densely populated downtowns. However, sidewalks in rural neighborhoods or less developed countries often do not have any of these features. The benefit of the Self-Driving Human is that it would work at any crosswalk that has the signal indicator. As long as the camera can detect the walk sign, then the helmet is able to run the walk sign classification phase and navigation phases without any issues. Another global factor is the different symbols used to indicate WALK and DON’T WALK. For example, Asian countries often use an image of a green man to indicate WALK, while U.S. crosswalks use a white man. This can only be solved by training the model on country-specific datasets, which might not be as readily available in some parts of the world.

Part B (written by William): The Self-Driving Human has the potential to influence cultural factors by reshaping how society views assistive technology for the visually impaired. In particular, our project would increase mobility and reduce reliance on caregivers for its users. This can lead to cultural benefits like increased participation in certain social events as the user gains more autonomy. Ideally, this would lead to greater inclusivity in city design and social interactions. Additionally, our project could promote a standardized form of audio-based navigation, influencing positive expectations about accessible infrastructure and design. We hope this pushes for broader adoption of assistive technology-driven solutions, which could result in the development of even more inclusive and accessible technologies.

Part C (written by Andrew): The smart hat for visually impaired pedestrians addresses a critical need for independent and safe navigation while keeping key environmental factors in mind. By utilizing computer vision and GPS-based obstacle detection, the device minimizes reliance on physical infrastructure such as paving and audio signals, which may be unavailable or poorly maintained in certain areas. This reduces the dependency on city-wide accessibility upgrades, making the solution more scalable and effective across diverse environments. Additionally, by incorporating on-device processing, the system reduces the need for constant cloud connectivity, thereby lowering energy consumption and emissions associated with remote data processing. Finally, by enabling visually impaired individuals to navigate their surroundings independently, the device supports inclusive urban mobility while addressing environmental sustainability in its design and implementation.

February 23, 2025February 23, 2025

Team Status Report for 2/22/25

The performance of the image classification and object detection models remain as the most significant risks, but these will only be revealed once we start actually testing them with data collected from our camera which has not arrived yet. For now, the contingency plan would be to switch models or perhaps make the scope of our input data or images that we want to classify smaller so that the models have an easier time with recognition. One change we made to the existing design was the camera we planned on using. We initially wanted a camera with a large field of view to try and capture as much of the environment as possible, but we realized that this would make the image size too large and make recognition harder.

With regards to the object detection model development, we plan to continue developing fine-tuned YOLO models. Initial testing of pre-trained models on out-of-distribution data (BDD100k validation dataset) yielded reasonable results, but we might want to consider leaning heavier on fine-tuned models for testing such that we have models trained on a wider variety of data. There is a significant risk that fine-tuning the existing models might not even be sufficient for accurate models when we integrate and test, however, and so our contingency plan is to continue collecting and processing more diverse datasets in an effort to boost performance.

In terms of hardware, we chose to delay ordering a sound card as we are considering using bone-conduction earphones for safety. They block less ambient noise and can be connected via Bluetooth. Testing for audio can be done through the DisplayPort connector, as the audio drivers should be identical regardless of which headphones we end up choosing. For power, we have ordered a USB-C PD to 15V 5A DC Barrel Jack converter. This fits into the power requirements while allowing us to use a PD Powerbank instead of a more esoteric Powerbank with a DC output.

February 16, 2025

Team Status Report for 2/15/2025

Currently, the most significant risk to the project is obtaining high-quality data to use for training our models. This is crucial, as no amount of hyperparameter optimization and tuning will overcome a lack of high-quality and well-labeled data. The images we require are rather specific, such as obstacles in a crosswalk from a pedestrian’s perspective and images of the pedestrian traffic light taken from the sidewalk. We are managing this risk by obtaining data from a variety of sources, such as online datasets, Google Images and Google Maps, and also real-world images. If this does not work, our contingency plan is to perhaps adjust the purpose of our model so that it does not require such specific data.

As outlined in William’s status report for this week, a few updates have been made to the hardware components. First, an additional IMU module is needed for accurate user heading. The FOV of the camera ordered was reduced from 175º (D) to 105º (D), as we were concerned about image distortion and extraneous data from having such a wide FOV. We chose 105º after some comparisons made using an actual camera to better visualize each FOV’s effective viewport. Having the Jetson Orin Nano on hand also allowed us to realize that additional components were needed to have audio output (no 3.5mm jack was present) and to make the power supply portable (the type-c port does not supply power to the board). These changes did not require any additional cost incurred by incompatible parts, as we have been very careful to ensure compatibility before actually ordering.

Our schedule remains essentially the same as before. For the hardware side, all the system’s critical components will arrive on time to stay on schedule. For the software side, our object detection model development is slightly behind schedule as mentioned in Andrew’s status report for 2/15. We anticipate having several versions of models ready for testing by the end of next week, and will be able to hopefully implement code to integrate it into our broader setup.

We will now go over the week 2 specific status report questions. A was written by William, B was written by Max and C was written by Andrew.

Part A. The Self-Driving Human is a project that is designed to address the safety and well-being of visually impaired pedestrians, both in a physiological and psychological sense. Crossing the street as a visually impaired person is both scary and dangerous. Traditional aids can be absent or inconsistent. Our project provides real-time audio guidance that helps the user cross the road safely, detect walk signals, avoid obstacles, and stay on the crosswalk. Because it is an independent navigation aid, it provides the user with self-sufficiency, as they are not reliant on crosswalk aids being maintained to cross the road. This self-sufficiency is an aspect of welfare, as the ability to move freely and confidently is a basic need. Ideally, our project works to create a more accessible and inclusive environment.

Part B. From a social perspective, the helmet will improve accessibility and inclusivity for visually impaired people and allow them to participate more fully in public life. There are some cities where pedestrian infrastructure is less friendly and accommodating, so this helmet would enable users to still cross streets safely. Economically, this helmet could reduce the need for expensive public infrastructure changes. Politically, solutions like this for the visually impaired can help increase awareness of the need for accessible infrastructure.

Part C. The traditional method of assisted street crossing/pedestrian navigation for the visually impaired involves expensive solutions such as guide dogs. While there is a significant supply of assistance, these methods might not be broadly accessible to consumers in need of them with regard to economic concerns. As such, we envision our project to serve as a first step in presenting an economically viable solution, able to be engineered with a concrete budget. As all of the navigation and feedback capabilities will be built directly into our device and will have been appropriately developed before porting them to the hardware, we anticipate that our (relatively) lightweight technology can increase the accessibility of visually impaired navigation assistance on a budget, as the development and distribution our project can be scaled with the availability of hardware, helping resolve consumption patterns.

February 9, 2025

Team Status Report for 2/8/2025

The most significant risks to the success of our project is the performance of the two image classification models and the integration of the hardware components. The accuracy of the image classification models need to be consistently high enough during real world testing in order for the helment to be able to transition between the two image classification and object detection states. The other issue is if the sensors we use will be compatible with our chosen microcontroller, the Jetson Nano. If, for example, the output of the camera is too high resolution and takes up too much memory, then this could be a problem for the limited memory on the microcontroller. These issues are still unclear since the ordered parts have not arrived yet, but the contingency plan is to simply try other parts such as lower resolution cameras that are still clear enough to be used for accurate image classification. No changes have been made to the existing design yet, as we have only just begun the implementation process and no issues have been discovered as of yet.