Andrew Wang’s Status Report: 3/22/2025

This week, I spent some time implementing the crosswalk navigation submodule. One of the main difficulties was using pyttsx3 for real-time feedback. While it offers offline text-to-speech capabilities, fine-tuning parameters such as speech speed, volume, and clarity will require extensive experimentation to ensure the audio cues are both immediate and comprehensible. Since we will be integrating all of the modules soon, I anticipate addressing this during integration. I also had to spend some time familiarizing myself with text-to-speech libraries in Python, since I had never worked in this space before.
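
As a reference point for that tuning, the sketch below shows the pyttsx3 properties I expect to be adjusting; the specific rate and volume values (and the sample cue) are placeholders that will need to be found experimentally.

    import pyttsx3

    engine = pyttsx3.init()              # offline text-to-speech engine
    engine.setProperty("rate", 175)      # words per minute; the library default is around 200
    engine.setProperty("volume", 1.0)    # 0.0 to 1.0
    engine.say("Veer left")              # example crosswalk correction cue (placeholder)
    engine.runAndWait()                  # blocks until the audio finishes playing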


I also anticipate that some effort will be required to optimize the speech generation, as the feedback for this particular submodule needs to be especially quick. Once again, we will address this as appropriate if it becomes a problem during integration and testing.

Currently, I am about on schedule as I have a preliminary version of the pipeline ready to go. Hopefully, we will be able to begin the integration process soon, so that we may have a functional product ready for our demo coming up in a few weeks.

This week, I will probably focus on further optimizing text-to-speech response time, refining the heading correction logic to accommodate natural walking patterns, and conducting real-world testing to validate performance under various conditions.

Max Tang’s Status Report for 3/22/2025

The walk sign image classification model is in a near-finalized state, and I can begin to transition away from optimizing the model’s performance. Since last week, I performed some hyperparameter optimization and also tried adding layers to the ResNet model to try to increase its performance. I experimented with the size of the dense linear layers, the number of training epochs, different activation functions, and additional linear and pooling layers. However, none of these seemed to help as much as simply adding more training data, which I have been continuously collecting. I also removed the validation dataset and divided its images between the training and testing datasets, since I was not making real use of a separate validation set and benefited more from having additional data to train and test with.

Current test accuracy is around 80%, which is lower than desired. The good news is that most of the errors were cases where the model predicted “stop” when the image was actually “go”. This is much safer than predicting “go” when the image is “stop”. I did not purposefully design the model to be more cautious about predicting “go”, and this appears to be a coincidence, but it is a bias I could deliberately enforce. This would not necessarily require a change to the model itself and could instead be done in a post-processing step.
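
A minimal sketch of what that post-processing step could look like, assuming the model outputs logits over two classes with “go” at a known index; both the index and the 0.9 threshold below are placeholders to be tuned on the test set.

    import torch
    import torch.nn.functional as F

    GO_THRESHOLD = 0.9  # placeholder confidence cutoff, to be tuned on the test set

    def cautious_label(logits: torch.Tensor, go_index: int = 0) -> str:
        """Only predict 'go' when the model is highly confident; otherwise fall back to 'stop'."""
        probs = F.softmax(logits, dim=-1)
        return "go" if probs[go_index].item() >= GO_THRESHOLD else "stop"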

The next step is implementing the logic that takes the video input, feeds frames into the model at some fixed frequency, and returns the result, using a sliding window over both the input frames and the output predictions. I plan to begin working on this next week.
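
A rough sketch of that logic, assuming an OpenCV video source and a transform that converts an RGB frame into a normalized tensor for the classifier; the window size and sampling interval are placeholders.

    from collections import Counter, deque

    import cv2
    import torch

    WINDOW = 5          # number of recent predictions to vote over
    SAMPLE_EVERY = 10   # classify every 10th frame

    def classify_stream(model, transform, video_source=0):
        """Sample frames from the video, classify them, and majority-vote over a sliding window."""
        recent = deque(maxlen=WINDOW)
        cap = cv2.VideoCapture(video_source)
        frame_idx = 0
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            if frame_idx % SAMPLE_EVERY == 0:
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                x = transform(rgb).unsqueeze(0)          # transform: RGB array -> normalized tensor
                with torch.no_grad():
                    pred = model(x).argmax(dim=1).item()
                recent.append(pred)
                if len(recent) == WINDOW:                # emit a label once the window is full
                    yield Counter(recent).most_common(1)[0][0]
            frame_idx += 1
        cap.release()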

Andrew Wang’s Status Report: 3/15/2025

This week, I worked on debugging the fine-tuning portion of the object detection pipeline. Previously, the YOLOv8 model worked reasonably well out of the box on an out-of-distribution dataset, but my first attempts at implementing the fine-tuning portion were unsuccessful, resulting in extremely poor performance.

 

Unfortunately, I wasn’t able to make much progress on this front. So far, I have attempted switching the label mapping of the bdd100k dataset, fine-tuning different versions of the YOLOv8 models, and adjusting the hyperparameters slightly, but every attempt had the same outcome: the model performs extremely poorly on held-out data. I have not yet been able to determine why this is happening.
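
For reference, the fine-tuning setup I am debugging is roughly the following; the "bdd100k.yaml" data config name, epoch count, and image size are placeholders, and the class names and label mapping inside that config are exactly what I currently suspect are the problem.

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                                # pretrained checkpoint
    model.train(data="bdd100k.yaml", epochs=50, imgsz=640)    # fine-tune on the bdd100k-based config
    metrics = model.val()                                     # evaluate on the held-out split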

However, since we remain interested in having a small suite of object detection models to test, I decided to find some more YOLO variants to evaluate while the fine-tuning issue is being solved. Specifically, I decided to evaluate YOLOv12 and Baidu’s RT-DETR models, both of which are compatible with my current pipeline for pedestrian object detection. The YOLOv12 architecture introduces an attention mechanism for processing large receptive fields more effectively, and the RT-DETR model takes inspiration from the Vision Transformer architecture to efficiently process multi-scale features in the input.
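
Swapping these models into the existing evaluation pipeline only requires changing the model constructor; a minimal sketch is below, where the checkpoint names and the data config path are assumptions based on the ultralytics documentation.

    from ultralytics import RTDETR, YOLO

    candidates = {
        "yolov8n": YOLO("yolov8n.pt"),
        "yolo12n": YOLO("yolo12n.pt"),      # YOLOv12 nano checkpoint
        "rtdetr-l": RTDETR("rtdetr-l.pt"),  # Baidu's RT-DETR, large variant
    }
    for name, model in candidates.items():
        metrics = model.val(data="pedestrian.yaml")   # placeholder config for the held-out set
        print(name, metrics.box.map50)                # mAP@0.5 for a quick side-by-side comparison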

Despite being larger and more recent models, they don’t actually do much better than the original YOLOv8 models I was working with. Here are the prediction confusion matrices for the YOLOv12 and RT-DETR models, respectively:

This suggests that these object detection models may be hitting a performance ceiling on this particular out-of-distribution dataset, and that real-world testing might show similar performance across models as well.

Currently, I am a bit behind schedule, as I was unable to fix the fine-tuning issues and consequently was not able to make much progress on integration with the navigation submodules.

For this week, I will temporarily shelve the fine-tuning implementation debugging in favor of implementing the transitions between the object detection and navigation submodules. Specifically, I plan on beginning to handle the miscellaneous code that will be required to pass control between our modules.

Team Status Report for 3/15/2025

One issue that we have not spent much thought on is whether the machine learning models will fit on the microcontroller. The microcontroller itself already has a decent amount of memory, and it is possible to add external storage as well. From initial research, the models themselves should definitely fit. Quantizing the models would make them smaller and possibly improve inference speed, but this can sacrifice accuracy and might not be necessary from a model-size standpoint. No changes have been made to the design recently, and next week we should begin to explore how to upload the models to the microcontroller and begin developing the main program that will run the models and manage the control flow.
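
As a starting point for that exploration, one option is to export the trained classifier to ONNX and then build a smaller FP16 or INT8 engine with TensorRT on the Jetson. The sketch below is only illustrative: it uses a stand-in ResNet-18 with a two-class head in place of the trained model, and the TensorRT flags are assumptions to verify.

    import torch
    from torchvision import models

    # Stand-in for the trained walk-sign classifier; in practice we would load the trained weights.
    model = models.resnet18(weights=None)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)
    model.eval()

    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy_input, "walk_sign.onnx", opset_version=17)
    # The resulting .onnx file can then be built into an FP16/INT8 engine on the Jetson,
    # e.g. with `trtexec --onnx=walk_sign.onnx --fp16` (exact flags to be verified).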

With regards to hardware, testing for integrating the peripherals is still ongoing. We forecast that it will be complete by the end of next week. One piece of good news is that we have confirmed that the USB-C PD to DC cable can successfully power the Jetson Orin Nano, which will allow us to make it portable. Our existing power bank (one we already had on hand rather than purchased for the project) cannot supply the required wattage, probably because it is old, so we will need to find a suitable power bank to buy. Initial designs have also started for mounting the Jetson Orin Nano. Based on our last meeting, we have decided to create both a helmet mount and a chest mount. Once complete, we can test and compare both designs for user comfort and camera shake; the testing will determine which mounting system is used in the final design.

With regards to the object detection models, we continue to have trouble fine-tuning the out-of-the-box models. To give a greater range of models to evaluate, we have decided to pause the fine-tuning implementation and work on evaluating the different YOLO models available, such as YOLOv12. No other changes have been made to the software implementation.

William’s Status Report – 03/15/2025

This week, I continued my work testing how to connect each peripheral device to the Jetson Orin Nano. There is not much to update here, as I am still working on getting it done; I expect to have this completed by the end of next week. In particular, the I2C readings are a bit off, but I think it should just be a minor code fix. The camera is in a good place, and I believe the data is being properly sent over the GStreamer pipeline.
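
For reference, the capture path I believe is working looks roughly like the following OpenCV/GStreamer pipeline for the IMX219 over CSI-2; the resolution and framerate values are assumptions that may need adjusting.

    import cv2

    GST_PIPELINE = (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )

    cap = cv2.VideoCapture(GST_PIPELINE, cv2.CAP_GSTREAMER)
    ok, frame = cap.read()
    print("frame captured:", ok, frame.shape if ok else None)
    cap.release()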

One easy test was the USB-C PD to DC cable. I have verified that with a 45 W wall adapter, it is able to power the Jetson Orin Nano from a USB-C power source. However, I could not get the device to power on using the power bank that I had at home. This means I must find a suitable power bank that can supply the required voltage and supports more modern PD standards. It is important to note that while the Jetson Orin Nano powers on, I do not know the wattage it is drawing over the adapter. It is entirely possible that it is only running at a lower wattage setting, so I will find a way to verify that it works at full power next week.

I have also begun designing two mounting systems for the device: one head mount and one chest mount. This is based on our weekly meeting, where we discussed that there may be some benefits to mounting the camera on the chest rather than the head. Once both designs are finalized and printed (or laser cut), we will test and compare both for user comfort and camera shake.

 

Regarding the schedule, I am still about a week behind the Gantt chart. This is a hold-over from last week, where I forgot to account for spring break on the schedule. For next week, I plan to finalize the peripheral integration and to continue working on the mounting designs. 

Max Tang’s Status Report for 3/15/2025

Training the walk sign image classification model has made significant progress. The ResNet model is very easy to work with, and I have been addressing the initial overfitting from last week by training the model on a much more diverse dataset from multiple intersections around the city. I’ve developed the habit of always having my camera open when I get near intersections while walking or commuting around Pittsburgh, and I have been able to collect many more images. All I have to do is crop them and feed them into the model.

I have also been working on some hyperparameter optimization, such as trying different layer types and sizes. This has not really improved performance yet, but it is possible that adding more layers will help; it will take some research to determine whether additions like extra dense layers are worthwhile, and going into the open-source ResNet code makes it easy to experiment with these changes directly. Next week I want to have the model in a finalized state so that I can begin integrating it onto the microcontroller. I think I will also have to spend some time figuring out how to quantize the model to make it smaller.
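
One example of the kind of architectural experiment described above, replacing ResNet’s single fully connected layer with a small MLP head; the ResNet-18 variant and hidden size of 256 are placeholders, since the exact configuration is still being tuned.

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone
    model.fc = nn.Sequential(                          # deeper classification head to experiment with
        nn.Linear(model.fc.in_features, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, 2),                             # two classes: WALK vs DON'T WALK
    )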

Team Status Report for 3/8/2025

A change was made to the existing design: specifically, the machine learning model used in the walk sign subsystem was changed from a YOLO object detection model to a ResNet image classification model. This is because the subsystem needs to be able to classify images as containing either a WALK sign or a DON’T WALK sign, so an object detection model would not suffice. No costs were incurred by this change other than the time spent adding bounding boxes to the collected dataset. One risk is the performance of the walk sign image classification model when evaluated in the real world. It is possible that images captured by the camera when mounted on the helmet are different (blurrier, taken from a higher angle, etc.) than the images the model is trained on. This could certainly affect its performance, but now that the camera has arrived, we can begin testing this and adjust our dataset accordingly.

Part A (written by Max): The target demographic of our product is the visually impaired pedestrian population, but the accessibility of pedestrian crosswalks around the world varies greatly across countries, cities, and even neighborhoods within a single city. It is common to see sidewalks with tactile bumps, pedestrian signals that announce the WALK sign and the name of the street, and other accessibility features in densely populated downtowns. However, sidewalks in rural neighborhoods or less developed countries often do not have any of these features. The benefit of the Self-Driving Human is that it would work at any crosswalk that has a signal indicator: as long as the camera can detect the walk sign, the helmet can run the walk sign classification and navigation phases without any issues. Another global factor is the different symbols used to indicate WALK and DON’T WALK. For example, Asian countries often use a green walking figure to indicate WALK, while U.S. crosswalks use a white one. This can only be addressed by training the model on country-specific datasets, which might not be as readily available in some parts of the world.

Part B (written by William): The Self-Driving Human has the potential to influence cultural factors by reshaping how society views assistive technology for the visually impaired. In particular, our project would increase mobility and reduce reliance on caregivers for its users. This can lead to cultural benefits like increased participation in certain social events as the user gains more autonomy. Ideally, this would lead to greater inclusivity in city design and social interactions. Additionally, our project could promote a standardized form of audio-based navigation, influencing positive expectations about accessible infrastructure and design. We hope this pushes for broader adoption of assistive technology-driven solutions, which could result in the development of even more inclusive and accessible technologies.

 

Part C (written by Andrew): The smart hat for visually impaired pedestrians addresses a critical need for independent and safe navigation while keeping key environmental factors in mind. By utilizing computer vision and GPS-based obstacle detection, the device minimizes reliance on physical infrastructure such as tactile paving and audio signals, which may be unavailable or poorly maintained in certain areas. This reduces the dependency on city-wide accessibility upgrades, making the solution more scalable and effective across diverse environments. Additionally, by incorporating on-device processing, the system reduces the need for constant cloud connectivity, thereby lowering energy consumption and emissions associated with remote data processing. Finally, by enabling visually impaired individuals to navigate their surroundings independently, the device supports inclusive urban mobility while addressing environmental sustainability in its design and implementation.

William Shaw’s Status Report: 3/8/2025

This week, the rest of the critical parts arrived, so I was able to move into the testing phase for the components. Since I am still in the earlier stages of testing and integrating the components, my focus has been primarily on the setup process and ensuring basic connectivity. First, the camera (IMX219) is connected via CSI-2, so the system detects it under /dev/video* (visible with “v4l2-ctl --list-devices”) rather than with “lsusb”. I then made sure that v4l-utils and GStreamer were installed and updated so I could interact with the camera. More testing needs to be done to actually access the video feed, but the device is being detected. Second, the IMU (BNO055) communicates over I2C, so I used “i2cdetect” to check that the module is detected on the I2C bus of the Jetson Orin Nano. Next, I will use the smbus Python library to read the raw sensor data.
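
A reasonable first sanity check for that smbus step is reading the BNO055’s chip ID register, which the datasheet says should return 0xA0; the bus number below is an assumption and should be replaced with whatever i2cdetect reports for our wiring.

    import smbus

    I2C_BUS = 7           # bus number from i2cdetect; an assumption for our wiring
    BNO055_ADDR = 0x28    # default BNO055 I2C address
    CHIP_ID_REG = 0x00    # chip ID register; datasheet value is 0xA0

    bus = smbus.SMBus(I2C_BUS)
    chip_id = bus.read_byte_data(BNO055_ADDR, CHIP_ID_REG)
    print("BNO055 detected" if chip_id == 0xA0 else f"unexpected chip id: {chip_id:#x}")
    bus.close()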

I also worked on configuring the Jetson Orin Nano for headless operation, ensuring that we can all access and interact with the system without needing an external monitor, keyboard, and mouse. Headless will be the operation mode for the project’s final phase, as we will not be able to have a monitor attached to the user. Access is done by SSH’ing in from my laptop. I also set up VNC (Virtual Network Computing) to get a visual remote desktop. Initially, I experimented with Vino, but its performance varies depending on the exact Jetson device being used, so I ended up swapping to x11vnc. This can be connected to using the built-in VNC client on a MacBook (the “Screen Sharing” app). Separately, there were some initial issues with getting the Wi-Fi to work properly (due to some user privilege issues), but they have been resolved.

Regarding the schedule, I am about a week behind the Gantt chart. This is because I did not consider that Spring Break was my “Week 5”, so I misjudged the actual dates. I plan to finish testing of each component by this week to get back on schedule. I also want to double check that the Jetson Orin Nano works on CMU-Secure/Device, as I have just been testing on my home network.

Andrew Wang’s Status Report: 3/8/2025

This week, I worked on fine-tuning the pretrained YOLOv8 models for better performance. Previously, the models worked reasonably well out of the box on an out of distribution dataset, so I was interested in fine-tuning it on this dataset to improve the robustness of the detection model.

 

Unfortunately, so far the fine-tuning has not helped. My first few attempts at training the model on the new dataset resulted in the model not detecting any objects and marking everything as “background”. See below for the latest confusion matrix:

 

I am still unsure why this is happening. I did verify that the out-of-the-box model’s metrics from my last status report are reproducible, so I suspect there is a small issue with how I am retraining the model, which I am currently looking into.

Due to this unexpected issue, I am currently a bit behind schedule, as I had anticipated finishing the fine-tuning by this point. However, I expect to be back on track this week once the issue is resolved, since my remaining action items are mainly to integrate the model outputs with the rest of the components, which can be done regardless of whether the new models are ready. Additionally, I have implemented most of the necessary pipelines for model evaluation and training, and am slightly ahead of schedule in that regard relative to our Gantt chart.

For this week, I hope to begin coordinating efforts to integrate the object detection models’ output to the navigation modules in the hardware, as well as resolving the current issues with the model fine-tuning. Specifically, I plan on beginning to handle the miscellaneous code that will be required to pass control between our modules.

Max Tang’s Status Report for 3/8/2025

This week I worked on training and tuning the walk sign image classification model. I made a major design change for this part of the system: instead of using a YOLO model that is trained for object detection, I decided to switch to an off-the-shelf ResNet model that I could fine-tune with our own custom dataset. I initially thought that a YOLO model would be best since the system would need to find the walk sign signal box in an image and create a bounding box, but the issue is that this would not classify the image as either WALK or DON’T WALK. ResNet is just a convolutional neural network that outputs class labels, so as long as it is trained on enough high-quality data, it should still be able to recognize the walk sign state in an image. The training and evaluation are easily done in Google Colab:
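
A rough sketch of that fine-tuning setup, assuming the cropped images are organized into WALK and DON’T WALK subfolders; the directory layout, ResNet-18 variant, and hyperparameters are all placeholders.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_ds = datasets.ImageFolder("data/train", transform=tfm)   # walk/ and dont_walk/ subfolders
    train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

    model = models.resnet18(weights="IMAGENET1K_V1")   # off-the-shelf pretrained ResNet
    model.fc = nn.Linear(model.fc.in_features, 2)      # replace the head for the 2 classes
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):
        for x, y in train_dl:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()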

 

More data needs to be collected to improve the model and increase its ability to generalize, as the current model is overfitting to the small dataset. Finding high-quality images of the WALK sign has been the main issue, as Google Maps tends to only have pictures of the DON’T WALK sign, and I can only take so many pictures of different WALK signs throughout the day. The good news is that retraining the model can be done very quickly, since the model is small enough to fit on the microcontroller. Now that I finally have the model working, I can focus my time next week on further data collection. Progress is still somewhat on schedule, but I will need to work on moving this from my local machine onto the board soon.