Weekly Status Report for 4/26/25

The only remaining work is to add the audio component, which involves connecting the earbuds over Bluetooth and then calling a Python library such as subprocess to establish the connection and pygame to play an audio file. We already have the logic for when each audio file should be played. There is also the compass component, which should be quick.
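As a rough sketch of what that audio path could look like (the bluetoothctl call, MAC address, and cue file name below are placeholders, not our actual setup):

```python
# Rough sketch: connect Bluetooth earbuds by calling bluetoothctl through
# subprocess, then play a cue with pygame. The MAC address and file name
# are placeholders; the success string may vary by BlueZ version.
import subprocess
import time
import pygame

EARBUDS_MAC = "XX:XX:XX:XX:XX:XX"  # placeholder earbud address

def connect_earbuds(mac):
    result = subprocess.run(
        ["bluetoothctl", "connect", mac],
        capture_output=True, text=True, timeout=15,
    )
    return "Connection successful" in result.stdout

def play_cue(path):
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)

if connect_earbuds(EARBUDS_MAC):
    play_cue("walk_sign.wav")  # placeholder cue file
```
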
The final design for the chest mount has been completed. The final revision adds a front plate and neoprene padding, and uses black and smoke-grey acrylic.

Unit tests:
YOLO model (90 ms inference)

The performance and inference speed were good enough that no design changes were needed. We also observe that, in practice, the model does well enough to identify obstacles in the field of view:

For the initial version of the navigation, we provided feedback to the user based on how close a detected obstacle was to the camera. However, this logic is insufficient in common scenarios where a person is simply overtaking the user, such as in the example image below from our testing at a crosswalk on campus:

To address this, we updated our navigation submodule to use slightly more nuanced logic for deciding whether or not to alert the user to a specific obstacle; a sketch of the idea is below. We will also need to include this in our system-wide testing, to make sure that the navigation behaves as anticipated as part of the larger system.
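As an illustration only (the thresholds and the "walking corridor" check below are hypothetical, not the exact criteria in our submodule):

```python
# Hypothetical sketch of the more nuanced alert logic: only warn about an
# obstacle that is both close and inside the central walking corridor, so
# a person overtaking at the edge of the frame does not trigger an alert.
def should_alert(box, frame_width, depth_m,
                 max_depth_m=3.0, path_fraction=0.4):
    x_min, _, x_max, _ = box                      # bounding box from the detector
    box_center = (x_min + x_max) / 2.0
    corridor_half = (path_fraction / 2.0) * frame_width
    in_path = abs(box_center - frame_width / 2.0) < corridor_half
    return depth_m < max_depth_m and in_path
```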

ResNet model (30 ms inference)

Again, the performance and inference speed were good enough that we did not have to change the model or its size.

For the overall system test, no numbers have been collected yet. However, we plan to run it as follows:
First, we wear the device (without being blindfolded). Next, we wait at a crosswalk, ensuring that the "Walk Sign" cue is only played when there is a valid walk signal. Once the light displays "WALK", we begin crossing the road. We will ensure that the device functions properly in all three cases: a clear crossing with no deviation, a crossing with objects in the path, and a crossing where we deviate from the path. At the end of the crossing, we also ensure that the system resets back to the original Walk Sign Detection model (our idle state).

Max Tang Status Report for 4/26/25

This week I worked on writing the main program that will run the entire system, which involved integrating the various ML submodules. I have tested that both models work when running together. All we have left to do is add the audio component, which involves calling a Python library to connect the earbuds and play an audio file from certain if/else blocks in our code. Other than that, the device is essentially in a finalized state.

Max Tang Status Report 4/19/2025

This week I worked on the final integration of the walk sign image classification model into the board and Docker environment. We first further optimized the walk sign model: I trained it on two different datasets, one that included many non-crosswalk images labeled as "don't walk" and one that only included crosswalk images. I also implemented the code logic for transitioning from walk sign image classification to crosswalk object detection. Initially, the model worked fine on the board, but after uploading the other object detection model, we realized there were Python dependency conflicts. This is still an issue, as converting the TensorFlow model to a PyTorch model has been challenging. One attempt was to save the walk sign model in the TensorFlow SavedModel (.tf) format, convert it to .onnx using (python -m tf2onnx.convert --saved-model walksignmodel --output walksignmodel.onnx), and then convert that to .pt. However, this has run into further Python dependency issues with the ONNX libraries as well. My plan for this weekend is to resolve this issue as soon as possible.
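For reference, the conversion path described above looks roughly like the sketch below. The tf2onnx step mirrors the command I ran; using onnx2torch for the final ONNX-to-PyTorch step is an assumption on my part, not a confirmed part of our toolchain.

```python
# Sketch of the TF -> ONNX -> PyTorch path (onnx2torch is one option for
# the last step; file names match the placeholders in the command above).
import subprocess
import tensorflow as tf
import torch
from onnx2torch import convert

# 1. Save the trained Keras model as a TensorFlow SavedModel directory.
model = tf.keras.models.load_model("walksignmodel.h5")
model.save("walksignmodel")

# 2. Convert the SavedModel to ONNX with tf2onnx.
subprocess.run(
    ["python", "-m", "tf2onnx.convert",
     "--saved-model", "walksignmodel",
     "--output", "walksignmodel.onnx"],
    check=True,
)

# 3. Load the ONNX graph as a PyTorch module and save it as .pt.
torch_model = convert("walksignmodel.onnx")
torch.save(torch_model, "walksignmodel.pt")
```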

Update on 4/20/2025: The pytorch issue has been resolved, see team status report for update.

Max Tang’s Status Report for 4/12/2025

Last week, we conducted a live test of the walk sign image classifier using the initial model. This involved running the model on the Jetson and feeding it images captured by the camera once every 2 seconds. By also saving the image that was being fed into the model, I could then later view the image on my computer and see exactly what the model was seeing. The performance during the live test was not great and the model frequently predicted “GO” and only rarely predicted “RED”. To try and fix this, we collected 400 more images using the same camera and retrained the model. I re-evaluated the performance with a test dataset and it seems to be better now, but I still need to conduct more field tests and then likely repeat this process if it’s not performing well enough. I also need to work on implementing the rest of the control logic for switching from the walk sign classification to the object detection process.
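A minimal sketch of the capture-and-classify loop used in this kind of live test (the camera index, input size, and label names below are placeholders):

```python
# Minimal sketch of the live test loop: grab a frame every 2 seconds, save
# it for later inspection, and run the walk sign classifier on it.
import os
import time
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("walksignmodel.h5")
cap = cv2.VideoCapture(0)      # placeholder camera index
labels = ["RED", "GO"]         # placeholder label order
os.makedirs("captures", exist_ok=True)

frame_id = 0
while True:
    ok, frame = cap.read()
    if not ok:
        continue
    cv2.imwrite(f"captures/frame_{frame_id:04d}.jpg", frame)  # save what the model saw
    resized = cv2.resize(frame, (224, 224))                   # placeholder input size
    batch = np.expand_dims(resized.astype(np.float32) / 255.0, axis=0)
    print(frame_id, labels[int(np.argmax(model.predict(batch, verbose=0)))])
    frame_id += 1
    time.sleep(2)
```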

Team Status Report for 3/29/2025

This week we prepared for our interim demo by working on integrating each independent submodule onto the Jetson Orin Nano. This involved uploading the models and necessary Python libraries, ensuring that the camera interfaces with the board, and checking that the video streaming works. The walk sign model is in a near-finalized state, and the final step for integrating it is being able to take in a video frame as input from the camera. This will likely require a new submodule that solely handles sending the camera feed to the two vision models. This would make our code more modular, but the challenge is making sure that it works and can interface with the two models correctly. For example, the walk sign model can take image inputs either from the file system, given a path to the image, or possibly through a Python interface to the camera directly.
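One possible shape for that camera-feed submodule is sketched below; the class and method names are hypothetical, and the real interface to the two models is still undecided.

```python
# Hypothetical sketch of a camera-feed submodule that grabs frames and
# hands them to whichever vision model is active.
import cv2

class CameraFeed:
    def __init__(self, source=0):            # placeholder camera source
        self.cap = cv2.VideoCapture(source)

    def get_frame(self):
        ok, frame = self.cap.read()
        return frame if ok else None

class Pipeline:
    def __init__(self, camera, walk_sign_model, obstacle_detector):
        self.camera = camera
        self.walk_sign_model = walk_sign_model      # placeholder model wrappers
        self.obstacle_detector = obstacle_detector
        self.state = "WAIT_FOR_WALK"                # idle state: watching the walk sign

    def step(self):
        frame = self.camera.get_frame()
        if frame is None:
            return
        if self.state == "WAIT_FOR_WALK":
            if self.walk_sign_model.predict(frame) == "WALK":
                self.state = "CROSSING"
        else:
            self.obstacle_detector.detect(frame)    # crossing: look for obstacles
```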

With regards to the object detection model, we spent this week quantizing the model to determine the smallest size at which the models still fit onto the Jetson while maintaining performance. This tradeoff will likely need to be explored in more detail, but quantization allows for smaller models and quicker inference, which may be a factor in deciding which versions of the object detectors end up in the final product.
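As one illustration of the kind of export/quantization step involved (assuming the Ultralytics YOLO API and FP16 export; the exact tooling and precision we settle on may differ):

```python
# Illustrative export with reduced precision, assuming the Ultralytics API.
# Model file and settings are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")               # placeholder: a small YOLO variant
model.export(format="onnx", half=True)   # FP16 export roughly halves the size
```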

Concerning the hardware, work is continuing on the chest mount. We are now on the second revision, which finalized some dimensions with accurate (caliper) measurements and fixed some glaring comfort issues with the strap angles. The design is not yet finalized, but prototypes are being made and laser cut. We are also working on calibrating and fine-tuning the camera to resolve a red tint in the picture.

Max Tang’s Status Report for 03/29/2025

This week I worked on getting the walk sign classification model uploaded onto the Jetson Orin Nano. The Jetson can natively run Python code and can run a Jetson-optimized version of TensorFlow, which we can download as a .whl file and install on the board. The only Python libraries needed for the ResNet model are numpy and tensorflow (specifically keras), and both of these can run on the Jetson. The model itself can be saved as a .h5 file after being trained in Google Colab and then uploaded to the Jetson. The Python program can then load the model with keras and perform inference. The code itself is very simple: import the libraries, load the model, and it is ready to make predictions. The model itself is 98 MB, which fits comfortably on the Jetson. The tensorflow library is less than 2 GB and numpy is around 20 MB, so they should fit as well.
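A minimal sketch of that load-and-predict flow (the file name and input size are placeholders):

```python
# Minimal sketch: load the saved walk sign model and run one prediction.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("walk_sign_resnet.h5")  # placeholder file name
image = np.zeros((1, 224, 224, 3), dtype=np.float32)       # placeholder input
print(model.predict(image))
```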

The challenge now is getting the video frames as input. Currently we can get the camera working with the Jetson and stream the video on a connected monitor, but this is done through the command line, and we need to figure out how to capture frames and provide them to the model as input. Either the model connects to the camera through some interface, or some other submodule saves frames into the local file system for the model to then take as input, which is how it currently works.
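If we go with a direct Python interface to the camera, one common approach on the Jetson is an OpenCV capture backed by a GStreamer pipeline (for a CSI camera) or a plain cv2.VideoCapture(0) (for a USB camera). The sketch below assumes the CSI/GStreamer case and is not our confirmed capture path.

```python
# Sketch of capturing a frame directly in Python on the Jetson, assuming a
# CSI camera accessed through GStreamer. For a USB camera,
# cv2.VideoCapture(0) is usually enough.
import cv2

GST = (
    "nvarguscamerasrc ! video/x-raw(memory:NVMM),width=1280,height=720,framerate=30/1 ! "
    "nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink"
)

cap = cv2.VideoCapture(GST, cv2.CAP_GSTREAMER)
ok, frame = cap.read()
if ok:
    cv2.imwrite("frame.jpg", frame)  # hand the frame to a model or save it
```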

The model is also now performing better on the same testing dataset from last week’s status report. I’m not sure how this happened, since I did not make any changes to the model’s architecture or the training dataset. However, the false negative rate decreased dramatically, as seen in this confusion matrix.

Team Status Report for 3/22/2025

Enough progress has been made on each member's respective subsystems for us to begin considering how to integrate all of the submodules onto the board together. The challenge is making sure that we can get all of the subsystems, such as the walk sign classifier, obstacle detector, crosswalk navigator, speech module, etc., to work together. This will involve looking into how we can program the Jetson Orin Nano and upload the machine learning models. The model performance is at a point where it is usable but could still use some fine-tuning; however, it is more important at this point to make sure that we can actually run the models on the board alongside the other modules.

With regards to the crosswalk navigation, we've begun implementing a basic feedback pipeline using text-to-speech libraries in Python. No changes are needed right now, and we'll likely need to wait until integration testing to determine whether any further adjustments are needed in this submodule.
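As one example of what that feedback pipeline can look like (pyttsx3 is just one candidate library, not necessarily the one we end up using):

```python
# Example text-to-speech cue, assuming the pyttsx3 library (one candidate).
import pyttsx3

engine = pyttsx3.init()
engine.say("Walk sign is on. Begin crossing.")
engine.runAndWait()
```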

Concerning the hardware, work has begun on the design of the chest mount. Some images of the current design are included below. It will have straps at each of the four corners and will be worn like a chest harness. We plan to laser-cut the initial design out of wood as a proof of concept. The final version will be either ABS or wood, depending on which is more durable and easier to mount the devices onto. We will also likely add either a foam or TPU pad to the underside of the mount, as a hard chestpiece would be uncomfortable for the user. With regards to the peripherals, the camera may have broken. This is system-critical, so it will be the primary focus until it is resolved.


Max Tang’s Status Report for 3/22/2025

The walk sign image classification model is in a near-finalized state where I can begin to transition away from optimizing its performance. Since last week, I performed some hyperparameter optimization and also tried adding layers to the ResNet model to try to increase its performance. I tried changing the size of the dense linear layers, the number of training epochs, different activation functions, and additional linear and pooling layers. However, these did not seem to help as much as simply adding more training data, which I have been continuously collecting. I also removed the validation dataset and divided its images between the training and testing datasets, since I did not find much use for a validation dataset and benefited more from having more data to train and test with. Current test accuracy is around 80%, which is not as high as desired. However, the good news is that most of the errors were cases where the model predicted "stop" when the image was "go". This is much better than predicting "go" when the image is "stop", and while I did not purposefully design the model to be more cautious about predicting "go" (this seems to be a coincidence), it is something I realize I could add deliberately. This would not necessarily have to be a change to the model; it could be done in a post-processing step instead, as sketched below.
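The sketch below shows the kind of post-processing I have in mind; the threshold value is illustrative, not tuned.

```python
# Hypothetical post-processing step that biases the classifier toward
# caution: only report "go" when the model is very confident.
GO_THRESHOLD = 0.9  # placeholder confidence required to announce "go"

def cautious_label(probs, classes=("stop", "go")):
    """probs: softmax output from the walk sign model for one image."""
    return "go" if probs[classes.index("go")] >= GO_THRESHOLD else "stop"
```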

The next step is implementing the logic that would take video input data and feed it into the model at some frequency and then return the result, using a sliding window for both the input and output. I plan to begin working on this next week.
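As a rough sketch of the sliding-window idea (the window size and majority rule are placeholders):

```python
# Sketch: classify frames at a fixed rate and only announce "WALK" when a
# majority of the last N per-frame predictions agree.
from collections import deque

WINDOW = 5                       # placeholder window size
recent = deque(maxlen=WINDOW)

def update(prediction):
    """Feed in the latest per-frame prediction; return the smoothed output."""
    recent.append(prediction)
    if len(recent) == WINDOW and list(recent).count("go") > WINDOW // 2:
        return "WALK"
    return "WAIT"
```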

Team Status Report for 3/15/2025

One issue that we have not spent much thought on is whether or not the machine learning models will fit on the microcontroller. The microcontroller itself already has a decent amount of memory, and it is possible to add external storage as well. The models themselves, from initial research, should definitely fit. Quantizing the models will make them smaller and possibly improve inference speed, but this can sacrifice performance and might not be necessary from a model-size standpoint. No changes have been made to the design recently, and next week we should begin to explore how we can upload the models to the microcontroller and begin developing the main program that will run the models and manage the control flow.

With regards to hardware, testing for integrating the peripherals is still ongoing. We forecast that it will be complete by the end of next week. One piece of good news is that we have confirmed that the USB-C PD to DC cable can successfully power the Jetson Orin Nano, which will allow us to make it portable. For some reason, our existing power bank (which we already had on hand rather than purchased) cannot supply the correct wattage, probably because it is old. We will now need to find a suitable power bank to buy. Initial designs have also started for mounting the Jetson Orin Nano. Based on our last meeting, we have decided to create both a helmet mount and a chest mount. Once complete, we can test and compare both designs for user comfort and camera shake. The testing will determine which mounting system is used in the final design.

With regards to the object detection models, we continue to have trouble fine-tuning out-of-the-box models. To give ourselves a greater range of models to evaluate, we have decided to pause the fine-tuning work and evaluate the different YOLO models available, such as YOLOv12. No other changes have been made to the software implementation.
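A rough sketch of how that evaluation pass could look (assuming the Ultralytics API; the model list and dataset config name are placeholders):

```python
# Illustrative comparison of off-the-shelf YOLO variants on our dataset.
from ultralytics import YOLO

for weights in ["yolov8n.pt", "yolov8s.pt", "yolo11n.pt"]:  # placeholder list
    metrics = YOLO(weights).val(data="crosswalk.yaml")      # placeholder config
    print(weights, metrics.box.map50)
```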

Max Tang’s Status Report for 3/15/2025

Training the walk sign image classification model has made significant progress. The ResNet model is very easy to work with, and I have been addressing the initial overfitting from last week by training the model on a much more diverse dataset from multiple intersections around the city. I've developed the habit of always having my camera open when I get near intersections while walking or commuting around Pittsburgh, so I have been able to collect many more images. All I have to do is crop them and feed them into the model. I have also been working on some hyperparameter optimization, such as the different layers and their sizes. This has not really resulted in improved performance, but it's possible that adding more layers will make it better; this will require some research, perhaps by going into the open-source ResNet code, to determine whether layers like additional dense layers will help. Next week I want to have the model in a finalized state so that I can begin integrating it onto the microcontroller. I will also have to spend some time figuring out how to quantize the model to make it smaller.
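One option for the quantization step mentioned above is TFLite post-training quantization, sketched below; this is an assumption on my part, and we may end up using a different toolchain (e.g., TensorRT) on the Jetson.

```python
# Sketch of dynamic-range post-training quantization for the Keras model.
import tensorflow as tf

model = tf.keras.models.load_model("walksignmodel.h5")   # placeholder file name
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("walksignmodel_quant.tflite", "wb") as f:
    f.write(converter.convert())
```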