Andrew Wang’s Status Report: 3/22/2025

This week, I spent some time implementing the crosswalk navigation submodule. One of the main difficulties was using pyttsx3 for real-time feedback. While it offers offline text-to-speech capabilities, fine-tuning parameters such as speech speed, volume, and clarity will require extensive experimentation to ensure the audio cues are both immediate and comprehensible. Since we will be integrating all of the modules soon, I anticipate addressing this during integration. I also had to spend some time familiarizing myself with text-to-speech libraries in Python, since I had never worked in this space before.
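
As a reference point for that tuning, the sketch below shows the pyttsx3 properties I expect to be adjusting; the specific rate and volume values (and the sample cue) are placeholders that will need to be found experimentally.

    import pyttsx3

    engine = pyttsx3.init()              # offline text-to-speech engine
    engine.setProperty("rate", 175)      # words per minute; the library default is around 200
    engine.setProperty("volume", 1.0)    # 0.0 to 1.0
    engine.say("Veer left")              # example crosswalk correction cue (placeholder)
    engine.runAndWait()                  # blocks until the audio finishes playing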


I also anticipate that some effort will be required to optimize the speech generation, as the feedback for this particular submodule needs to be especially quick. Once again, we will address this as appropriate if it becomes a problem during integration and testing.

Currently, I am about on schedule as I have a preliminary version of the pipeline ready to go. Hopefully, we will be able to begin the integration process soon, so that we may have a functional product ready for our demo coming up in a few weeks.

This week, I will probably focus on further optimizing text-to-speech response time, refining the heading correction logic to accommodate natural walking patterns, and conducting real-world testing to validate performance under various conditions.

Max Tang’s Status Report for 3/22/2025

The walk sign image classification model is in a near-finalized state, and I can begin to transition away from optimizing the model’s performance. Since last week, I performed some hyperparameter optimization and also tried adding layers to the ResNet model to try to increase its performance. I experimented with the size of the dense linear layers, the number of training epochs, different activation functions, and additional linear and pooling layers. However, none of these seemed to help as much as simply adding more training data, which I have been continuously collecting. I also removed the validation dataset and divided its images between the training and testing datasets, since I was not making real use of a separate validation set and benefited more from having additional data to train and test with.

Current test accuracy is around 80%, which is lower than desired. The good news is that most of the errors were cases where the model predicted “stop” when the image was actually “go”. This is much safer than predicting “go” when the image is “stop”. I did not purposefully design the model to be more cautious about predicting “go”, and this appears to be a coincidence, but it is a bias I could deliberately enforce. This would not necessarily require a change to the model itself and could instead be done in a post-processing step.
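
A minimal sketch of what that post-processing step could look like, assuming the model outputs logits over two classes with “go” at a known index; both the index and the 0.9 threshold below are placeholders to be tuned on the test set.

    import torch
    import torch.nn.functional as F

    GO_THRESHOLD = 0.9  # placeholder confidence cutoff, to be tuned on the test set

    def cautious_label(logits: torch.Tensor, go_index: int = 0) -> str:
        """Only predict 'go' when the model is highly confident; otherwise fall back to 'stop'."""
        probs = F.softmax(logits, dim=-1)
        return "go" if probs[go_index].item() >= GO_THRESHOLD else "stop"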

The next step is implementing the logic that takes the video input, feeds frames into the model at some fixed frequency, and returns the result, using a sliding window over both the input frames and the output predictions. I plan to begin working on this next week.
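
A rough sketch of that logic, assuming an OpenCV video source and a transform that converts an RGB frame into a normalized tensor for the classifier; the window size and sampling interval are placeholders.

    from collections import Counter, deque

    import cv2
    import torch

    WINDOW = 5          # number of recent predictions to vote over
    SAMPLE_EVERY = 10   # classify every 10th frame

    def classify_stream(model, transform, video_source=0):
        """Sample frames from the video, classify them, and majority-vote over a sliding window."""
        recent = deque(maxlen=WINDOW)
        cap = cv2.VideoCapture(video_source)
        frame_idx = 0
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            if frame_idx % SAMPLE_EVERY == 0:
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                x = transform(rgb).unsqueeze(0)          # transform: RGB array -> normalized tensor
                with torch.no_grad():
                    pred = model(x).argmax(dim=1).item()
                recent.append(pred)
                if len(recent) == WINDOW:                # emit a label once the window is full
                    yield Counter(recent).most_common(1)[0][0]
            frame_idx += 1
        cap.release()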

Andrew Wang’s Status Report: 3/15/2025

This week, I worked on debugging the fine-tuning portion of the object detection pipeline. Previously, the YOLOv8 model worked reasonably well out of the box on an out-of-distribution dataset, but my first attempts at implementing the fine-tuning portion were unsuccessful, resulting in extremely poor performance.

 

Unfortunately, I wasn’t able to make much progress on this front. So far, I have attempted switching the label mapping of the bdd100k dataset, fine-tuning different versions of the YOLOv8 models, and adjusting the hyperparameters slightly, but every attempt had the same outcome: the model performs extremely poorly on held-out data. I have not yet been able to determine why this is happening.
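
For reference, the fine-tuning setup I am debugging is roughly the following; the "bdd100k.yaml" data config name, epoch count, and image size are placeholders, and the class names and label mapping inside that config are exactly what I currently suspect are the problem.

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                                # pretrained checkpoint
    model.train(data="bdd100k.yaml", epochs=50, imgsz=640)    # fine-tune on the bdd100k-based config
    metrics = model.val()                                     # evaluate on the held-out split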

However, since we remain interested in having a small suite of object detection models to test, I decided to find some more YOLO variants to evaluate while the fine-tuning issue is being solved. Specifically, I decided to evaluate YOLOv12 and Baidu’s RT-DETR models, both of which are compatible with my current pipeline for pedestrian object detection. The YOLOv12 architecture introduces an attention mechanism for processing large receptive fields more effectively, and the RT-DETR model takes inspiration from the Vision Transformer architecture to efficiently process multi-scale features in the input.
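
Swapping these models into the existing evaluation pipeline only requires changing the model constructor; a minimal sketch is below, where the checkpoint names and the data config path are assumptions based on the ultralytics documentation.

    from ultralytics import RTDETR, YOLO

    candidates = {
        "yolov8n": YOLO("yolov8n.pt"),
        "yolo12n": YOLO("yolo12n.pt"),      # YOLOv12 nano checkpoint
        "rtdetr-l": RTDETR("rtdetr-l.pt"),  # Baidu's RT-DETR, large variant
    }
    for name, model in candidates.items():
        metrics = model.val(data="pedestrian.yaml")   # placeholder config for the held-out set
        print(name, metrics.box.map50)                # mAP@0.5 for a quick side-by-side comparison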

Despite being larger and more recent models, they don’t actually do much better than the original YOLOv8 models I was working with. Here are the prediction confusion matrices for the YOLOv12 and RT-DETR models, respectively:

This suggests that these object detection models may be hitting a performance ceiling on this particular out-of-distribution dataset, and that real-world testing might show similar performance across models as well.

Currently, I am a bit behind schedule, as I was unable to fix the fine-tuning issues and consequently was not able to make much progress on integration with the navigation submodules.

For this week, I will temporarily shelve the fine-tuning implementation debugging in favor of implementing the transitions between the object detection and navigation submodules. Specifically, I plan on beginning to handle the miscellaneous code that will be required to pass control between our modules.

Team Status Report for 3/15/2025

One issue that we have not spent much thought on is whether the machine learning models will fit on the microcontroller. The microcontroller itself already has a decent amount of memory, and it is possible to add external storage as well. From initial research, the models themselves should definitely fit. Quantizing the models would make them smaller and possibly improve inference speed, but this can sacrifice accuracy and might not be necessary from a model-size standpoint. No changes have been made to the design recently, and next week we should begin to explore how to upload the models to the microcontroller and begin developing the main program that will run the models and manage the control flow.
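
As a starting point for that exploration, one option is to export the trained classifier to ONNX and then build a smaller FP16 or INT8 engine with TensorRT on the Jetson. The sketch below is only illustrative: it uses a stand-in ResNet-18 with a two-class head in place of the trained model, and the TensorRT flags are assumptions to verify.

    import torch
    from torchvision import models

    # Stand-in for the trained walk-sign classifier; in practice we would load the trained weights.
    model = models.resnet18(weights=None)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)
    model.eval()

    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy_input, "walk_sign.onnx", opset_version=17)
    # The resulting .onnx file can then be built into an FP16/INT8 engine on the Jetson,
    # e.g. with `trtexec --onnx=walk_sign.onnx --fp16` (exact flags to be verified).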

With regards to hardware, testing for integrating the peripherals is still ongoing. We forecast that it will be complete by the end of next week. One piece of good news is that we have confirmed that the USB-C PD to DC cable can successfully power the Jetson Orin Nano, which will allow us to make it portable. Our existing power bank (one we already had on hand rather than purchased for the project) cannot supply the required wattage, probably because it is old, so we will need to find a suitable power bank to buy. Initial designs have also started for mounting the Jetson Orin Nano. Based on our last meeting, we have decided to create both a helmet mount and a chest mount. Once complete, we can test and compare both designs for user comfort and camera shake; the testing will determine which mounting system is used in the final design.

With regards to the object detection models, we continue to have trouble fine-tuning the out-of-the-box models. To give a greater range of models to evaluate, we have decided to pause the fine-tuning implementation and work on evaluating the different YOLO models available, such as YOLOv12. No other changes have been made to the software implementation.

William’s Status Report – 03/15/2025

This week, I continued my work testing how to connect each peripheral device to the Jetson Orin Nano. There is not much to update here, as I am still working on getting it done; I expect to have this completed by the end of next week. In particular, the I2C readings are a bit off, but I think it should just be a minor code fix. The camera is in a good place, and I believe the data is being properly sent over the GStreamer pipeline.
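
For reference, the capture path I believe is working looks roughly like the following OpenCV/GStreamer pipeline for the IMX219 over CSI-2; the resolution and framerate values are assumptions that may need adjusting.

    import cv2

    GST_PIPELINE = (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )

    cap = cv2.VideoCapture(GST_PIPELINE, cv2.CAP_GSTREAMER)
    ok, frame = cap.read()
    print("frame captured:", ok, frame.shape if ok else None)
    cap.release()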

One easy test was the USB-C PD to DC cable. I have verified that with a 45 W wall adapter, it is able to power the Jetson Orin Nano from a USB-C power source. However, I could not get the device to power on using the power bank that I had at home. This means I must find a suitable power bank that can supply the required voltage and supports more modern PD standards. It is important to note that while the Jetson Orin Nano powers on, I do not know the wattage it is drawing over the adapter. It is entirely possible that it is only running at a lower wattage setting, so I will find a way to verify that it works at full power next week.

I have also begun designing two mounting systems for the device: one head mount and one chest mount. This is based on our weekly meeting, where we discussed that there may be some benefits to mounting the camera on the chest rather than the head. Once both designs are finalized and printed (or laser cut), we will test and compare both for user comfort and camera shake.

 

Regarding the schedule, I am still about a week behind the Gantt chart. This is a hold-over from last week, where I forgot to account for spring break on the schedule. For next week, I plan to finalize the peripheral integration and to continue working on the mounting designs. 

Max Tang’s Status Report for 3/15/2025

Training the walk sign image classification model has made significant progress. The ResNet model is very easy to work with, and I have been addressing the initial overfitting from last week by training the model on a much more diverse dataset from multiple intersections around the city. I’ve developed the habit of always having my camera open when I get near intersections while walking or commuting around Pittsburgh, and I have been able to collect many more images. All I have to do is crop them and feed them into the model.

I have also been working on some hyperparameter optimization, such as trying different layer types and sizes. This has not really improved performance yet, but it is possible that adding more layers will help; it will take some research to determine whether additions like extra dense layers are worthwhile, and going into the open-source ResNet code makes it easy to experiment with these changes directly. Next week I want to have the model in a finalized state so that I can begin integrating it onto the microcontroller. I think I will also have to spend some time figuring out how to quantize the model to make it smaller.
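
One example of the kind of architectural experiment described above, replacing ResNet’s single fully connected layer with a small MLP head; the ResNet-18 variant and hidden size of 256 are placeholders, since the exact configuration is still being tuned.

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone
    model.fc = nn.Sequential(                          # deeper classification head to experiment with
        nn.Linear(model.fc.in_features, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, 2),                             # two classes: WALK vs DON'T WALK
    )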

Team Status Report for 3/8/2025

A change was made to the existing design: specifically, the machine learning model used in the walk sign subsystem was changed from a YOLO object detection model to a ResNet image classification model. This is because the subsystem needs to be able to classify images as containing either a WALK sign or a DON’T WALK sign, so an object detection model would not suffice. No costs were incurred by this change other than the time spent adding bounding boxes to the collected dataset. One risk is the performance of the walk sign image classification model when evaluated in the real world. It is possible that images captured by the camera when mounted on the helmet are different (blurrier, taken from a higher angle, etc.) than the images the model is trained on. This could certainly affect its performance, but now that the camera has arrived, we can begin testing this and adjust our dataset accordingly.

Part A (written by Max): The target demographic of our product is the visually impaired pedestrian population, but the accessibility of pedestrian crosswalks around the world varies greatly across countries, cities, and even neighborhoods within a single city. It is common to see sidewalks with tactile bumps, pedestrian signals that announce the WALK sign and the name of the street, and other accessibility features in densely populated downtowns. However, sidewalks in rural neighborhoods or less developed countries often do not have any of these features. The benefit of the Self-Driving Human is that it would work at any crosswalk that has a signal indicator: as long as the camera can detect the walk sign, the helmet can run the walk sign classification and navigation phases without any issues. Another global factor is the different symbols used to indicate WALK and DON’T WALK. For example, Asian countries often use a green walking figure to indicate WALK, while U.S. crosswalks use a white one. This can only be addressed by training the model on country-specific datasets, which might not be as readily available in some parts of the world.

Part B (written by William): The Self-Driving Human has the potential to influence cultural factors by reshaping how society views assistive technology for the visually impaired. In particular, our project would increase mobility and reduce reliance on caregivers for its users. This can lead to cultural benefits like increased participation in certain social events as the user gains more autonomy. Ideally, this would lead to greater inclusivity in city design and social interactions. Additionally, our project could promote a standardized form of audio-based navigation, influencing positive expectations about accessible infrastructure and design. We hope this pushes for broader adoption of assistive technology-driven solutions, which could result in the development of even more inclusive and accessible technologies.

 

Part C (written by Andrew): The smart hat for visually impaired pedestrians addresses a critical need for independent and safe navigation while keeping key environmental factors in mind. By utilizing computer vision and GPS-based obstacle detection, the device minimizes reliance on physical infrastructure such as tactile paving and audio signals, which may be unavailable or poorly maintained in certain areas. This reduces the dependency on city-wide accessibility upgrades, making the solution more scalable and effective across diverse environments. Additionally, by incorporating on-device processing, the system reduces the need for constant cloud connectivity, thereby lowering energy consumption and emissions associated with remote data processing. Finally, by enabling visually impaired individuals to navigate their surroundings independently, the device supports inclusive urban mobility while addressing environmental sustainability in its design and implementation.

William Shaw’s Status Report: 3/8/2025

This week, the rest of the critical parts arrived, so I was able to move into the testing phase for the components. Since I am still in the earlier stages of testing and integrating the components, my focus has been primarily on the setup process and ensuring basic connectivity. First, the camera (IMX219) is connected via CSI-2, so the system detects it under /dev/video* (visible with “v4l2-ctl --list-devices”) rather than with “lsusb”. I then made sure that v4l-utils and GStreamer were installed and updated so I could interact with the camera. More testing needs to be done to actually access the video feed, but the device is being detected. Second, the IMU (BNO055) communicates over I2C, so I used “i2cdetect” to check that the module is detected on the I2C bus of the Jetson Orin Nano. Next, I will use the smbus Python library to read the raw sensor data.
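
A reasonable first sanity check for that smbus step is reading the BNO055’s chip ID register, which the datasheet says should return 0xA0; the bus number below is an assumption and should be replaced with whatever i2cdetect reports for our wiring.

    import smbus

    I2C_BUS = 7           # bus number from i2cdetect; an assumption for our wiring
    BNO055_ADDR = 0x28    # default BNO055 I2C address
    CHIP_ID_REG = 0x00    # chip ID register; datasheet value is 0xA0

    bus = smbus.SMBus(I2C_BUS)
    chip_id = bus.read_byte_data(BNO055_ADDR, CHIP_ID_REG)
    print("BNO055 detected" if chip_id == 0xA0 else f"unexpected chip id: {chip_id:#x}")
    bus.close()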

I also worked on configuring the Jetson Orin Nano for headless operation, ensuring that we can all access and interact with the system without needing an external monitor, keyboard, and mouse. Headless will be the operation mode for the project’s final phase, as we will not be able to have a monitor attached to the user. Access is done by SSH’ing in from my laptop. I also set up VNC (Virtual Network Computing) to get a visual remote desktop. Initially, I experimented with Vino, but its performance varies depending on the exact Jetson device being used, so I ended up swapping to x11vnc. This can be connected to using the built-in VNC client on a MacBook (the “Screen Sharing” app). Separately, there were some initial issues with getting the Wi-Fi to work properly (due to some user privilege issues), but they have been resolved.

Regarding the schedule, I am about a week behind the Gantt chart. This is because I did not consider that Spring Break was my “Week 5”, so I misjudged the actual dates. I plan to finish testing of each component by this week to get back on schedule. I also want to double check that the Jetson Orin Nano works on CMU-Secure/Device, as I have just been testing on my home network.

Andrew Wang’s Status Report: 3/8/2025

This week, I worked on fine-tuning the pretrained YOLOv8 models for better performance. Previously, the models worked reasonably well out of the box on an out of distribution dataset, so I was interested in fine-tuning it on this dataset to improve the robustness of the detection model.

 

Unfortunately, so far the fine-tuning has not helped. My first few attempts at training the model on the new dataset resulted in the model not detecting any objects and marking everything as “background”. See below for the latest confusion matrix:

 

I am still unsure why this is happening. I did verify that the out-of-the-box model’s metrics from my last status report are reproducible, so I suspect there is a small issue with how I am retraining the model, which I am currently looking into.

Due to this unexpected issue, I am currently a bit behind schedule, as I had anticipated finishing the fine-tuning by this point. However, I expect to be back on track this week once the issue is resolved, since my remaining action items are mainly to integrate the model outputs with the rest of the components, which can be done regardless of whether the new models are ready. Additionally, I have implemented most of the necessary pipelines for model evaluation and training, and am slightly ahead of schedule in that regard relative to our Gantt chart.

For this week, I hope to begin coordinating efforts to integrate the object detection models’ output to the navigation modules in the hardware, as well as resolving the current issues with the model fine-tuning. Specifically, I plan on beginning to handle the miscellaneous code that will be required to pass control between our modules.

Max Tang’s Status Report for 3/8/2025

This week I worked on training and tuning the walk sign image classification model. I made a major design change for this part of the system: instead of using a YOLO model that is trained for object detection, I decided to switch to an off-the-shelf ResNet model that I could fine-tune with our own custom dataset. I initially thought that a YOLO model would be best since the system would need to find the walk sign signal box in an image and create a bounding box, but the issue is that this would not classify the image as either WALK or DON’T WALK. ResNet is just a convolutional neural network that outputs class labels, so as long as it is trained on enough high-quality data, it should still be able to recognize the walk sign state in an image. The training and evaluation are easily done in Google Colab:
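
A rough sketch of that fine-tuning setup, assuming the cropped images are organized into WALK and DON’T WALK subfolders; the directory layout, ResNet-18 variant, and hyperparameters are all placeholders.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_ds = datasets.ImageFolder("data/train", transform=tfm)   # walk/ and dont_walk/ subfolders
    train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

    model = models.resnet18(weights="IMAGENET1K_V1")   # off-the-shelf pretrained ResNet
    model.fc = nn.Linear(model.fc.in_features, 2)      # replace the head for the 2 classes
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):
        for x, y in train_dl:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()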

 

More data needs to be collected to improve the model and increase its ability to generalize, as the current model is overfitting to the small dataset. Finding high-quality images of the WALK sign has been the main issue, as Google Maps tends to only have pictures of the DON’T WALK sign, and I can only take so many pictures of different WALK signs throughout the day. The good news is that retraining the model can be done very quickly, since the model is small enough to fit on the microcontroller. Now that I finally have the model working, I can focus my time next week on further data collection. Progress is still somewhat on schedule, but I will need to work on moving this from my local machine onto the board soon.