Opalina’s Status Report 4/26

This week, I created a quantized YOLO model and tuned it to increase its speed on the Pi. I also rewrote the integration script, adjusting the OCR intervals and adding threading to reduce per-frame inference time when running the end-to-end integration script on the Pi.
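
As a rough illustration of the OCR-interval and threading idea, here is a minimal sketch assuming an Ultralytics YOLO model and Tesseract via pytesseract; the weight path, interval value, and queue size are placeholders rather than our exact settings:

```python
import threading
import queue

import cv2
import pytesseract
from ultralytics import YOLO

OCR_INTERVAL = 10  # hypothetical: attempt OCR only every 10th frame

model = YOLO("best.pt")  # assumed path to the tuned weights
ocr_queue = queue.Queue(maxsize=1)

def ocr_worker():
    """Consume cropped sign regions and run OCR off the main thread."""
    while True:
        crop = ocr_queue.get()
        if crop is None:  # sentinel to shut down
            break
        text = pytesseract.image_to_string(crop)
        print("OCR:", text.strip())

threading.Thread(target=ocr_worker, daemon=True).start()

cap = cv2.VideoCapture(0)
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)  # YOLO still runs on every frame
    if frame_idx % OCR_INTERVAL == 0:
        for box in results[0].boxes.xyxy:  # hand crops to the OCR thread
            x1, y1, x2, y2 = map(int, box.tolist())
            if not ocr_queue.full():
                ocr_queue.put(frame[y1:y2, x1:x2])
    frame_idx += 1
```

Keeping the queue at size one means the OCR worker always gets a recent crop and stale frames are simply dropped, which is what keeps the per-frame loop fast.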

For unit tests, I ran the video processing script on a set of manually recorded videos (using printed signs) as well as pre-existing image datasets (117 airport signs), reaching an accuracy of approximately 92% with <100 ms of preprocessing and inference time on the Mac. On the Raspberry Pi, the new script ultimately yielded a latency of <2 s, which met our initial use-case requirements.
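
For context, the timing numbers came from a harness along these lines; this is a simplified sketch in which the weight path, dataset location, and glob pattern are hypothetical, and the accuracy check against ground-truth labels is omitted:

```python
import time
from pathlib import Path

import cv2
from ultralytics import YOLO

model = YOLO("best.pt")  # assumed path to the trained weights
test_dir = Path("datasets/airport_signs/test")  # hypothetical test-set location

total, elapsed = 0, 0.0
for img_path in sorted(test_dir.glob("*.jpg")):
    frame = cv2.imread(str(img_path))
    start = time.perf_counter()
    model(frame, verbose=False)  # preprocessing + inference, timed together
    elapsed += time.perf_counter() - start
    total += 1

print(f"{total} images, {1000 * elapsed / total:.1f} ms average per image")
```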

Opalina’s Status Report 4/19

This week, I reached several significant milestones. I integrated the entire software subsystem, which included retraining the YOLO model to recognize “gate text”. I also developed a new video processing script and integrated it with the speech-to-text interface. A pivotal improvement was moving the speech-to-text functionality to OpenAI’s Whisper model, which performed noticeably better than the initial VOSK model.
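
The Whisper swap was straightforward because the openai-whisper Python package exposes a simple API. A minimal sketch, where the model size and audio filename are illustrative choices rather than necessarily what we shipped:

```python
import whisper  # pip install openai-whisper

# "base" is an assumption here; in practice we would pick the smallest model
# size that transcribes spoken gate queries accurately enough on our hardware.
model = whisper.load_model("base")

result = model.transcribe("query.wav")  # e.g. a recording of "where is gate B12?"
print(result["text"])
```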

Throughout this semester, I have deepened my understanding of machine learning models, their implementation, and their application in real-world projects. This experience has provided me with valuable insights into how different disciplines within Electrical and Computer Engineering come together to create cohesive projects—a perspective that was less emphasized in my earlier coursework. My learning approach evolved to prioritize online documentation, videos, and research papers over traditional textbooks and lecture slides, enhancing my ability to tackle complex technical challenges effectively.

Team Status Report 4/12

The new camera arrived this week, which meant we had to restart the hardware integration process from scratch. The new AI hat, while easy to set up, is posing compilation issues with the custom ML models. While text-to-speech and sign detection seem to be performing acceptably, this week we hope to make more progress on integrating the camera with the software. Once we have a basic level of functionality, we plan to start user and battery tests to ensure that the device holds up to the initially outlined use-case requirements (accuracy, latency, power, depth perception, etc.).

Opalina’s Status Report 4/12

This week, I used new training and testing datasets to fine-tune the YOLO model that we previously used for our interim demo. The new model looks for an added set of features in order to reduce the number of OCR passes needed on each image, thereby reducing our latency. I ran a few standard tests to verify the functionality of this new model and found a slight drop in accuracy, which I am currently working to recover. In the coming week, I plan to build new custom test datasets to properly identify these points of error.
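
To make the OCR-reduction idea concrete, here is a sketch of how the extra detection classes can gate the OCR calls; the class names, confidence threshold, and weight path are hypothetical stand-ins:

```python
import pytesseract
from ultralytics import YOLO

model = YOLO("best_v2.pt")  # assumed: the re-trained model with extra classes

# Hypothetical class names: only these regions are worth an OCR pass
TEXT_CLASSES = {"gate_text", "direction_text"}

def read_sign(frame):
    """Run one detection pass, then OCR only the text-bearing boxes."""
    results = model(frame, verbose=False)[0]
    lines = []
    for box in results.boxes:
        name = results.names[int(box.cls)]
        if name in TEXT_CLASSES and float(box.conf) > 0.5:
            x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
            lines.append(pytesseract.image_to_string(frame[y1:y2, x1:x2]))
    return [ln.strip() for ln in lines if ln.strip()]
```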

Opalina’s Status Report 3/29

This week, I managed to run and fully test YOLO and OpenCV in conjunction with OCR to extract and interpret all the relevant information from an airport sign. This script now only needs to be connected to the TTS component to complete the end-to-end software subsystem. In the coming weeks, we hope to use camera footage to run the models on the Pi.
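
In outline, the script follows a detect-crop-read pattern. The sketch below shows the general shape, assuming Ultralytics YOLO plus pytesseract, with the weight path, image name, and class labels as placeholders:

```python
import cv2
import pytesseract
from ultralytics import YOLO

model = YOLO("best.pt")  # assumed weights path

def describe_sign(image_path):
    """Detect sign elements in one image and build a phrase for the TTS stage."""
    frame = cv2.imread(image_path)
    results = model(frame, verbose=False)[0]
    parts = []
    for box in results.boxes:
        label = results.names[int(box.cls)]  # e.g. a hypothetical "arrow_left"
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        text = pytesseract.image_to_string(frame[y1:y2, x1:x2]).strip()
        parts.append(f"{label}: {text}" if text else label)
    return ", ".join(parts)

print(describe_sign("sample_sign.jpg"))
```

The returned string is what would eventually be handed to the TTS component.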

Team Status Report 3/22

Currently, the individual components seem to be functioning adequately, including the OCR, the YOLOv8 model, and the camera. However, the camera is running more slowly than initially anticipated, posing a risk that the models will not run properly during integration. In the upcoming week, we aim to test this hypothesis and make changes to our models or equipment as necessary.

Opalina’s Status Report 3/22

This week, I managed to fine-tune the YOLO model while figuring out the semantics of the OpenCV, YOLOv8 and OCR integration. During the coming week, I hope to get the ML system fully functioning and at least partially integrated, ready to run on the Pi.

Opalina’s Status Report 3/15

The YOLO model is fully trained and functional on large airport datasets (with a variety of images). OCR is proving a little more difficult to integrate into the software subsystem, but I hope to have that figured out by the end of this week. The only potential issue we see right now is the model not running fast enough on the Pi, as local tests suggest it may be slower than anticipated.
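
Part of the difficulty is that raw crops of sign text often OCR poorly. A common remedy, sketched below with OpenCV, is to grayscale, upscale, and binarize a crop before handing it to Tesseract; the file name is just an example:

```python
import cv2
import pytesseract

def preprocess_for_ocr(crop):
    """Basic OpenCV cleanup that often helps Tesseract on sign text."""
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

crop = cv2.imread("sign_crop.jpg")  # hypothetical cropped region from YOLO
print(pytesseract.image_to_string(preprocess_for_ocr(crop)))
```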

Team Status Report 3/8

As a team, we are continuing to work on our individual subsystems and are making sufficient progress. Currently, the most significant challenge we face is the lack of documentation for the eYs3D camera we are using, which could make it harder to integrate with the Raspberry Pi and the ML models. Furthermore, the addition of OCR means that more time needs to be allocated to the ML component of the device.

Opalina’s Status Report 3/8

Over the last two weeks, I began training YOLOv8 on one of the online airport datasets. I also realized the need for Optical Character Recognition (to interpret words and numbers in addition to arrows) and delved into ways to implement and integrate it into the software subsystem. By next week, I hope to have a functional YOLO model for our purposes and robust implementation plans for OpenCV and OCR.
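
For the record, fine-tuning YOLOv8 on a custom dataset boils down to a short Ultralytics call. A sketch, in which the dataset config name, epoch count, and image size are placeholders for whatever we settle on:

```python
from ultralytics import YOLO

# Start from pretrained COCO weights and fine-tune on the airport sign data.
# "airport_signs.yaml" stands in for our dataset config (image paths + class names).
model = YOLO("yolov8n.pt")
model.train(data="airport_signs.yaml", epochs=50, imgsz=640)
```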