yoochanj – Team D2: SightMate

April 27, 2024 (updated April 28, 2024)

Team Status Report for 4/27/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

We are currently using a software solution for the proximity module: the distance estimation (DE) feature in the OR model. Its accuracy is decent, with measurements rarely exceeding ±30 cm, but the uncertainty is around 20%, which may jeopardize the use case requirement for detection distance. This risk can be mitigated by connecting an ultrasonic sensor to the Jetson through an Arduino board to obtain an accurate distance. However, this alternative would increase the weight of the product, possibly beyond the use case requirement for device weight, and would add latency in data transmission. It would also increase development time, since the distance data must be transferred to the Jetson. This is the tradeoff we still need to consider: accuracy vs. weight and latency.
After connecting the camera module to the OR model, we realized that every frame arrives with some latency, possibly due to the recognition delay. As a result, when the camera is turned to a different object, the Jetson outputs the correct object around 5 seconds after the change. This seriously jeopardizes the success of the project because our use case requirement is a recognition delay of less than 2.5 seconds. The risk can be mitigated by capturing frames differently: taking a single screen capture instead of reading from the video stream could resolve the delay. However, with this method the Jetson Nano has to run the camera capture, transfer the image to the model, and delete the captured frame, which may take longer than the current delay. This alternative could also delay product delivery because of the extra time needed to modify the program.

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

Besides the change from using an ultrasonic sensor integration to using the DE feature for the proximity module, no design change has been made.

Provide an updated schedule if changes have occurred.

Josh and Shakthi will work on integration and testing of the headless device. Meanwhile, Meera will work on the box for the device. Afterwards, Josh, Meera, and Shakthi will conduct user testing and work on the final demo.

List all unit tests and overall system test carried out for experimentation of the system.

Testing	Metrics	Result
Object Recognition Model	> 70% on identifying an object	95% (38/40 images, 5 objects)
Distance Estimation Feature	± 30cm of actual object distance	Tests done on 4 different distances. Average of 21.5% uncertainty within ± 30cm
Text-to-speech Module	User testing with surrounding sounds, 20 trials per object	100% (20/20 person, 20/20 couch, 20/20 chair, 20/20 cat, 20/20 cellphone)
Vibration Module	> 95% accuracy on vibration	100% (20/20 on person, 20/20 on nothing)
Device Controls (buttons)	100% accuracy on controls	100% (20/20 on button A, 20/20 on button B)
Module Integration (weight)	< 450g on the overall product weight	192g (device) + 209g (battery) = 401g < 450g
Recognition Delay	< 2.5s to recognize an object	~8 seconds delay for 20 seconds testing. Frame delay due to the latency of the OR model

List any findings and design changes made from your analysis of test results and other data obtained from the experimentation.

Chose the pre-trained model over the model trained on our indoor object dataset

Model	Real Objects	Detected	Falsely detected	Percentage (%)
Pre-trained	58	49	5	84.4
Trained	58	21	4	36.2

Chose Distance Estimation feature in the OR model instead of ultrasonic sensor
- Ultrasonic sensor does not work well with Jetson Nano
- DE feature rarely goes over ± 30cm, although some calibration is necessary

Actual (m)	Detected (m)	Error (m)
1.80	1.82	+0.02
1.20	0.89	-0.31
0.20	0.38	+0.18
2.20	1.94	-0.26

April 27, 2024 (updated May 1, 2024)

Josh’s Status Report for 4/27/2024

Accomplishment (Updated on 04/29)

Prepared for the final presentation
Worked with Shakthi to deploy the headless device settings by changing the Ubuntu startup applications, so that the Jetson Nano automatically runs the main.py file, which contains the OR model with the speech module and button integration.
During the process, while we were trying out different permission-change commands, the speech module broke. As a result, we had to reboot the Jetson and reinstall all the programs needed to run the speech module and OR module.
I reinstalled Python 3.8.0 on the Jetson and OpenCV 4.8.0 with the GStreamer option enabled to allow video streaming. The same memory swap technique as before was used to download the large OpenCV build folder, which was about 8 GB.
(Update) Reduced the data latency of the OR model to an average of 1.88s by using multithreading to run the OR model and the GStreamer video capture from OpenCV concurrently. The OR model uses a frame to detect the closest object while the camera concurrently updates the frame. Multiple threads access the global variable at the same time, which is a race condition, but at worst the data fetched from the global variable comes from the previous instance, which is at most 1 frame behind real time. This meets our use case requirement, and the lag would not be noticeable to the user, so it does not affect their navigation experience. For this reason, we decided not to use a mutex or any other lock on the global variable, since a lock could create a bottleneck and increase the latency of data transfer.
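The lock-free sharing of the latest frame described above could look roughly like the sketch below. It is not the code in main.py: the class name LatestFrameReader and the fake_camera stand-in are hypothetical, and on the Jetson the read_frame callable would presumably wrap the OpenCV capture. It relies on the assumption that rebinding a single attribute in CPython is not torn, so a reader sees either the previous frame or the new one.

```python
import itertools
import threading
import time


class LatestFrameReader:
    """Keeps only the most recent frame from a capture source (hypothetical helper)."""

    def __init__(self, read_frame):
        # read_frame: any zero-argument callable returning a frame or None.
        # Assumption: on the device this would wrap the camera capture call.
        self._read_frame = read_frame
        self.latest = None
        self._running = False
        self._thread = None

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return self

    def _loop(self):
        while self._running:
            frame = self._read_frame()
            if frame is not None:
                # Single reference assignment, deliberately without a lock:
                # a reader gets either the previous frame or the new one.
                self.latest = frame

    def stop(self):
        self._running = False
        self._thread.join()


_frame_numbers = itertools.count()


def fake_camera():
    """Stand-in for a camera: returns an increasing frame number about every 1 ms."""
    time.sleep(0.001)
    return next(_frame_numbers)
```

The detection loop would then read reader.latest once per iteration and run the OR model on it, so a slow inference step never queues up stale frames.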

Progress

Fixed the speech module by rebooting the Jetson to a clean default setting.
The audio error on the Jetson, and the rebooting and reinstalling of programs that followed, hindered our work on headless deployment. We are behind schedule on this step and will do unit testing once the deployment is finished.

Projected Deliverables

By next week, we will finish deploying the Jetson headless, so that we can test out the OR model by walking around an indoor environment.
By next week, we will conduct user testing on the overall device functionality.

April 20, 2024 (updated April 21, 2024)

Josh’s Status Report for 4/20/2024

Accomplishment:

Enabled the GStreamer option for opencv-python on the Jetson to allow real-time capture. OpenCV 4.8.0 did not have the GStreamer option enabled, so a manual installation of OpenCV with that option enabled was necessary. Because the OpenCV folder is large, around 8 GB, I used a memory swap on the Jetson to temporarily increase the available space. The build and install were run after the download.
Worked on integrating the OR Module, Speech Module and the Proximity module into the NVIDIA Jetson alongside Shakthi. The speech module and the proximity module were integrated within the loop of the OR model, so that for each frame, the Jetson will identify which button is pressed and which object is detected to output a desired result.
Added “cat” and “cellphone” to the indoor object options in the OR model and DE feature.
Tested the OR model with a test file I created. It stores the detected results and the real objects and compares them to yield the accuracy data. Tested 40 images: 6 cat images, 6 cellphone images, 10 chair images, 6 couch images, and 12 person images. In 38 of them the closest object was detected correctly, for an accuracy of 95%. The incorrect results were due to several objects overlapping in one image. As an example, the model falsely identified the closest object when an image contained a cat right beneath a person.

Conducted unit testing on the buttons and speech module integrated with the OR model. Pressed buttonA for the vibration module and then buttonB for the speech module to test the functionality. Both modules had 100% accuracy.
Performed distance estimation testing on detecting a person under four different conditions: standing around 1.8m from the camera, around 1.2m, up close at around 0.2m, and around 2.2m.
- The results were 1.82m detected at 1.8m, 0.89m at 1.20m, 0.38m at 0.20m, and 1.94m at 2.2m. On average, there is an uncertainty of 21.5%. Since the DE feature works from reference images, a little calibration is required. We will conduct more tests to find the most accurate calibration for the distance result.
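As a rough way to check these four measurements against the ±30 cm requirement, the sketch below computes the signed error, the mean absolute error, and the measurements that fall outside the tolerance. The report does not state how the 21.5% uncertainty figure was derived, so this sketch does not try to reproduce it; the function names are illustrative only.

```python
# (actual_m, detected_m) pairs copied from the distance estimation test above
MEASUREMENTS = [(1.80, 1.82), (1.20, 0.89), (0.20, 0.38), (2.20, 1.94)]
TOLERANCE_M = 0.30  # use case requirement: within ±30 cm of the actual distance


def signed_errors(measurements):
    """Detected minus actual distance, rounded to centimetres."""
    return [round(detected - actual, 2) for actual, detected in measurements]


def mean_absolute_error(measurements):
    """Average magnitude of the error in metres."""
    return sum(abs(d - a) for a, d in measurements) / len(measurements)


def outside_tolerance(measurements, tolerance=TOLERANCE_M):
    """Measurements whose error exceeds the tolerance (small epsilon for float noise)."""
    return [(a, d) for a, d in measurements if abs(d - a) > tolerance + 1e-9]


if __name__ == "__main__":
    print("signed errors (m):", signed_errors(MEASUREMENTS))
    print("mean absolute error (m):", mean_absolute_error(MEASUREMENTS))
    print("outside ±30 cm:", outside_tolerance(MEASUREMENTS))
```

On this data only the 1.20m measurement lands just outside the tolerance, which is in line with the note that some calibration is still needed.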

Progress:

I made progress on successfully implementing the OR model to the Jetson Nano and allowing the camera to send real time data to the model for object detection.
We need to work on making the device headless, so that the device can be run without the monitor and wifi.
During the process of moving the device to headless, the speech module broke, so we will need to work on that module again.

Projected Deliverables:

By next week, we will finish deploying the Jetson headless, so that we can test out the OR model by walking around an indoor environment.
By next week, we will conduct more testing on the Jetson OR to find the most accurate calibration for the distance of the closest object.
By next week, we will integrate the speech module again.

As you’ve designed, implemented and debugged your project, what new tools or new knowledge did you find it necessary to learn to be able to accomplish these tasks? What learning strategies did you use to acquire this new knowledge?

I learned a lot about machine learning frameworks and techniques by integrating an OR model and developing a distance estimation feature. I learned how to train an OR model with my own dataset with pytorch, modify training parameters such as epochs to yield different training weights, and display and compare detection results with a tensorboard. To learn this new knowledge, I allocated a lot of time researching by reading research papers, navigating through github communities, and scanning many tutorials. It was very challenging to find online resources that had the same issue as me because the systems are generally all different for each user. I also realized how important the relevancy of a post is because the technology upgrades rapidly, so I found many cases where the issue occurred due to the outdated sources.

Furthermore, I gained some experience deploying modules on the Jetson Nano. I learned a new skill, “memory swap”, which allowed me to temporarily increase the memory of the Jetson when I needed to import a huge module, such as opencv. I also realized how difficult it is to work with hardware modules and learned why we need to leave sufficient slack time towards the end of the project. As an example, the detection rate of the OR model on the Jetson was much slower than when it was run on the computer. If I had not spent time modifying the weight of the model during the slack time, I would not have been able to deploy the module and get the detection result with less latency. I acquired this learning strategy through multiple occasions where deployment of the model did not function as I expected.

April 6, 2024

Josh’s Status Update for 4/6/2024

Accomplishment:

Implemented a Python program that tests the OR model + DE feature against test images. It retrieves the closest object detected in the image and verifies accuracy against the respective image filename. As an example, if the filename is “person_test5.jpg”, the actual closest object in the image is a person. The program extracts “person” from the filename and compares it with the detected closest object.
The program was run against chair (8), couch (6), person (5) images. The result came out as 100% accurate.
Started working on deploying the OR module to Jetson. I transferred python files and reference images from my computer to Jetson.

Progress:

I failed to meet the schedule due to the system settings of the Nvidia Jetson. Importing the torch module on the Jetson is taking more time than expected due to unexpected errors, so the schedule is postponed by a few days. Installing the appropriate modules on the Jetson is a critical component of the project, so I will make this the highest priority and attempt to resolve the issue as fast as possible.

Projected Deliverables:

By next week, I will finish deploying the OR model to Jetson, so that we can start testing the interaction between several subsystems.

Now that you have some portions of your project built, and entering into the verification and validation phase of your project, provide a comprehensive update on what tests you have run or are planning to run. In particular, how will you analyze the anticipated measured results to verify your contribution to the project meets the engineering design requirements or the use case requirements?

I have implemented a Python program that tests the OR model + DE feature against test images. It retrieves the closest object detected in the image and verifies accuracy against the respective image filename. As an example, if the filename is “person_test5.jpg”, the actual closest object in the image is a person. The program extracts “person” from the filename and compares it with the detected closest object. The program was run against chair (8), couch (6), and person (5) images. The result came out as 100% accurate, which is far greater than the use case requirement of 70% accuracy. If time permits, I am planning to include more indoor objects, so that the model can cover a wider range of objects while maintaining high accuracy.
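The filename-based check described above can be sketched as follows. This is a minimal illustration, not the actual test file: detect_closest is a hypothetical callable standing in for the OR model + DE feature, and the label is assumed to be everything before the first underscore in the filename.

```python
from pathlib import Path


def expected_label(image_path):
    """Extract the ground-truth label from a name like 'person_test5.jpg'."""
    return Path(image_path).stem.split("_")[0]


def accuracy(image_paths, detect_closest):
    """Fraction of images whose detected closest object matches the filename label.

    detect_closest is a hypothetical stand-in for the model: it takes an image
    path and returns the class name of the closest detected object.
    """
    if not image_paths:
        raise ValueError("no test images given")
    correct = sum(1 for p in image_paths if detect_closest(p) == expected_label(p))
    return correct / len(image_paths)


if __name__ == "__main__":
    # Dummy detector that always answers "chair", for demonstration only.
    paths = ["chair_test1.jpg", "person_test5.jpg"]
    print(accuracy(paths, lambda p: "chair"))
```

Keeping the label in the filename means no separate annotation file is needed, at the cost of requiring one known closest object per test image.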

After the deployment of the OR model to the Jetson, I am planning to use the same test file to run tests on images taken from the Jetson camera and produce an accuracy report. In this case, since the images are sent to the model in real time from the Jetson, we will not be able to name each file after the actual object. Therefore, I will instead use live outputs of detected closest objects from the Jetson and manually check whether the detection is accurate.

March 30, 2024 (updated April 1, 2024)

Team Status Report for 3/30/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

This week we connected the camera module to the Jetson and captured a few images. The camera lens causes a slight distortion to the image, and the images are lower resolution compared to the laptop camera we have been using to test the OR model. A risk associated with this is that we may experience lower accuracy of the model, and we may have to mitigate this by processing the images before sending them to the OR model.
The accuracy of the DE of a detected object is a risk that we are currently facing. Although we can successfully determine which object is closer to the camera, the numerical value of the distance in meters is inaccurate. This is due to the difference between the width of the chair in the lab and the width of the chair in the reference image. This inaccuracy does not affect which object the model outputs much, but it clearly limits the accuracy of the DE feature. We are planning to mitigate this risk by taking the reference images of the objects that we will be using in the test environment. In this way, the width of each kind of indoor object is the same in the reference and the test (i.e. all chairs have the same width, all sofas have the same width, etc.).
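To see why a width mismatch leaves the ordering intact but skews the numbers, here is a minimal sketch. It assumes the DE feature scales distance linearly with the assumed real-world width of the object (as in a simple pinhole-camera model), which the report does not spell out. The widths used are hypothetical.

```python
def estimate_ratio(reference_width_m, actual_width_m):
    """Factor by which a width-based distance estimate is off.

    Assumption: estimated distance is proportional to the assumed object width,
    so estimated / true = reference width / actual width.
    """
    if reference_width_m <= 0 or actual_width_m <= 0:
        raise ValueError("widths must be positive")
    return reference_width_m / actual_width_m


if __name__ == "__main__":
    # Hypothetical widths: reference chair 0.60 m wide, lab chair 0.45 m wide.
    ratio = estimate_ratio(0.60, 0.45)
    true_distance_m = 1.5
    print(f"a chair at {true_distance_m} m would be reported at {ratio * true_distance_m:.2f} m")
```

Because every object of the same class is scaled by the same factor, the relative order of same-class objects is unaffected, which matches the observation that the closest object is still chosen correctly.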

There was one change to the design of the system. The OR model will use Yolov5 instead of Yolov4. This change is intended to increase the accuracy and improve the data latency of the model. Although programming the DE feature took more development time than planned, the change should yield better results.

Provide an updated schedule if changes have occurred.

A few more relevant objects will be added to the DE feature. The deployment of the OR model on the Jetson will begin next week.

Here is our updated Gantt chart as of the Interim Demo (4/1/2024):

Gantt Chart – Timeline 1

March 30, 2024

Josh’s Status Update for 3/30/2024

Accomplishment:

This week, I successfully implemented the Yolov5 OR model + DE feature. I used classes to make extraction of the reference images easier and filtered the detected objects so that the model only outputs several indoor objects, such as a couch, person, mobile phone, and chair. I took several reference images with my laptop camera from a known distance and compared them pairwise with images found online to determine whether the OR model successfully recognizes specific indoor objects and outputs their relative distance from one another (which object is closer to the camera). After several instances of successful output, I ran an image captured from the Jetson camera through my model. The following shows the image taken and the output from the model.

As shown in the image, the model successfully outputs several chairs and their distances from the camera. However, although the order of distance from the camera makes sense, the numerical value of the estimate is too high. This is because the chair in the reference image has a different width from the chair in the test image. To resolve this problem, I am planning to take the reference images of the objects that will be used in the test environment to increase the precision of the DE feature.
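The class filtering and closest-object selection described in this section could be sketched as below. The detection format, a list of (class_name, distance_m) tuples, and the class set are assumptions for illustration and may differ from what the model code actually uses.

```python
# Illustrative subset of indoor classes; the real list may use different names.
INDOOR_CLASSES = {"person", "chair", "couch", "mobile phone"}


def closest_indoor_object(detections, allowed=INDOOR_CLASSES):
    """Return the nearest (class_name, distance_m) among allowed classes, or None.

    detections: iterable of (class_name, distance_m) tuples (hypothetical format).
    """
    indoor = [d for d in detections if d[0] in allowed]
    if not indoor:
        return None
    return min(indoor, key=lambda d: d[1])


if __name__ == "__main__":
    sample = [("car", 0.5), ("chair", 2.1), ("person", 1.4)]
    print(closest_indoor_object(sample))
```

Filtering before taking the minimum means an irrelevant class that happens to be estimated as nearest never masks the indoor object the user needs to hear about.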

Progress

I have successfully added relevant objects (couch, chair) to the DE feature and done some testing. However, it is also important to test the OR model with more images from the Jetson camera to ensure its accuracy. I will need to do more testing with images and videos taken from the Jetson camera.

Projected Deliverables

For next week, I will finish deploying the OR model to Jetson. At the same time, I will include more relevant objects, such as a table, to ensure sufficient range of indoor objects. The test results with Jetson camera will be documented for the final report.

March 23, 2024

Team Status Report for 3/23/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

Yolov4 OR model: We decided to step back from yolov9 to yolov4 to incorporate the DE feature, because yolov5~9, unlike yolov4, do not let us manipulate the detected output data. Because training the model is not possible for yolov4, there is a risk of not being able to focus on indoor objects. However, after some research, we found that the pre-trained model uses MS COCO, which has 330K images with 1.5 million object instances and 80 object categories. This is a much better annotated dataset than the one we could find online, which has 640 images. Therefore, it makes sense to use the pre-trained model. It is also possible to filter out specific outdoor objects, such as a car or a bus, in the DE feature, so we can still focus on indoor objects.
PCB Assembly: Due to some delay in the receiving of transistors for the PCB assembly, we have been set back by 1 week as Meera had to wait on fixing the PCB for a first-round of testing. We aim to get a first version of the PCB ready in the first half of the coming week but this 1-week delay will certainly be cutting into the time we allocated to testing and modifying the current design and re-ordering a new PCB if necessary. The contingency plan is now to get the first iteration of the PCB ready as quickly as possible and work on testing.

A change from yolov9 to yolov4 has been decided in our software module. Considering that the test setting will be in a well-lit indoor environment with few indoor objects, it is expected that the accuracy drop from yolov9 to yolov4 will not significantly impact the project.

Provide an updated schedule if changes have occurred.

There has been a change in my (Shakthi’s) work schedule. I decided to move working on the speech module till after the implementation of the vibration module as the vibration module had a shorter end-to-end data flow and required a lot of the same processing of data as the speech module does. Now that I’ve completed the first iteration of the vibration module, I will be going back to finish up the speech module and work on its integration with the rest of the system.

There also has been a change in the work schedule for the OR model. Because we are using Yolov4 with the DE feature, the schedule has been adjusted accordingly. The testing stage has been pushed back by a week.

Due to the delays in getting the transistors, the hardware development schedule has also been pushed back until we get the transistors this week.

March 23, 2024

Josh Joung’s Status Report for 03/23/2024

Accomplishment:

I worked on integrating a DE feature into Yolov9, but it was unsuccessful. I noticed that the recent versions of the OR model restrict our ability to manipulate the detected output, which prevents us from using the data to estimate the distance.

Therefore, I decided to step back to Yolov4 with the DE feature, considering the time constraint on the project. The current open source library only has the DE feature for a person and a mobile phone. To test my understanding of the source, I added several reference images and measured the width of a sofa to estimate its distance. As shown in the screenshot of the running model, the distance is shown along with the detected object.

The accuracy of the distance will be improved by taking each reference image at a constant, known distance between the object and the camera.

Progress

Because we are stepping back to Yolov4 with the DE feature, I am still catching up on progress. I need to add reference images of several more indoor objects to test out the DE feature.

Projected Deliverables

By next week, I will add bench, cat, dog, backpack, handbag, suitcase, bottle, wine glass, cup, fork, knife, spoon, bowl, chair, sofa, potted plant, bed, dining table, tv monitor, laptop, mouse, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, scissors to the DE feature. I will also finish up the testing and begin to work on deploying the model to Nvidia Jetson. It will be a long week to prepare for the interim demo and ensure that the model runs in the Jetson well.

March 16, 2024

Team Status Report for 3/16/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

The major risks remain the same as previous weeks: the weight of the device, the PCB connection between Jetson and peripherals, the identification of partial frames of objects, and the OR model version.

Another risk is the accuracy of the DE feature. Because it uses a reference image and a known size to estimate the distance of an identified object, if the model misidentifies a certain obstacle, it will produce an incorrect distance and may pick the wrong nearest object. The system will then output the wrong obstacle to the user. This risk will be mitigated by raising the accuracy of the model with a better training method. A few adjustments to epochs and image resolutions will be made to achieve the greatest precision.

The only potential change that can be made to the design of the system is that if the pre-trained model identifies a batch of test objects better than the trained model with our own dataset, the pre-trained Yolov9-e.pt will be used for the weight of the OR model.

Provide an updated schedule if changes have occurred.

Since we are still waiting for our order of transistors for the PCB, and have not yet ordered the audio converter, the hardware development schedule has been pushed back slightly:

Another update to the schedule is that the integration of the DE feature has been pushed back for another week due to its unexpected complexity and learning curve.

March 16, 2024

Josh’s Status Report for 3/16/2024

Accomplishment:

For this week, I have worked on training Yolov9 with our own indoor object dataset as well as the analysis of the training result. I have also started working on implementing a DE feature. During the process, I have encountered several consideration points. First, the pre-trained Yolov9 model identifies indoor objects very well, even better than the trained model with our own dataset.

This is the image of all objects identified by Yolov9-e.pt, which is the model with the highest precision rate among other weights in Yolov9. The image has been chosen randomly on the Internet for a simple test purpose, and the OR model successfully identifies most indoor objects.

The following screenshot also shows the output of the Yolov9 using a laptop camera. As seen in the screenshot, the model successfully recognizes most indoor objects.

On the other hand, the trained model needs some adjustments in the training method because it has shown a decrease in precision and an increase in validation loss when trained for too many epochs. An ideal graph should look like the following:

The left 6 boxes show losses decreasing and the right 4 boxes show precision increasing. However, our training result is as follows:

Evidently, the box_loss and class_loss for the validation dataset have increased and the precision in the right 4 boxes has decreased after around 22 epochs.
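A common way to handle validation loss that starts rising partway through training is to keep the weights from the best epoch, or to stop early. The sketch below shows both checks on a plain list of per-epoch validation losses. The loss values in the demo are made up for illustration and are not taken from our training run, and the function names are hypothetical.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("no losses recorded")
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


def should_stop(val_losses, patience=5):
    """True once the validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    return min(val_losses[-patience:]) >= min(val_losses[:-patience])


if __name__ == "__main__":
    # Made-up losses: improving for three epochs, then getting worse.
    losses = [0.90, 0.70, 0.60, 0.65, 0.70, 0.80]
    print("best epoch:", best_epoch(losses))
    print("stop now:", should_stop(losses, patience=3))
```

With a rule like this, the re-training run could end shortly after the point where the validation curves turn instead of continuing to the full epoch count.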

The image above represents the confusion matrix for the trained dataset. Based on the matrix, chair, keyboard, table, trash bin, and tv monitor have shown relatively high precision while book, bottle, cup, laptop, and window have shown relatively low precision. Because the potential obstacles are commonly chairs, tables, or trash bins, this trained model is showcasing a desired output.

While implementing the DE feature, I found that the open source Yolov4 + DE project, which I had planned to use as a reference, uses a reference image and a known distance to estimate the distance of a specific object. In that project, pictures of a human and a cell phone taken at known distances are used to estimate the distance of a human or a cell phone. I will be applying this method to common indoor objects, such as a chair, table, door, TV, etc. However, a potential risk is that because the model estimates the distance based on the reference image, the estimated distance may be incorrect. Furthermore, the sizes of indoor objects usually differ between indoor settings, so the method may yield inaccurate estimates.
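One common way to turn a reference image and a known distance into a distance estimate is the similar-triangles (pinhole camera) relation sketched below. This is an assumption about how such a DE feature typically works, not a copy of the open source project's code; the function names and numbers are illustrative.

```python
def focal_length_px(ref_distance_m, real_width_m, ref_width_px):
    """Calibrate an effective focal length (in pixels) from one reference image.

    ref_distance_m: known camera-to-object distance when the reference was taken
    real_width_m:   measured real-world width of the object
    ref_width_px:   width of the object's bounding box in the reference image
    """
    if real_width_m <= 0:
        raise ValueError("real width must be positive")
    return ref_width_px * ref_distance_m / real_width_m


def estimate_distance_m(focal_px, real_width_m, width_px):
    """Estimate distance from the bounding-box width seen in a new frame."""
    if width_px <= 0:
        raise ValueError("bounding-box width must be positive")
    return real_width_m * focal_px / width_px


if __name__ == "__main__":
    # Illustrative numbers: a 0.5 m wide object appears 400 px wide at 1.0 m.
    f = focal_length_px(ref_distance_m=1.0, real_width_m=0.5, ref_width_px=400)
    # The same object appearing 200 px wide should be about twice as far away.
    print(estimate_distance_m(f, real_width_m=0.5, width_px=200))
```

The estimate depends directly on real_width_m, which is why objects whose size differs from the reference object produce skewed distances.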

Progress

I have reached the milestone of training Yolov9 with our own dataset, but some adjustment is still needed to raise the accuracy. However, I failed to integrate the DE feature into Yolov9 this week. The implementation is expected to take longer because reference images must be collected and reference distances determined.

Projected Deliverables

By next week, I expect to finish re-training the model with our dataset and to test the OR model with both the pre-trained and trained weights to determine which model to go with. I will also finish implementing the DE feature, so that we can start integrating the components together using the Nvidia Jetson.