George Weekly

George’s Status Report for Dec 9th:

Since we completed our project last weekend, including testing, I spent most of my time this week on the final assignments: I wrote the planning script for our video (https://docs.google.com/document/d/1dcFfKrz7YlZylxGAez8Rn-iT8fti4Mi4Q5acaXn-y10/edit), recorded the video, wrote my part of the final paper, presented in the final presentation, and helped make the poster.

I am ahead of schedule. I will present our project next Monday and Thursday and deliver our video and final paper next week.

 

George’s Status Report for Dec 2nd:

Over these two weeks I spent a lot of time testing the system and debugging and improving it along the way. Here is a list of what I have done (all my code changes can be seen on GitHub under commits whose names start or end with “George”):

  • Implemented latency logging and constructed tests to demonstrate that latency stays within the 500 ms requirement.
  • Implemented new depth calculation functions in NumPy for faster execution (< 7.2 ms); a depth-lookup sketch appears after this list.
  • Implemented and tested different parallel-call structures for all my parts (calls to the speaker to speak words, calls to the depth calculation scripts) to improve latency. The final approach is to create a separate process every time we calculate an instruction (1 in every 5 frames received) and, for every frame, one process for the speaker to speak out all distances. Testing showed this to be the best combination of parallel structures for our system.
  • Implemented a whitelist and an occurrence-recording dictionary to increase accuracy (see the whitelist sketch after this list). The whitelist contains object labels that occur in indoor settings, which the model handles relatively well; we proceed to depth calculation only when the label returned by the model is in the whitelist. The occurrence-recording dictionary stores labels that appeared in the last object detection results but have not yet been spoken out, ensuring that only labels identified in two consecutive detections are spoken by the speaker. Both are essential because of the limitations of the object detection models.
  • Implemented an “emergency script”, incorporated into the other scripts, that rules out labels whose Lidar depth map distance is < 0.5 m or > 10 m (physically implausible depths for the Lidar) and combines duplicate labels within the same picture so that only the closest item’s distance is announced.
  • Reduced speaker latency by reducing the number of speaker processes spawned: one speaker instance handles all labels within one picture frame.
  • Implemented and used log-saving, image-saving, and depth-detail-saving scripts to make real-life testing easier.
  • Debugged a speaker problem and a deadlock problem (details can be seen on Slack).
  • Conducted 6+ hours of testing of our system in locations including the ECE corridor, Tech Spark, and indoor open spaces in Hamerschlag.
  • Wrote slides for the final presentation PowerPoint.
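
As a concrete illustration of the NumPy depth calculation mentioned above, here is a minimal sketch of a vectorized depth lookup. The array layout, the (x1, y1, x2, y2) bounding-box format, and the use of a median over the box are assumptions made for this example, not the exact function in our repo.

```python
import numpy as np

def object_depth(depth_map: np.ndarray, box) -> float:
    """Estimate the distance to one detected object.

    depth_map : 2-D array of per-pixel distances in meters (from the Lidar depth map).
    box       : bounding box as (x1, y1, x2, y2) pixel coordinates (assumed format).
    """
    x1, y1, x2, y2 = box
    patch = depth_map[y1:y2, x1:x2]                 # crop the box region with one slice, no Python loops
    valid = patch[(patch > 0.5) & (patch < 10.0)]   # drop physically implausible Lidar readings
    if valid.size == 0:
        return float("nan")                         # nothing trustworthy inside the box
    return float(np.median(valid))                  # median is robust to background pixels at the box edges
```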
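
The whitelist and occurrence-recording dictionary can be sketched as follows. The label names, the helper name, and the exact confirmation rule are placeholders chosen to illustrate the two-consecutive-detections idea, not the actual identifiers in the code.

```python
# Labels we trust the detector on in indoor scenes (contents are illustrative).
WHITELIST = {"person", "chair", "table", "door"}

# Labels seen in the previous detection results but not yet spoken.
pending = {}

def labels_to_speak(detections):
    """detections: iterable of (label, distance_m) pairs from the current frame.

    Returns the (label, distance) pairs that have now been seen in two
    consecutive detections and are therefore safe to announce.
    """
    global pending
    confirmed = []
    current = {}
    for label, dist in detections:
        if label not in WHITELIST:
            continue                            # skip labels the model is unreliable on
        if label in pending:
            confirmed.append((label, dist))     # second consecutive sighting: speak it
        else:
            current[label] = dist               # first sighting: wait for confirmation
    pending = current                           # only unconfirmed labels carry over
    return confirmed
```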

I think I have accomplished more than what is on my schedule. I am optimistic about my presentation next week.

Next week, I will give my presentation and work on the final poster and other assignments. In addition, I plan to investigate why the latency of our system is sometimes higher (a few seconds) than we designed it to be.

George’s Status Report for Nov 18th:

This week I improved the latency of the entire system by implementing fork/spawn for different processes and reducing the amount of calculation needed to get an object’s depth from the depth map. With the current implementation, the system can speak out a detection within about 2 s (worst-case) latency without losing frames for im.show(). The code for these parts can be seen in the commits labeled by George in the Capstone-Final repo.
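
As a rough illustration of the fork/spawn idea (not the exact code in the repo), the main loop can hand the speaker work to its own process so frame capture never blocks; the function name and phrases below are placeholders.

```python
import multiprocessing as mp

def speak_distances(phrases):
    """Runs in its own process: send each phrase to the speaker (placeholder body)."""
    for p in phrases:
        print("SPEAK:", p)   # real code would call the text-to-speech engine here

if __name__ == "__main__":
    ctx = mp.get_context("spawn")   # spawn avoids inheriting CUDA/camera state from the parent
    phrases = ["chair 1.2 meters ahead", "person 3.0 meters ahead"]
    p = ctx.Process(target=speak_distances, args=(phrases,))
    p.start()                       # speaker runs concurrently with the capture loop
    # ... the main loop keeps grabbing frames and running detection here ...
    p.join()
```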

I am on schedule with the project.

Next week, I will start implementing what I believe to be my last task of the project: the emergency protocol. It will filter out unrealistic output from the model and speak different warnings when the environment is too complicated (potential danger), no objects are detected, there are large changes in labelling, etc.

George’s Status Report for Nov 11th:

This week I completed the integration of the instruction-giving script and the text-to-speech script into the rest of the system that my teammates have developed, and I obtained valid results (labels are spoken by the speaker while the system runs on the Jetson) in my testing of the entire system, demonstrating that the full pipeline works. As part of the interim demo, the results of my scripts were presented in a pre-recorded video and to the professors and TAs.
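
For the text-to-speech part, here is a minimal sketch of what the label-to-speech step could look like, assuming an offline engine such as pyttsx3 (the actual engine and phrasing used on the Jetson may differ):

```python
import pyttsx3

def announce(labels_with_distances):
    """Speak one short phrase per detected object, e.g. 'chair, 1.5 meters'."""
    engine = pyttsx3.init()
    for label, dist in labels_with_distances:
        engine.say(f"{label}, {dist:.1f} meters")
    engine.runAndWait()   # blocks until all queued phrases have been spoken

announce([("chair", 1.5), ("person", 3.2)])
```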

I am on schedule with the project. In fact, thanks to the large amount of backup code I developed ahead of time to handle every integration error I could think of, system integration went smoothly and I can advance to the next task early.

Next week, I will start thinking about thread creation to address script-running latency, something we did not consider in the planning stage (it wasn’t discussed thoroughly because earlier local test runs handled the picture stream well), and about the emergency script we have talked about. I will try to come up with a detailed plan/draft for both of them next week.

Individual Question Answer:

I have run the following tasks:

  1. Unit testing of my instruction-giving script + speaker with pictures containing only detected human objects.
  2. Random integration testing of the entire project, with my parts integrated, using a static camera placement in our usual meeting room.

To better satisfy the use-case requirements, I will run more tests of my scripts in more dynamic situations, with the camera moving through the ECE hallways in Hamerschlag (the use case requires user movement) and with more diverse objects in the environment (the use case requires object recognition). Furthermore, to satisfy the design requirements, I will run more tests on objects at longer/shorter distances from the camera (the design requirement on object recognition distance). I believe these tests will consolidate the validity of my portion of the project.

George’s Status Report for Nov 4th:

This week I finished the instruction-giving script, with the addition of different depth-retrieval algorithms for better depth readings, all the clean-up functions that sanitize the data coming from the ML model results and the depth map, and a collection of testing pictures (colored camera picture, Lidar picture, depth image) specifically for my testing function. The most difficult part of writing the instruction script turned out to be the clean-up functions, since the data types from the ML model are still under development and many of the input formats contain extra information and noise that can affect the result. A simple text-to-speech function has also been implemented. A demo of this work can be seen in the Zoom video shared via Slack.
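
As an example of what one clean-up function might look like (the detection format shown here is an assumption, since the real ML model output format was still changing at this point):

```python
def clean_detections(raw):
    """Keep only well-formed, reasonably confident detections.

    raw: list of dicts, assumed to look like
         {"label": str, "confidence": float, "box": [x1, y1, x2, y2], ...extra fields...}
    """
    cleaned = []
    for d in raw:
        label = d.get("label")
        conf = d.get("confidence", 0.0)
        box = d.get("box")
        if not label or conf < 0.5 or not box or len(box) != 4:
            continue                        # drop noisy or malformed entries
        x1, y1, x2, y2 = map(int, box)      # coerce coordinates to ints; extra fields are ignored
        if x2 <= x1 or y2 <= y1:
            continue                        # degenerate boxes carry no useful depth region
        cleaned.append({"label": str(label), "box": (x1, y1, x2, y2)})
    return cleaned
```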

I am on schedule with my part of the project.

Next week, I hope to further test my script with more data fed into the algorithm. It turns out that even with the label in place, it can sometimes be hard to obtain a clear depth result.

George’s Status Report for Oct 28th:

Personally, I spent most of my time writing the ethics assignment and discussing it (steps 3 and 4) with my teammates to continue our discussion on ethics. Beyond the ethics assignment, I came up with the implementation logic for the instruction-giving script and refined our design paper.

Due to an unexpected sickness from Tuesday to Thursday this week, I am a little behind schedule as of Oct 27, 2023. I will catch up by finishing a draft of my instruction-giving script by next Wednesday. I have met with my teammates about what the input/output of my script should be, and the technical design details of my script have been clarified.

Next week, I hope to complete the instruction script and then dive into the text-to-speech script.

George’s Status Report for Oct 21st:

Individually, I spent a lot of time analyzing edge detection models, a new direction we plan to pursue after receiving the feedback on last week’s report. As we have shifted from the SLAM approach to the YOLO/edge detection approach, I modified our schedule accordingly. Building on this research, I spent most of my time writing the paper due this Saturday, where I completed the design trade studies section (Section 5) and Section 8.

I am on schedule with the project.

For the week after the break, I plan to implement the edge detection model. The goal is to have a draft of the edge detection script that can detect object edges with moderate noise filtering.

Regarding the individual question, I will look more into OpenCV, which seems to be the most widely used library for edge detection. It will help me detect object edges that can later be used to recognize obstacles.
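
A minimal sketch of OpenCV’s built-in Canny edge detector, which is the kind of function I expect to experiment with; the blur kernel and thresholds are illustrative starting points, not tuned values:

```python
import cv2

img = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)   # any test frame from the camera
blurred = cv2.GaussianBlur(img, (5, 5), 0)             # moderate noise filtering before edge detection
edges = cv2.Canny(blurred, 50, 150)                    # lower/upper hysteresis thresholds
cv2.imwrite("edges.jpg", edges)
```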

George’s Status Report for Oct 7th:

Individually, I did a lot of research on the SLAM model and the design of our system so that all software design questions could be addressed in today’s design presentation. As a result of this research, I wrote this 6-page research file (https://docs.google.com/document/d/1ahaig9DHrYZnkUM2HZ7Y6NNv05TZThyRvFm7cm4zSRg/edit) on my own and also helped write/modify part of the PPT presented this week. Also, following last week’s report, I tested the output size of the SLAM models and found a couple of ways to reduce it, including using fewer pictures, fewer pixels per picture, post-image processing, etc. Given the current testing and these ways to reduce bandwidth/latency, it is reasonable to believe that the data transmission speed within our system will satisfy our use-case requirement.
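
To make the “fewer pixels per picture” idea concrete, here is a tiny sketch of downscaling a frame before transmission (using OpenCV; the target resolution is only an example, not a decided value):

```python
import cv2

frame = cv2.imread("frame.jpg")                                       # original full-resolution camera frame
small = cv2.resize(frame, (640, 360), interpolation=cv2.INTER_AREA)   # shrink before sending
cv2.imwrite("frame_small.jpg", small)                                 # e.g. from 1920x1080 this keeps ~11% of the pixels
```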

My progress is on schedule.

For next week, I plan to work out ways to produce meaningful output from the SLAM model’s output; basically, what we will have the speaker say based on the obstacle data collected.

George’s Status Report for Sep 30th:

Individually, I created the initial draft of the design PowerPoint to be shown next week and helped make modifications to our design. Also, Jeffrey and I made a demo video (https://drive.google.com/drive/u/0/folders/1oKjSWw9k9neWMqx3_MsDvmwYJe7D-LS3) to show our design in real-world settings, where I directed and edited what we should include in the video. Lastly, I worked on analyzing the video/Lidar output from our previous work, including the size of the outputs and the accuracy of distance measurement under light or dark conditions, indoor or outdoor settings, and crowded or less crowded situations. The analysis was done both through field experiments and by inserting statements in our script to print different variables under different conditions.

I think my progress is on schedule, as I already know what output I am getting and have started testing under different conditions. For next week, I hope to do more data editing so I know for sure we have enough bandwidth to process the video/Lidar outputs.

Individual Question: For my work, 15440 covers the message transmission part, whereas for the other parts of the project, 10301, CV, and earlier courses cover the recognition model part, and 240 and 290 cover everything needed for the signals and hardware components.

George’s Status Report for Sep 23rd:

This week, the major tasks were to complete the peer review and gather feedback from classmates to improve our project.

I helped edit the PowerPoint presented in the lectures and edited the speaker notes in the slides (https://docs.google.com/presentation/d/1lmwvtkJlKljWL75jdmAyMya4bLFtZ0-XIgx53w5oIZQ/edit#slide=id.g27f5c31f0a9_0_4). I made the team schedule with the tasks broken down and milestones set.

My progress is on schedule, and at the end of this week I will draft the message transmission protocol design and start implementing the algorithm.