Josh’s Status Report for 2/24/2024

Accomplishment: This week I trained the object recognition model on an indoor object dataset I found online. I set up a Google Colab notebook so that my teammates can run or follow the training steps and download the trained weights once all the cells have run. I also switched the model from YOLOv4 to YOLOv5 because of a dependency issue: YOLOv4 relies on Darknet, which is outdated and no longer maintained by its development team, while YOLOv5 is developed and actively maintained by Ultralytics. That ongoing support should be a real advantage for our product.
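
As a quick reference for teammates, once the Colab run finishes and the trained weights are downloaded, loading them for a test detection looks roughly like this (a minimal sketch using YOLOv5's standard torch.hub interface; the file names best.pt and room.jpg are placeholders):

import torch

# Load the custom-trained YOLOv5 weights exported from the Colab notebook (path is a placeholder).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

# Run a quick sanity-check detection on a sample indoor image (file name is a placeholder).
results = model('room.jpg')
results.print()  # prints detected classes and confidences to the console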

However, the switch also means I have to port the distance estimation feature from the YOLOv4-based code to the YOLOv5 model, which may take some additional development time.
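
To scope that work: the YOLOv4-based projects I looked at estimate distance from the width of the detection box using a pinhole-camera relation, and the same idea can sit on top of YOLOv5 detections. A minimal sketch is below; the known object widths and focal length are placeholder values that would come from calibration, not from the original code:

import torch

KNOWN_WIDTHS_M = {'chair': 0.45, 'door': 0.90}  # rough real-world widths in meters (placeholders)
FOCAL_LENGTH_PX = 700                           # from a one-time camera calibration (placeholder)

model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

def detect_with_distance(image_path):
    """Return (label, confidence, estimated distance in meters) for each recognized known object."""
    results = model(image_path)
    estimates = []
    for *box, conf, cls in results.xyxy[0].tolist():
        label = results.names[int(cls)]
        pixel_width = box[2] - box[0]
        if label in KNOWN_WIDTHS_M and pixel_width > 0:
            # Pinhole model: distance = real width * focal length / width in pixels
            estimates.append((label, conf, KNOWN_WIDTHS_M[label] * FOCAL_LENGTH_PX / pixel_width))
    return estimates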

I have also collected an indoor object dataset from the Internet. It has 558 images for training, 52 for validation, and 30 for testing.
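
For reference, YOLOv5's train.py reads the split locations from a small dataset config file. Below is a minimal sketch of one way to set this up in the Colab notebook; the paths and class names are placeholders rather than our final class list:

# Write the dataset config that YOLOv5's train.py expects (paths and class names are placeholders).
config = """
path: /content/indoor_dataset   # 558 train / 52 val / 30 test images
train: images/train
val: images/val
test: images/test
nc: 3                           # number of classes (placeholder)
names: ['chair', 'table', 'door']
"""
with open('indoor.yaml', 'w') as f:
    f.write(config)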

The challenge in training the OR model is that every image also needs annotations, so collecting a larger dataset will take more time. My plan is to search for more annotated data online or use an annotation tool to label images myself; the label format YOLOv5 expects is sketched below. I have found that Kaggle hosts some useful datasets, so I will look there for more data.
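
Whichever annotation tool I end up using, the labels need to come out in YOLO format: one .txt file per image, with one line per object giving the class index and a normalized center/size box. A small illustrative helper (the file name and box values are made up):

def write_label(path, boxes):
    """boxes: list of (class_id, x_center, y_center, width, height), all normalized to [0, 1]."""
    with open(path, 'w') as f:
        for cls, xc, yc, w, h in boxes:
            f.write(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")

# e.g. one chair (class 0) roughly centered, covering about 40% of the width and 60% of the height
write_label('labels/train/room_001.txt', [(0, 0.50, 0.55, 0.40, 0.60)])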

Progress: I am on schedule because I have already collected some indoor object data and trained the model on it. However, I plan to collect more data to increase accuracy, and I need time to add the distance estimation feature to the YOLOv5 model. Before spring break ends, I plan to finish training the OR model with the distance estimation feature.

Projected Deliverables: By next week, I will collect around 500 more annotated images and train the model with them. I will also begin implementing the distance estimation feature for the YOLOv5 model by cross-referencing the original YOLOv4 + distance estimation source code.

Team Status Report for 02/17/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

  • Weight of the device:

We initially estimated a weight of 200 grams for the whole device, which in retrospect was a vast underestimate. Our on-board computer (NVIDIA Jetson Nano) alone comes to 250 grams, on top of other heavy components such as the PCB/Arduino, the rechargeable battery pack, and other sensors. We also intend to 3D print a case to improve the overall look and feel of the device. Given all of this, the total weight will be around 400-450 g. We now run the risk of the device being too bulky, uncomfortable, and impractical for our use case. Although we will certainly make efforts along the way to reduce weight where we can, our backup plan is to offload the battery pack, and potentially the Jetson, to the user's waist so that the weight is distributed and less of a disturbance.

  • Connection to peripherals:

We plan to connect the peripherals (buttons, sensor, and vibration motor) to the GPIO pins of the Jetson, with a custom PCB in between to manage the voltage and current levels. A risk with this approach is that custom PCBs take time to order, and there may not be enough time to redesign a PCB if there are bugs. We plan to manage this risk by first breadboarding the PCB circuit to ensure it is adequate for safely connecting the peripherals before we place the PCB order. Our contingency plan in case the PCB still has bugs is to replace the PCB with an Arduino, which will require us to switch to serial communication between the Jetson and Arduino and will cause us to reevaluate our power and weight requirements.
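
While breadboarding, a small script on the Jetson can confirm that the level-shifted signals actually reach the GPIO pins before we commit to a PCB order. A minimal sketch using the Jetson.GPIO library (the pin numbers are placeholders, not our final assignments):

import time
import Jetson.GPIO as GPIO

BUTTON_PIN = 18  # assumed BOARD pin wired to one of the mode buttons
MOTOR_PIN = 12   # assumed BOARD pin driving the vibration motor through its driver circuit

GPIO.setmode(GPIO.BOARD)
GPIO.setup(BUTTON_PIN, GPIO.IN)
GPIO.setup(MOTOR_PIN, GPIO.OUT, initial=GPIO.LOW)

try:
    while True:
        # Echo the button state onto the motor pin to verify both sides of the breadboarded circuit.
        GPIO.output(MOTOR_PIN, GPIO.HIGH if GPIO.input(BUTTON_PIN) == GPIO.HIGH else GPIO.LOW)
        time.sleep(0.05)
finally:
    GPIO.cleanup()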

  • Recognition of a partial frame of an object:

Although we plan to find an indoor object dataset that includes some partial images of objects, recognition of an object cropped by close distance can still be inaccurate. To mitigate this risk, we plan to implement a history referral system that tracks recently recognized objects and falls back on that history when the current prediction's accuracy is below a chosen threshold. Then, even when a user walks so close to an object that the product can no longer recognize it, the device can still produce a result using the history.
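
A minimal sketch of how that fallback could work (the confidence threshold and history window below are placeholders to be tuned during testing):

from collections import Counter, deque

CONF_THRESHOLD = 0.5   # assumed confidence cutoff (placeholder)
HISTORY_FRAMES = 15    # assumed number of recent frames to remember (placeholder)

history = deque(maxlen=HISTORY_FRAMES)

def resolve_label(label, confidence):
    """Return the label to report, falling back on recent history when confidence is low."""
    if confidence >= CONF_THRESHOLD:
        history.append(label)
        return label
    if history:
        # Report the object seen most often in the recent past instead of a low-confidence guess.
        return Counter(history).most_common(1)[0][0]
    return label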

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

  • Switch from RPi to NVIDIA Jetson Nano:

In our previous report, we mentioned taking a network-based approach that offloads the bulk of the processing from the RPi to a server hosted on the cloud. This would make the device Wi-Fi dependent, and we quickly decided against that approach because we want the device to be accessible and easy to use rather than an added chore for the user. After some research, we found that switching from the RPi to the NVIDIA Jetson Nano as our on-board computer makes the most sense for our project, since it resolves both the problem of overexerting the RPi and the need to rely on a network connection to a server. The Jetson has a more powerful GPU, making it better suited to run our object recognition model on board. Here is an updated block diagram:

As for changes in cost, we were able to get an NVIDIA Jetson Nano from the class inventory, so there is no additional cost there. However, we had to place a purchase order for a Jetson-compatible camera, as the ones in the inventory were all taken. This was $28 out of our budget, which we can comfortably afford, and we don't foresee any extra costs due to this switch.

  • Extra device control (from 1 button to 2 buttons):

Our design prior to this modification had the vibration module, which alerts the user to objects in their way, on by default, with a single button that worked as follows: single-press for single object identification, double-press for continuous speech identification. As we discussed this implementation further, we realized that leaving the vibration module on by default during the speech settings could be uncomfortable and distracting for the user. To avoid overstimulating the user, we decided to make both the vibration and speech modules controllable via buttons, allowing the user to choose the combination of modes they want to use. This change is reflected in the block diagram above, which now shows buttons A and B.

The added cost for this change should be fairly minimal, as buttons cost around $5-10, and it will greatly improve the user experience.

  • Custom PCB:

Since we have switched to the Jetson and plan to configure its GPIO pins for connecting peripherals, we now need to design and order a custom PCB for voltage conversion and current limiting. This change was necessary because the operating voltages of the peripherals differ from what the GPIO pins tolerate, so a circuit is needed in between to ensure safe operation without damaging any of the devices (a rough check of the required level shifting is sketched at the end of this item).

The added cost of this change is the PCB order itself plus the associated shipping. Since we are using a Jetson from the ECE inventory, and the rest of our peripherals are fairly inexpensive, this should not significantly impact our remaining budget.
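
As mentioned above, here is a back-of-the-envelope check of the kind of level shifting the board has to handle, assuming a 5 V peripheral output feeding a 3.3 V-logic GPIO pin (the resistor values are only illustrative):

V_SENSOR = 5.0         # peripheral output level in volts (assumption)
R1, R2 = 1_000, 2_000  # candidate divider resistors in ohms (illustrative values)
v_gpio = V_SENSOR * R2 / (R1 + R2)
print(f"Level seen by the GPIO pin: {v_gpio:.2f} V")  # about 3.33 V, near the 3.3 V logic level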

Provide an updated schedule if changes have occurred.

The hardware development schedule has changed slightly since we are now ordering a custom PCB. The plan for hardware development this week was to integrate the camera and sensor with the Jetson, but since these devices haven’t been delivered yet, we will focus on PCB design this week and will push hardware integration to the following week.

While testing a pre-trained object recognition and distance estimation model, we realized that it only detects a handful of object classes, most of which are irrelevant to indoor settings. We have therefore decided to train the model ourselves on a dataset of common indoor objects. The added work of finding a suitable dataset and training the model has pushed the object recognition model testing stage back by about a week.

 

Please write a paragraph or two describing how the product solution you are designing will meet a specified need.

Part A: … with respect to considerations of public health, safety or welfare.

Our product aims to protect visually impaired people from hazards that would go unnoticed when using only a cane. Not only does the product tell the user what an object is, it also alerts them that an obstacle is directly in front of them. We are projecting a safety distance of 2 meters so that the user has time to avoid the obstacle in whatever way they prefer.

If the product is successfully implemented, it can also benefit blind people in a psychological sense. Users no longer need to worry about running into an obstacle and getting hurt, which can significantly reduce their anxiety when walking in an unfamiliar environment. In addition, the user can switch the device to a manual setting, in which they press a button to identify the object in front of them; this alleviates the stress of hearing recognized objects announced every second.

Part B: … with consideration of social factors. 

The visually impaired face significant challenges when it comes to indoor navigation, often relying on assistance from those around them or guide dogs. To address this, our goal is for our device to provide an intuitive and independent navigation experience, using a combination of depth sensors, cameras, object recognition algorithms, and speech synthesis. The driving factor for this project is to improve inclusivity and accessibility in our society, aiming to empower individuals to participate freely in social activities and navigate public spaces with autonomy. Through our collaboration with the Library of Accessible Media, Pittsburgh, we also hope to involve our target audience in the developmental stages, as well as in testing during the final stages of our project.

Part C: … with consideration of economic factors.

Guide dogs are expensive to train and care for, and can cost tens of thousands of dollars for the visually impaired dog owner. Visually impaired people may also find it difficult to care for their guide dog, making them inaccessible options for many people. Our device aims to provide the services of guide dogs without the associated costs and care. Our device would reach a much lower price point and would be available for use immediately, while guide dogs require years of training. This makes indoor navigation aid more financially accessible to visually impaired people.

Josh’s Status Report for 2/17/2024

Accomplishment: 

This week, I downloaded and tested several object recognition models with distance estimation features to see how much additional implementation is needed to make a viable model. The first two models I tested are both based on the YOLOv4 object recognition model. Their ML techniques are similar, and while going through the code I found room to adjust the model into something viable for our project.

During testing, I also realized that their pre-trained models detect only a few object classes (human, cellphone, etc.), most of which are irrelevant to indoor objects. Therefore, I am planning to find a new dataset and train the model myself. I then tested the YOLOv7 model by installing NVIDIA CUDA and checking whether it accurately detects a specific object it was trained on. Comparing the accuracy reported in the YOLOv4 and YOLOv7 research papers, I determined that the YOLOv4 model with distance estimation is sufficient for our object recognition model.

I have also updated the software flow diagram by adding more specific details on inputs, outputs, and features that go with the data processor. 

Progress:

I am on track with testing object recognition models, but because I added the extra objective of training the chosen model on a suitable indoor object dataset, the planned date for testing the finalized object recognition model has been pushed back by a week.

Projected Deliverables:

By next Tuesday, I will be done searching for a dataset of common indoor objects. If time permits, I will include some partial images of those objects so the model can identify an object even when it is too close. By February 26th, I will be done training the model on the dataset collected by Tuesday.

Meera’s Status Report for 2/17/24

Accomplishment: This week I researched using a Jetson vs. an RPi to help us decide which board to use. I also researched using the Jetson's GPIO pins, looked through the Jetson Nano pinout documentation, and found libraries for configuring the Nano's pins. I also looked into existing projects that integrate a Jetson with sensor peripherals for guidance on designing the custom PCB. I looked into specific devices for our audio output and vibration motor and settled on the CM108 board for audio conversion. Additionally, I called LAMP to get project feedback from visually impaired people and set up a meeting for this Tuesday, 2/20.

Progress: My progress is on track this week since I set up the meeting with LAMP and looked into hardware components. My schedule for next week has changed slightly to accommodate circuit design, but I don’t expect this to throw off my overall progress on component integration since the hardware components we ordered have not arrived yet.

Projected Deliverables: This week I will be giving our design presentation as well as meeting with the LAMP advisory group on Tuesday, so I will work on preparing the slides and presentations. I will also begin designing and testing the voltage conversion circuit in preparation for our hardware components arriving. I also plan on placing an order for the audio converter and vibration motors we settled on.

Shakthi Angou’s Status Report for 2/17/2024

Accomplishment: Settled on the text-to-speech (TTS) engine to convert the OR model's output into speech dictation when the user chooses that setting. The TTS engine we will be using is espeak, with the wrapper library pyttsx3, which provides additional functionality such as adjusting the speaking rate and selecting a voice. Also looked into integrating the speech module into the overall device.
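
A minimal sketch of that setup, with pyttsx3 driving espeak (the rate value and the announced phrase are placeholders to be tuned later):

import pyttsx3

engine = pyttsx3.init('espeak')  # explicitly select the espeak driver
engine.setProperty('rate', 150)  # words per minute (placeholder, to be tuned with user feedback)
engine.setProperty('volume', 1.0)

def announce(label, distance_m=None):
    """Speak the recognized object, optionally with an approximate distance."""
    phrase = label if distance_m is None else f"{label}, about {distance_m:.0f} meters ahead"
    engine.say(phrase)
    engine.runAndWait()

announce("chair", 2)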

Progress: Given the design changes we made, switching to the NVIDIA Jetson and forgoing the network-based approach, I believe I am back on track with my progress.

Projected Deliverables: I will next begin researching and developing the vibration module of our device. The ultrasonic depth sensors will feed data into a program that acts as a mediator between the sensors and the vibration motors, allowing us to set a specific threshold for detecting objects (2 m, for example) and to determine the frequency of the vibrations. I will likely be working alongside Meera, who leads the hardware for the project, on implementing this module.
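
A rough sketch of what that mediator program could look like, assuming an HC-SR04-style ultrasonic sensor and a motor switched from a GPIO pin (the pin numbers are placeholders, and a real version would add timeouts and smoothing):

import time
import Jetson.GPIO as GPIO

TRIG, ECHO, MOTOR = 16, 18, 12  # assumed BOARD pin numbers (placeholders)
THRESHOLD_M = 2.0               # vibrate when an obstacle is within 2 meters

GPIO.setmode(GPIO.BOARD)
GPIO.setup(TRIG, GPIO.OUT, initial=GPIO.LOW)
GPIO.setup(ECHO, GPIO.IN)
GPIO.setup(MOTOR, GPIO.OUT, initial=GPIO.LOW)

def read_distance_m():
    """Trigger one ultrasonic ping and convert the echo pulse duration to meters."""
    GPIO.output(TRIG, GPIO.HIGH)
    time.sleep(10e-6)  # 10 microsecond trigger pulse
    GPIO.output(TRIG, GPIO.LOW)
    start = end = time.time()
    while GPIO.input(ECHO) == GPIO.LOW:
        start = time.time()
    while GPIO.input(ECHO) == GPIO.HIGH:
        end = time.time()
    return (end - start) * 343.0 / 2  # speed of sound, halved for the round trip

try:
    while True:
        GPIO.output(MOTOR, GPIO.HIGH if read_distance_m() < THRESHOLD_M else GPIO.LOW)
        time.sleep(0.1)
finally:
    GPIO.cleanup()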



Josh’s Status Report for 2/10/2024

Accomplishment: Throughout the week, I researched three popular object recognition models, evaluated the pros and cons of each, and created a checklist to determine which model fits our project best. I also participated in the weekly progress meeting with the team to discuss the feedback from the proposal and the contingency plan for a significant product risk, and I created a brief software module flow diagram to visualize the process.

Progress: I am slightly behind schedule due to the changes made to mitigate the significant risk of the Raspberry Pi overheating. I will get back on track by spending more time researching OR models that also include a distance estimation feature. Downloading and setting up the modules for testing can also take a long time, so I will leave extra time for this objective.

Projected Deliverables: By Monday, I will be done researching candidate models that have both an OR module and a distance estimation module. Then, by Thursday, I will be done testing the models on my laptop and reviewing them against the checklist I created to decide which model to work with.

Meera’s Status Report for 2/10/2024

Accomplishment: In the past week, I looked into hardware options for our device, specifically comparing TOF cameras to basic cameras. I learned that TOF cameras have less precision compared to a sensor dedicated solely for distance estimation, but this choice of hardware will make it easier for us to integrate object recognition with distance estimation, as opposed to our previous plan of integrating cameras with ultrasonic sensors. I also put together a Notion planning page for the team to keep track of our timeline, tasks, notes, and resources. We also had a team meeting to discuss feedback from our proposal and decide on any design changes, for which I created a new block diagram to reflect the new design. Lastly, I got in touch with a volunteer at LAMP to get the contact information of the volunteer coordinator.

Progress: This progress is on schedule with our initial timeline for research and ordering materials. However, since we decided to consider using the NVIDIA Jetson in place of the Raspberry Pi for better performance, I will be making more time to research single-board computers before our parts order goal of Wednesday 2/14.

Projected Deliverables: This week, I plan to continue looking into specific cameras and sensors so we can place an order by 2/14. I also plan to compare the performance and specifications of a Raspberry Pi vs. NVIDIA Jetson, so we can request one from the ECE inventory by 2/14 and begin using it by the end of the week. I will also contact LAMP’s volunteer coordinator on Monday 2/12 to set up a time for us to speak to visually impaired people and receive suggestions for our device’s features.

Shakthi Angou’s Status Report for 2/10/2024

Accomplishment: Early this week we had our proposal presentation so I spent some hours working on my delivery to ensure that I covered all the key points of our project and kept within time. We held our weekly meeting where we discussed the overall latency of our device and I proposed a plan to implement a network-based design so as to offload the processing from the RPi to a server.

 

Progress: I am a little behind on the research into how feasible this network-based approach will be, but I talked to a friend who has experience with similar work, and I believe we are capable of handling it. I plan to spend time tomorrow and Monday fleshing out a solid execution plan along with system diagrams.

 

Projected Deliverables: By the end of the week, I hope to have some clarity on this new network-based design choice, along with deciding on some metrics such as how frequently the RPi will send packages to the server. I will also look into how servers can be hosted on the cloud, and brainstorm the design of API endpoints to facilitate the communication.

 

Team Status Report for 02/10/2024

What are the most significant risks that could jeopardize the success of the project? How are these risks being managed? What contingency plans are ready?

The RPi may overheat with constant use, and constant use also requires a large battery capacity, which increases device weight.

Plan: Have the RPi only be responsible for pre-processing the camera data and compiling it into a package with any necessary metadata to be sent to a server. This makes it so that the RPi is not doing the bulk of the processing and only acts as a channel to the server. The implementation will require us to set up a server on the cloud that runs our object recognition model and have it listen for incoming packages from the RPi. The RPi will periodically (every 1 second, for example) send an image to the server. This allows for real-time detection while minimizing the load on the RPi.
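
A minimal sketch of that upload loop on the RPi, assuming a simple HTTP endpoint on the cloud server (the URL, interval, and camera index are placeholders):

import time
import cv2
import requests

SERVER_URL = "http://example-server.local/detect"  # hypothetical endpoint on the cloud server
INTERVAL_S = 1.0                                   # send roughly one frame per second

camera = cv2.VideoCapture(0)  # camera index is a placeholder
try:
    while True:
        ok, frame = camera.read()
        if ok:
            _, jpeg = cv2.imencode('.jpg', frame)
            # The server would run the object recognition model and respond with detected objects.
            requests.post(SERVER_URL, files={'image': jpeg.tobytes()},
                          data={'timestamp': time.time()}, timeout=5)
        time.sleep(INTERVAL_S)
finally:
    camera.release()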

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)? Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

A change was made to the existing design regarding the functionality of the ultrasonic sensor. To help reduce data latency when the user confronts an obstacle, we plan to connect the sensor directly to the vibration module rather than integrating it with the recognition model. The cost will remain the same, but the ultrasonic sensor's role is reduced from estimating an object's distance and detecting the object to simply detecting that an object exists in front of the user. In turn, the plan now includes distance estimation from a single camera source; preferably, we will integrate a single model that can take on both roles to avoid the workload of integrating two separate models.

Provide an updated schedule if changes have occurred.

Updated schedule to accommodate design changes and the need for further research:

2/10-2/16: Continue testing detection models to decide which ones to integrate

2/10-2/14: Continue research into camera modules, TOF cameras, and depth detection without ultrasonic sensors

2/12: Contact LAMP and set up a time to get feedback and suggestions for features

2/14: Order parts this week (RPi and potentially camera modules)