Team’s Status Report for 4/12

The most significant risks that could jeopardize the success of our project is the error while overlaying the coordinate system given by the UWB sensors and the coordinate system that we have from the segmentation model. Since obstacles are in one coordinate system and user localization is in another, incorrect overlaying could cause our user to bump into obstacles or never be guided to the right location. This could definitely jeopardize our project working, and is the last main milestone we have to accomplish. To manage these risks, we came up with a calibration system that could recalibrate the UWB coordinate system and scale it to match the occupancy matrix in which we run path planning on. We would have to work on it further to ensure it is fully robust. Another slight risk is that the Jetson does not have enough processing power to run the segmentation model that identifies our environment as obstacles or free space. To manage this risk, we could offload the compute to the cloud or on a separate laptop. Since the SAM model is not exactly tiny or meant for edge devices, the cloud or a working laptop is definitely a better alternative.

Thus, the only possible change that we might make to our existing design is offloading the compute from the Jetson to the cloud or a laptop. This change is necessary because it is quite evident that the Jetson does not have enough processing power or RAM to run segmentation models. We could try troubleshooting this a bit more or consider alternatives, but this is a possible change we might make.

To verify our project starting from the beginning of the pipeline, we would first test the accuracy of the segmentation model. So far we have inputted around 5-6 example images from the bird’s eye view and analyzed how well it was able to segment free space and obstacles. It has done an accurate job so far, but once we are in the verification stage we would empirically test it on 10 different environments and make sure the accuracy is above 95%. For latency of the pipeline on segmentation, we already know it will be approximately 30 seconds. Thus, we would have to slightly adjust our use case requirements to a fixed environment but still monitor the user in real time. The rest of the pipeline has minimal latency (downsampling algorithm, A* path finding algorithm, socket communication between Raspberry Pi and laptop, etc), so the segmentation model is the only bottleneck. We would run it in different environments and note the latency to verify our use case requirements. For correct navigation of the user to the destination, the main concern is the overlay between the coordinate system given by the UWB sensors and the coordinate system given by the segmentation model. Getting the haptics to vibrate in the right direction should not be a huge issue, but the main issue is most likely having the person be at the correct location indicated by the UWB sensors. We can measure this empirically by noting the error between where the person is located and where our system thinks he / she is located. This metric could help guide us to the navigation accuracy metric.

Talay’s Status Report for 4/12

This week, I worked on setting up the Arducam wide-lens camera on the Jetson, setting up the keypad to work on the Raspberry Pi, setting up the haptics sensors to work on the Raspberry Pi, and setting up socket communication under the same wifi subnet between the Raspberry Pi and my laptop. The first task I did was setting up the Arducam wide-lens camera by installing the camera drivers on the Jetson. The process was quite smooth compared to the OV2311 Stereo Camera since this wide-lens camera relied on the USB protocol which is a lot more robust.

From the image you can see the quality is decent and the camera has a 102 FOV.

Next I worked on communication between the Raspberry Pi and the laptop using sockets. I set up a virtual environment on the Raspberry Pi and installed all the necessary dependencies to set this up. After this I set up the keypad on the Raspberry Pi using GPIO pins. Once the Raspberry Pi detects that a key is pressed, it sends data to the laptop which runs the A* path planning module and has the occupancy matrix visualized. The keypad is able to select different destinations on the occupancy matrix via network connection.

Once the laptop runs path-planning and determines the next move for the user, it sends the directions back to the user (Raspberry Pi) through the same socket connection. I then set up the haptic vibration motors which is also on the Raspberry Pi. I had to use different GPIO pins to power the vibration motors since some pins were already used for the keypad.

The Python code on the Raspberry Pi was able to detect which direction the user had to move in next and vibrate the corresponding haptic vibration motors. Our haptic vibration motors had slightly short wires, so we might need an extension for the final belt design.

I believe my progress is on schedule. Currently, the next main milestone for the team is to integrate the UWB sensor localization with the existing occupancy matrix. This is our last moving part in terms of project functionality. The last milestone for this project would be to integrate the Arducam wide-lens camera instead of the phone camera.

Next week, I hope to work with Kevin on integrating the UWB sensor localization with the occupancy matrix generated from the segmentation model and down-sampling. Currently, the user location is simulated through mouse presses, but we want it to be updated on our program through data received from the UWB tags. We would have to think about the scale factor of the camera frame and how much it has been downsampled, and apply the same scale to the UWB reference frame.

To verify my components of the subsystem, I would first test the downsampling algorithm and the A* path-planning module on different environments. I would feed the pipeline around 10-15 bird’s eye view images of indoor spaces. After the segmentation model, I would note if the downsampling does a good job of retaining the required resolution and identifying obstacles and free space. I would empirically measure the occupancy matrix on whether 95% of obstacles are actually still labeled as obstacles. To meet our use case requirement of not bumping into obstacles, I would come up with a metric that shoots for over-detection of obstacles rather than under-detection. Our current downsampling algorithm creates a halo effect around the obstacles, which would meet this requirement. For the communication requirements between the Raspberry Pi (the user’s belt) and the laptop, there is already minimal processing time. The A* path planning algorithm also has minimal processing time, so the user will have real-time updates on which direction to go through haptic sensors that meets our timing requirements. For percentage of navigation to the right location, I would check whether the haptic sensors always vibrate in the right direction out of 10 tries. If there is some calibration required by a compass, that would be taken into account as well.

Charles’s Status Report for 4/12/2025

This week I spent my time working with the Jetson to try to get our image processing code to work on it. This mostly involved a lot of package building and library downloading which ended up being much more involved than I initially thought it was going to be.

We ordered a WiFi adapter for the Jetson that way we could use WiFi instead of Ethernet. This made it so that we could use the Jetson at home and start working on getting all the necessary packages installed.

Installing the packages actually ended up being pretty complex. All the simple libraries like numpy were relatively painless to install. However, all the ML libraries ended up being much more of a hassle. PyTorch and Torchvision both took an extremely long time to install and build. This is because the normal libraries for these that are hosted through pip are not compatible with the Jetson architecture. Using this link, I was able install and build a PyTorch and Torchvision that was compatible with the jetson architecture. This did end up taking multiple hours to build and compile, so I had to leave it running overnight. After finally getting all the necessary libraries downloaded to finally compile our image processing code, we ran into another problem.

As we can see here, we have the 2GB of RAM that our board has, but we also have a 9GB swap file. This 9GB swapfile is essentially virtual RAM for the board because the base amount of RAM is very small. The problem with this is that, even when loading in smaller models, the system runs out of memory on almost any input.

Because the process runs out of memory, it ends up being killed by the OS. Resulting in this message:
[Sat Apr 12 19:12:14 2025] Killed process 10399 (python3) total-vm:9650160kB, anon-rss:0kB, file-rss:537696kB, shmem-rss:0kB

This essentially means that we have completely ran out of memory and thus unable to continue being run.

For the upcoming week, I am going to figure out if there is anything that we can do to run the image processing on the Jetson, and if that’s not possible, if there is any other alternatives.

For verification and validation, I am going to start testing the camera and the image processing in combination. We will do multiple room layouts to test the robustness of the mask generation. We will most likely still do the processing on our laptops because it will be significantly faster, but we will use the camera that we have recently bought.

Kevin’s Status Report for 4/12

This week I received all the DWM1001 UWB anchors. I was able to set them all up to communicate with the tag. I configured the UWB API to read 3 distances simultaneously.

I was able to use the UWB hardware to triangulate to position of the tag on a smaller scale. An image of this setup is attached below, but I was able to successfully triangulate the tag on a table by placing anchors on the corners of the table.

I also helped with the GPIO communication with the haptic vibrators. We were able to trigger vibration motors, and Talay adapted them to our path planning communication.

I think we made good progress this week but our schedule is still tight. For the upcoming week, I think we are on a good track if we can have the entire system integrated; this involves setting up the UWB to be calibrated with the occupancy matrix in the full room. We also need to consider how to set this system up for demo day; I am thinking to buy some tall stands, and use them to hang our overhead camera and install our UWB anchors.

For verification, I want to measure the precision between the sensors. What is interesting is that the absolute distance doesn’t really matter since our distances are arbitrarily scaled. But we can verify that the measurements are consistent across each anchor. Then, we should verify the accuracy of the actual localization system by comparing the actual vs measured location.

Kevin’s Status Report for 3/29

This week, I made progress on both the UWB and the occupancy portions of this project.

For the UWB localization, I received the DWM1001 dev boards, which I successfully set to collect distance metrics from each other, capable of being accessed within a Python program. Since I didn’t want to order all 4 components before testing a pair first, the other UWB anchors are on the way. But the process to set them up should be identical, so I don’t anticipate any roadblocks with this.

Once I have all the distances from the anchors I can get the tag’s coordinates using triangulation.

I also began working on the UI for both calibrating the anchors, as well as setting the target destination. Calibration is done using reverse triangulation:

.

Setting the target destination is as simple as dropping a pin on the map:

 

I also helped with the data processing for the occupancy matrix. Once the map is converted to a pixelated grid, I wanted to add a buffer space around the obstacles. To do this, I implemented a downsizing function which uses higher overlap to achieve a halo effect around each obstacle. The red pixels highlight the difference:

 

I think our project is more on track now, as the individual components are working as expected. Next week, we will have interim demos. I plan on beginning integration. If the remaining UWB anchors arrive, I hope to complete the entire localization pipeline.

Charles’s Status Report for 3/29

This week I spent some more time on the obstacle detection pipeline along with Talay. Talay got a PoC to work with his phone and having his phone communicate back with our laptops. So we tried adding his phone to the pipeline, to do this we just captured a frame from the live feed of his camera and started to run our obstacle detection pipeline on the captured image. I didn’t capture an image of the process or what it looks like, but this snippet of code should give a good idea of what is being done.

Once we got this image to work, we realized that some of the images we were getting from the obstacle detection looked off. This an example picture of a problematic result:

The white in this picture denotes what is a floor while the black denotes an obstacle. You can see pretty clearly here that it is mislabeling the table and the chair here as floor. We figured out that this was just a processing mistake and Talay’s post has the corrected processing. However, in order to fix the issue we had to regress back to a stage of the pipeline where there is no labelling of the floor and obstacles to see what exactly the SAM model is doing.

We can see here that the SAM model is segmenting everything in the room correctly, the only issue is with our labelling algorithm. The solution to the problem ended up being a bug in the processing code.

Next week, I am going to talk with the rest of my group members on a very important point of the project. We need to discuss how the user will select destinations. This can be done with a variety of approaches, most of them requiring at least some sort of UI. We also need to prepare for the demo coming next week.

Team’s Status Report for 3/29

The most significant risk that can jeopardize the success of the project is the overlay between the coordinate plane given by the camera and the coordinate plane given by the UWB sensors. Right now, we have an occupancy matrix of all the obstacles in the indoor environment. We got the basic functionality of the UWB sensors working but to localize the person with the UWB tag in the same frame we would need to overlay the two coordinate planes on top of each other. Since the occupancy matrix is downsampled by an arbitrary scale, we would need to calibrate the UWB sensors so the user’s movements are accurately reflected on the occupancy matrix. We believe this is the next main complexity of our project that we would need to tackle. With 4 UWB sensors (including one on the person), we would be able to localize the person in 3D space. Since our occupancy matrix is 2D, we would need some to use some geometry to match the movements of the person onto the grid. Since the camera used is a 120 degree FOV, we would also need to account for some warping effects on the sides. Once we have all 4 UWB sensors, we can definitely try to test that out.

Another risk that could jeopardize the success of the project is the compute power of the Jetson. Our current software pipeline is already taking a few minutes to run on the laptop, and we still have the UWB positioning left. Once we complete our pipeline, we were planning to move the processing to the Jetson. If the Jetson is unable to process this in a reasonable time, we may have to consider a more powerful Jetson or doing our processing on the laptop.

Some minor changes to the design of the system is that the destination selection will be fully configurable by the user now. When setting up the environment, the user will have the option to configure what locations they would like to request navigation to in the future. This will be set up with a UI that a caregiver can use.

Talay’s Status Report for 3/29

This week, I continued working on the path-finding algorithm and fine-tuning it so that it would be best for our use-case. I decided to switch from a D* path finding algorithm to an A* path finding algorithm so that it does not store any state from iteration to iteration. When integrated with the UWB sensors, we foresee that there might be some warping action with respect to the user’s position. D* path planning algorithm uses heuristics to only calculate certain parts of the path, which relies on the user moving in adjacent cells. Because the user might warp a few cells, A* path finding algorithm is more suitable because it calculates the path from scratch during each iteration. I simulated the user’s position on the grid with mouse clicks and there was minimal latency, so we are going to proceed with the A* path finding algorithm.

Next, I tested the segmentation model on multiple rooms in Hammerschlag Hall. The segmentation itself was quite robust, but our processing to further segment the image into obstacles and free space was not quite robust. I revisited this logic and tweaked a few points so it selected the largest segmented space as free space and everything else as obstacles. Now, we have a fine-grained occupancy matrix that needs to be downsampled.

I worked on the downsampling algorithm with Kevin. We decided to downsample by picking chunks of certain size (n x n) and labeling it as an obstacle or free space (1 x 1 block on the downsampled grid) depending on how many blocks in the chunk were labeled as obstacles. We wanted a conservative estimate on obstacles, so we wanted the obstacles to have a bigger boundary. This way, the blind person wouldn’t walk into the obstacle. We did this by choosing a low voting threshold within each chunk. For example, if 20% or more of the cells within a chunk were labeled as obstacles, the downsampled cell will be labeled as an obstacle. Once we downsampled the original occupancy matrix generated by the segmentation model, we saw that the resolution was still good enough and we got a conservative bound on obstacles.

This is the livestream capture from the phone camera mounted up top.

This is the segmented image.

Here, we do processing on the segmented image to label the black parts (ground) as free space and the white parts as obstacles.

Here is the A* path finding algorithm running on the downsampled occupancy grid. Here, the red is the target and the blue is the user. The program outputs that the next step is top left.

I believe that our progress is on schedule. Next week, I hope to work with the team to create a user interface where they could select the destination. The user should be able to preconfigure which destinations they would like to save in our program so that they can request navigation to that location with a button press. I would also like to try to run the entire software stack on the Jetson to see if the Jetson can process everything with reasonable latency.

Team Status Report for 3/22

The most significant risk that can jeopardize the success of this project is the UWB sensors not working as well as we expected. Our software pipeline is decently robust at this point, however, we are completely relying on the UWB sensors embedded on the user’s belt to determine the user’s location and update it on the D* Path Planning algorithm. The frame captured from the phone camera and the frame calculated from the UWB sensors might also have different dimensions, so the movements of the person may not completely align on the occupancy matrix we run path finding on. This misalignment could cause significant drift as the user moves and makes it difficult to guide him to the target.

These risks are being mitigated by having some of our team members look into libraries that could decrease the fish eye effect on the ends of the image. Since we needed wide lens camera to capture the entire frame, warping on the ends is something that we need to work with. Since our UWB sensors most likely would work uniformly throughout, it would probably be easiest to decrease warping from the CV pipeline.

Another risk is the compass orientation we receive and how we are going to integrate that with the D* path finding algorithm. Since these components take some time to arrive, we can currently work on the software stack right now. However, we are looking into drivers and libraries that could run these hardware components.

There are no changes to our schedule. This week, our main milestones were getting some sort of camera to work and send feed to the computer processing the data. We were also able to select a segmentation model that could classify free spaces and obstacles. Our D* Path Planning algorithm is mostly working, and we will focus on integration next week.

Talay’s Status Report for 3/22

This week, I first tried to find an alternate to the OV2311 Stereo Camera that was not working with the Jetson. The camera had some issues and so we decided to use a phone camera that would be sent to the laptop for processing. We are going to move forward with this solution for the time being so that we can get to MVP and then decide to tweak some of the components later. I was able to capture images from my phone camera’s wide lens view and receive it on my laptop using Python’s OpenCV library. The frame of the indoor environment being stored as an OpenCV video capture is perfect for us to process later.

After this, I tried to experiment with some CV segmentation models that could classify the environment into either obstacles or free space. The models I tried were Segment Anything Model (SAM), Mask2Former Universal Segmentation, DeepLabV3 with COCO weights, and DeepLabV3 with Pascal weights. These models are pre-trained with mostly general data, and so they did not work as well on indoor environments from a top-down bird’s eye view as seen in the following images.

SAM Model With Hamerschlag Hall View:

SAM Model + Mask2Former Universal Segmentation:

DeepLabV3 Model with COCO Weights:

DeepLabV3 Model with Pascal Weights:

As seen from these images, the closest model that is able to at least outline the edges of the furniture is the SAM Model. Charles was able to tune the model such that it fills in the entire furniture and so we were able to get a working model from that. Thus, we are now going to base our segmentation model on the modified SAM Model.

Next, I started working on the D* Lite algorithm. I was able to create a UI that displays the person and the target in a 2D occupancy matrix (one that is similar to the one generated from the modified SAM model). The 2D occupancy matrix contains 0 for free space and -1 for occupied space. The D* algorithm recalculates the shortest path starting from the target to the person and uses heuristics to speed up the algorithm. Currently, I am controlling the person using arrow keys but these will be replaced with UWB sensors after we get that working. The person will also be navigated in the direction of the lowest cost.

My progress is slightly behind schedule as we were not able to get the OV2311 camera working. However, we have decided to use the phone camera as an alternative so we can move forward with our project. Once we have a working model for all the parts, we can consider switching back to the stereo camera instead of a phone camera. With the phone camera working, we are able to make significant progress on the software end of our project this week, so we expect similar progress going forward.

Next week, I hope to integrate the phone camera, SAM CV Model, and D* Lite Path Planning algorithm all into the same pipeline. Currently, these are all working as lone components, but we have to ensure that the output of one stack can feed into the input of the other. Once I am able to integrate all these components, our software stack will be mostly done and we could plug in hardware components as we move forward.