Kevin’s Status Report for 3/29

This week, I made progress on both the UWB and the occupancy portions of this project.

For the UWB localization, I received the DWM1001 dev boards and successfully set them up to collect distance measurements from each other, which I can read from within a Python program. Since I didn’t want to order all four components before testing a pair first, the remaining UWB anchors are still on the way. The setup process for them should be identical, though, so I don’t anticipate any roadblocks there.

Once I have the distances from all the anchors, I can compute the tag’s coordinates using trilateration.
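As a rough sketch of what that computation might look like (assuming three anchors with known 2D coordinates and a simple least-squares solve; not the final implementation):

```python
import numpy as np

def trilaterate_2d(anchors, distances):
    """Estimate the tag's (x, y) from known anchor positions and measured ranges.

    anchors:   list of (x, y) anchor coordinates, length >= 3
    distances: measured distance from the tag to each anchor
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtract the first anchor's circle equation from the rest to linearize:
    # 2(x_i - x_0)x + 2(y_i - y_0)y = d_0^2 - d_i^2 + (x_i^2 + y_i^2) - (x_0^2 + y_0^2)
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1)
         - np.sum(anchors[0] ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos  # approximate (x, y) of the tag

# Example: anchors at three corners of a room (meters), tag near (2, 2)
print(trilaterate_2d([(0, 0), (5, 0), (0, 4)], [2.83, 3.61, 2.83]))  # ~[2. 2.]
```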

I also began working on the UI, both for calibrating the anchors and for setting the target destination. Calibration is done using reverse trilateration, i.e. recovering the anchors’ positions from distance measurements:

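One simple way to place three anchors in a shared frame from their pairwise ranges (a sketch, not necessarily exactly what the UI does) is to pin the first anchor at the origin and the second on the x-axis:

```python
import math

def place_anchors(d01, d02, d12):
    """Place three anchors in a local 2D frame from their pairwise distances.

    Anchor 0 is pinned to the origin and anchor 1 to the positive x-axis;
    anchor 2 is then fixed (up to mirroring) by the law of cosines.
    """
    a0 = (0.0, 0.0)
    a1 = (d01, 0.0)
    x2 = (d02 ** 2 + d01 ** 2 - d12 ** 2) / (2 * d01)
    y2 = math.sqrt(max(d02 ** 2 - x2 ** 2, 0.0))  # clamp for noisy ranges
    return a0, a1, (x2, y2)

# Example: ranges consistent with anchors roughly at (0,0), (5,0), (0,4)
print(place_anchors(5.0, 4.0, 6.4))
```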

Setting the target destination is as simple as dropping a pin on the map:

 

I also helped with the data processing for the occupancy matrix. Once the map is converted to a pixelated grid, I wanted to add buffer space around the obstacles. To do this, I implemented a downsizing function that uses overlapping windows to create a halo effect around each obstacle. The red pixels highlight the difference:
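A rough sketch of that downsizing function (assuming a binary NumPy grid where 1 marks an obstacle; the window, stride, and threshold values here are just illustrative):

```python
import numpy as np

def downsample_with_halo(grid, window=8, stride=6, threshold=0.2):
    """Downsample a binary occupancy grid using overlapping windows.

    Because stride < window, neighbouring output cells share input pixels,
    so obstacles bleed into adjacent cells and pick up a safety halo.
    A cell becomes an obstacle if at least `threshold` of its window is occupied.
    """
    h, w = grid.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.uint8)
    for i in range(out_h):
        for j in range(out_w):
            chunk = grid[i * stride:i * stride + window,
                         j * stride:j * stride + window]
            if chunk.mean() >= threshold:
                out[i, j] = 1
    return out
```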

 

I think our project is closer to being on schedule now, as the individual components are working as expected. Next week we have interim demos, and I plan to begin integration. If the remaining UWB anchors arrive, I hope to complete the entire localization pipeline.

Charles’s Status Report for 3/29

This week I spent some more time on the obstacle detection pipeline along with Talay. Talay got a proof of concept working in which his phone streams video back to our laptops, so we tried adding his phone to the pipeline: we captured a frame from the live feed of his camera and ran our obstacle detection pipeline on that captured image. I didn’t capture an image of the process or what it looks like, but a short snippet of code should give a good idea of what is being done.
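The original snippet isn’t reproduced here, but it was roughly along these lines (a sketch that assumes the phone exposes its camera as an IP-style video stream, and `run_obstacle_pipeline` stands in for our detection entry point):

```python
import cv2

def run_obstacle_pipeline(frame):
    """Placeholder for our SAM segmentation + floor-labelling pipeline."""
    raise NotImplementedError

# URL of the phone's live stream; the actual address/port depends on the
# camera app being used on the phone.
STREAM_URL = "http://<phone-ip>:8080/video"

cap = cv2.VideoCapture(STREAM_URL)
ok, frame = cap.read()            # grab a single frame from the live feed
cap.release()

if ok:
    cv2.imwrite("capture.jpg", frame)         # keep a copy for debugging
    occupancy = run_obstacle_pipeline(frame)  # run obstacle detection on it
else:
    print("Failed to read a frame from the phone stream")
```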

Once we got this capture working, we realized that some of the images we were getting from the obstacle detection looked off. This is an example of a problematic result:

In this picture, white denotes floor and black denotes an obstacle. You can see pretty clearly that the table and the chair are being mislabeled as floor. We figured out that this was just a processing mistake, and Talay’s post has the corrected processing. However, in order to fix the issue we had to step back to a stage of the pipeline where the floor and obstacles are not yet labeled, to see exactly what the SAM model is doing.

We can see here that the SAM model is segmenting everything in the room correctly; the only issue is with our labelling algorithm. The problem ended up being a bug in the processing code.

Next week, I am going to talk with the rest of my group about an important design point: how the user will select destinations. This can be done with a variety of approaches, most of which require at least some sort of UI. We also need to prepare for the demo coming up next week.

Team’s Status Report for 3/29

The most significant risk that can jeopardize the success of the project is aligning the coordinate plane given by the camera with the coordinate plane given by the UWB sensors. Right now, we have an occupancy matrix of all the obstacles in the indoor environment. We got the basic functionality of the UWB sensors working, but to localize the person wearing the UWB tag in the same frame, we need to overlay the two coordinate planes on top of each other. Since the occupancy matrix is downsampled by an arbitrary scale, we need to calibrate the UWB sensors so that the user’s movements are accurately reflected on the occupancy matrix. We believe this is the next main complexity of our project to tackle. With 4 UWB sensors (including one on the person), we can localize the person in 3D space; since our occupancy matrix is 2D, we will need some geometry to project the person’s movements onto the grid. Since the camera has a 120-degree FOV, we also need to account for warping effects near the edges of the image. Once we have all 4 UWB sensors, we can test this out.
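One way we might do that alignment is to fit a transform from UWB coordinates to grid cells using reference points whose positions we know in both frames (for example, the anchors themselves). A sketch of that idea:

```python
import numpy as np

def fit_uwb_to_grid(uwb_pts, grid_pts):
    """Fit an affine transform mapping UWB (x, y) coordinates to grid (row, col).

    uwb_pts, grid_pts: matching Nx2 arrays of reference points (N >= 3),
    e.g. the anchor locations expressed in both frames.
    """
    uwb = np.asarray(uwb_pts, dtype=float)
    grid = np.asarray(grid_pts, dtype=float)
    A = np.hstack([uwb, np.ones((len(uwb), 1))])  # rows of [x, y, 1]
    # Solve A @ M ~= grid in the least-squares sense; M is 3x2.
    M, *_ = np.linalg.lstsq(A, grid, rcond=None)
    return M

def uwb_to_grid(point, M):
    """Map a single UWB (x, y) reading into occupancy-grid coordinates."""
    x, y = point
    return np.array([x, y, 1.0]) @ M
```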

Another risk that could jeopardize the success of the project is the compute power of the Jetson. Our current software pipeline already takes a few minutes to run on a laptop, and we still have the UWB positioning left. Once we complete the pipeline, we plan to move the processing to the Jetson. If the Jetson cannot process everything in a reasonable time, we may have to consider a more powerful Jetson or keep the processing on the laptop.

One minor change to the design of the system is that destination selection will now be fully configurable by the user. When setting up the environment, the user will have the option to configure which locations they would like to request navigation to in the future. This will be set up through a UI that a caregiver can use.

Talay’s Status Report for 3/29

This week, I continued working on the path-finding algorithm and fine-tuning it for our use case. I decided to switch from D* to A* path finding so that the planner does not store any state from iteration to iteration. When integrated with the UWB sensors, we foresee that the user’s reported position may jump (“warp”) by a few cells between updates. The D* algorithm reuses results from previous searches and only recalculates the affected parts of the path, which relies on the user moving between adjacent cells. Because the user’s position might warp by a few cells, A* is more suitable, since it recalculates the path from scratch on each iteration. I simulated the user’s position on the grid with mouse clicks and there was minimal latency, so we are going to proceed with A*.
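For reference, a simplified sketch of A* over an occupancy grid like ours (4-connected moves, Manhattan heuristic; 0 marks free space and -1 marks obstacles):

```python
import heapq

def astar(grid, start, goal):
    """A* over a 2D occupancy grid (0 = free, -1 = obstacle).

    start/goal are (row, col) tuples; returns the path as a list of cells,
    or None if the goal is unreachable.
    """
    def h(a, b):                                   # Manhattan heuristic
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    rows, cols = len(grid), len(grid[0])
    open_set = [(h(start, goal), start)]
    came_from = {}
    g_score = {start: 0}

    while open_set:
        _, cell = heapq.heappop(open_set)
        if cell == goal:                           # walk parents back to start
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g_score[cell] + 1
                if ng < g_score.get(nxt, float("inf")):
                    g_score[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_set, (ng + h(nxt, goal), nxt))
    return None
```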

Next, I tested the segmentation model on multiple rooms in Hamerschlag Hall. The segmentation itself was quite robust, but our post-processing that splits the image into obstacles and free space was not. I revisited this logic and tweaked it so that it selects the largest segmented region as free space and labels everything else as obstacles. Now we have a fine-grained occupancy matrix that needs to be downsampled.

I worked on the downsampling algorithm with Kevin. We decided to downsample by taking chunks of a certain size (n x n) and labeling each one as an obstacle or free space (a 1 x 1 block on the downsampled grid) depending on how many cells in the chunk were labeled as obstacles. We wanted a conservative estimate of obstacles, so we gave obstacles a bigger boundary; this way, the blind person won’t walk into them. We did this by choosing a low voting threshold within each chunk: for example, if 20% or more of the cells in a chunk are labeled as obstacles, the downsampled cell is labeled as an obstacle. Once we downsampled the original occupancy matrix generated by the segmentation model, we saw that the resolution was still good enough and we got a conservative bound on obstacles.
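Concretely, the voting rule looks something like this (a sketch assuming a NumPy grid using 0 for free space and -1 for obstacles, trimmed to a multiple of the chunk size):

```python
import numpy as np

def downsample_vote(grid, n, threshold=0.2):
    """Collapse each n x n chunk of the fine grid (0 = free, -1 = obstacle)
    into one cell; the cell becomes an obstacle if at least `threshold`
    of the chunk is occupied. A low threshold inflates obstacle boundaries."""
    h, w = grid.shape
    chunks = grid[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n)
    occupied_frac = (chunks == -1).mean(axis=(1, 3))  # obstacle fraction per chunk
    return np.where(occupied_frac >= threshold, -1, 0)
```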

This is the livestream capture from the phone camera mounted up top.

This is the segmented image.

Here, we do processing on the segmented image to label the black parts (ground) as free space and the white parts as obstacles.

Here is the A* path finding algorithm running on the downsampled occupancy grid. Here, the red is the target and the blue is the user. The program outputs that the next step is top left.

I believe our progress is on schedule. Next week, I hope to work with the team to create a user interface for selecting the destination. The user should be able to preconfigure which destinations they would like to save in our program so that they can request navigation to a location with a button press. I would also like to try running the entire software stack on the Jetson to see whether it can process everything with reasonable latency.

Team Status Report for 3/22

The most significant risk that can jeopardize the success of this project is the UWB sensors not working as well as we expect. Our software pipeline is decently robust at this point; however, we are relying entirely on the UWB sensors embedded in the user’s belt to determine the user’s location and feed it into the D* path planning algorithm. The frame captured from the phone camera and the frame calculated from the UWB sensors might also have different dimensions, so the person’s movements may not align exactly with the occupancy matrix we run path finding on. This misalignment could cause significant drift as the user moves and make it difficult to guide them to the target.

We are mitigating these risks by having some team members look into libraries that can reduce the fisheye effect at the edges of the image. Since we need a wide-angle camera to capture the entire frame, warping at the edges is something we have to work with. Since the UWB sensors should behave uniformly throughout the space, it is probably easiest to reduce the warping on the CV side of the pipeline.

Another risk is the compass orientation we receive and how we are going to integrate it with the D* path finding algorithm. Since these components take some time to arrive, we are working on the software stack in the meantime. However, we are already looking into drivers and libraries that can run these hardware components.

There are no changes to our schedule. This week, our main milestones were getting a camera working and sending its feed to the computer that processes the data. We were also able to select a segmentation model that can classify free space and obstacles. Our D* path planning algorithm is mostly working, and we will focus on integration next week.

Talay’s Status Report for 3/22

This week, I first tried to find an alternative to the OV2311 stereo camera, which was not working with the Jetson. The camera had some issues, so we decided to use a phone camera whose feed is sent to the laptop for processing. We are going to move forward with this solution for the time being so that we can get to MVP, and we can revisit this component later. I was able to capture images from my phone camera’s wide-angle lens and receive them on my laptop using Python’s OpenCV library. Having the frames of the indoor environment available as an OpenCV video capture makes them easy to process later.

After this, I experimented with some CV segmentation models that could classify the environment into obstacles and free space. The models I tried were Segment Anything Model (SAM), Mask2Former universal segmentation, DeepLabV3 with COCO weights, and DeepLabV3 with Pascal weights. These models are pre-trained mostly on general data, so they did not work as well on indoor environments viewed from a top-down bird’s-eye view, as seen in the following images.

SAM Model With Hamerschlag Hall View:

SAM Model + Mask2Former Universal Segmentation:

DeepLabV3 Model with COCO Weights:

DeepLabV3 Model with Pascal Weights:

As seen from these images, the model that comes closest to at least outlining the edges of the furniture is SAM. Charles was able to tune the model so that it fills in each piece of furniture entirely, giving us a working model. Thus, we are now going to base our segmentation on the modified SAM model.

Next, I started working on the D* Lite algorithm. I created a UI that displays the person and the target on a 2D occupancy matrix (similar to the one generated from the modified SAM model). The occupancy matrix contains 0 for free space and -1 for occupied space. The D* algorithm recalculates the shortest path from the target to the person and uses heuristics to speed up the search. Currently, I control the person using arrow keys, but this will be replaced with the UWB sensors once we get them working. The person is then navigated in the direction of lowest cost.

My progress is slightly behind schedule because we were not able to get the OV2311 camera working. However, we have decided to use a phone camera as an alternative so we can move forward with the project. Once we have a working model for all the parts, we can consider switching back to the stereo camera. With the phone camera working, we were able to make significant progress on the software end of the project this week, and we expect similar progress going forward.

Next week, I hope to integrate the phone camera, the SAM CV model, and the D* Lite path planning algorithm into a single pipeline. Currently, these all work as standalone components, but we have to ensure that the output of one stage can feed into the input of the next. Once I integrate these components, our software stack will be mostly done and we can plug in the hardware components as we move forward.

Kevin’s Status Report for 3/22

This week we ran into the roadblock of being unable to capture camera data from our Jetson. To still have workable data, our team temporarily pivoted to using a phone camera to capture the bird’s-eye view.

We are still waiting on the UWB hardware to arrive, so we all shifted focus to capturing a usable map from the bird’s-eye camera, since that is something all the other components rely on, and we were behind schedule on both capturing the image and generating a CV model of the space. While my teammates captured data from their houses, I was also able to capture data from the HH lab by placing the phone up on a ceiling light.

We were also trying to find an appropriate CV model for this task. With a segmentation model that only produced a border outline of the map, I tried to fill in the obstacles so that the inside of each border was marked as well:
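The attempt looked roughly like this (a sketch using OpenCV morphological closing and contour filling; the file name and kernel size are placeholders):

```python
import cv2
import numpy as np

# outline: a binary image where segment borders are white on black
outline = cv2.imread("segmentation_outline.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(outline, 127, 255, cv2.THRESH_BINARY)

# Close small gaps in the borders, then fill each closed contour.
kernel = np.ones((5, 5), np.uint8)
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

filled = np.zeros_like(closed)
cv2.drawContours(filled, contours, -1, 255, thickness=cv2.FILLED)
cv2.imwrite("filled_obstacles.png", filled)
```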

This was unsuccessful, as I struggled to correctly distinguish the floor from actual obstacles. Charles was able to use Meta’s SAM model, which accomplishes this, and we will likely proceed with that model.

I also played around with using CV to remove the fisheye effect. This will require more tuning.
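The current experiment is along the lines of OpenCV’s standard undistortion (a sketch; the camera matrix and distortion coefficients below are rough placeholders that still need a proper calibration):

```python
import cv2
import numpy as np

img = cv2.imread("birdseye_frame.jpg")   # placeholder file name
h, w = img.shape[:2]

# Placeholder intrinsics/distortion -- these need to come from a real
# calibration (e.g. cv2.calibrateCamera with a checkerboard pattern).
K = np.array([[w, 0, w / 2],
              [0, w, h / 2],
              [0, 0, 1]], dtype=np.float64)
dist = np.array([-0.3, 0.1, 0.0, 0.0], dtype=np.float64)  # k1, k2, p1, p2

new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), 1)
undistorted = cv2.undistort(img, K, dist, None, new_K)
cv2.imwrite("undistorted.jpg", undistorted)
```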

 

I think we are behind schedule, but it is manageable. Next week, I would like to overlay the path planning on top of our occupancy matrix. I also hope that if the UWB sensors arrive next week, we can set up that positioning system.

Charles’s Status Report for 3/22

This week I spent a lot of time on the CV part of our project. The first challenge was finding a model that could effectively recognize objects. This immediately ran into some shortcomings: many object recognition models are not trained on images taken from a top-down point of view. This led to some very questionable predictions, and large portions of the image were often omitted. I’m still not sure whether those parts of the image were never inferred on or simply produced no object guesses. The picture below shows one of the inferences:

There wasn’t much of a solution to this problem, as other models also gave questionable results. I then moved on to a segmentation-only model, which lowered the complexity since the model is only used to segment the image. This ended up working much better for pure obstacle detection than the YOLO model. I used the Segment Anything Model from Facebook, which returns masks of segmented objects, and I heuristically chose the largest continuous segment as the floor, since this is usually the case. The result of this code is below:

In this image, green is considered floor and red is considered an obstacle. As we can see, the results are actually quite accurate. I also converted this image into an occupancy matrix, preparing the data for pathfinding.
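In outline, the heuristic and the conversion look something like this (a sketch assuming the output of SAM’s automatic mask generator, a list of dicts with a boolean ‘segmentation’ mask and an ‘area’ count, plus the 0 / -1 occupancy convention we use elsewhere):

```python
import numpy as np

def masks_to_occupancy(masks, image_shape):
    """Turn SAM automatic-mask output into an occupancy grid (0 = free, -1 = obstacle).

    masks: list of dicts from SamAutomaticMaskGenerator.generate(), each with a
    boolean 'segmentation' array and an 'area' pixel count.
    """
    floor = max(masks, key=lambda m: m["area"])["segmentation"]
    occupancy = np.full(image_shape[:2], -1, dtype=np.int8)  # default: obstacle
    occupancy[floor] = 0                                     # largest segment = floor
    return occupancy
```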

Assuming this heuristic approach will work for future rooms, I have started to move toward path planning. I am working with Talay to implement a working A*/D* algorithm. We hope to first simulate the algorithm with a controllable character and a prompt telling us the intended route to the target.

Charles’s Status Report for 3/15

This week I spent most of my time working on the CV component of the product. I found a very rudimentary way to locate obstacles: Canny edge detection. There are many preprocessing factors that can skew what is and isn’t considered an edge before the image goes through the algorithm. I followed some online resources that give a rudimentary walkthrough of getting edge detection to work through OpenCV in Python. This is what some of my preliminary results look like.

Original Image:

Edge Detection:

Here it’s pretty clear that the algorithm is effectively doing edge detection, but it now becomes hard to determine which edges are relevant. It does a good job of marking the plant, lamp, sofa, and tables; however, the rug is still considered an obstacle.
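For reference, the pipeline is essentially the standard OpenCV recipe (a sketch; the file name, blur kernel, and thresholds are the knobs that change what counts as an edge):

```python
import cv2

img = cv2.imread("living_room.jpg")              # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Preprocessing heavily affects which edges survive.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Hysteresis thresholds: edges above 150 are kept; 50-150 kept only if connected.
edges = cv2.Canny(blurred, 50, 150)
cv2.imwrite("edges.jpg", edges)
```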

For next week, I want to see if there are any more accurate edge detection approaches, as this one has some flaws in detection, and to see whether combining edge detection with something like YOLO might make it easier to differentiate between something like a rug and a legitimate obstacle like a sofa.

Team Status Report for 3/15/2025

The most significant risk that can jeopardize the success of this project is the hardware stack. Since the Jetson is a budget computer, it has many compatibility issues, even with modules that were designed to be used with it. For example, the WaveShare IMX219-83 stereo camera was made for the Jetson, but just to get it detected I had to download multiple drivers (until I found the right spec) and configure port settings on the Jetson that most tutorials do not explain. Since many of these hardware components are not widely used, there is only a small community of engineers who have worked with them and can share their experience, so the tutorials online don’t usually capture the complexity of the setup. Setting up the Jetson and the camera modules was supposed to take a few weeks, but it is already taking longer than that (first because the Jetson was broken, and now because of camera compatibility issues). We are managing these risks by having Charles focus on the software stack in parallel while Kevin and Talay get the hardware stack ready. We are also using widely available tools for the software stack so that our programs are easy to debug and there is a large programming community to draw on.

No major changes were made to the existing design; one small change is that we are going to use more UWB sensors that require less setup, as described in Kevin’s status report.