Jerry’s Status Update for 04/11 (Week 9)

Progress

This week, I trained a few new models on the new dataset for the point. Evaluating validation accuracy on separate videos, I found that the SVM method gave around 92% accuracy. I also tried a simple categorical neural net, which gave around 93% validation accuracy. Additionally, I tried a multitask model that predicts the x and y categories individually. This gave the best results, with 97% validation accuracy on the x axis and 94% on the y axis.
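For reference, here is a minimal sketch of what a two-headed multitask model along these lines could look like in Keras. The feature size and the 3×3 bin layout are assumptions for illustration; the actual architecture and hyperparameters may differ.

```python
from tensorflow.keras import layers, Model

NUM_FEATURES = 50  # hypothetical: flattened OpenPose keypoint features per frame
NUM_X_BINS = 3     # assuming the current 3x3 grid of bins
NUM_Y_BINS = 3

inputs = layers.Input(shape=(NUM_FEATURES,))
hidden = layers.Dense(64, activation="relu")(inputs)
hidden = layers.Dense(64, activation="relu")(hidden)
# Two softmax heads, one per axis, trained jointly on shared features
x_head = layers.Dense(NUM_X_BINS, activation="softmax", name="x_bin")(hidden)
y_head = layers.Dense(NUM_Y_BINS, activation="softmax", name="y_bin")(hidden)

model = Model(inputs=inputs, outputs=[x_head, y_head])
model.compile(
    optimizer="adam",
    loss={"x_bin": "sparse_categorical_crossentropy",
          "y_bin": "sparse_categorical_crossentropy"},
    metrics=["accuracy"],
)
# model.fit(train_features, {"x_bin": x_labels, "y_bin": y_labels}, validation_data=...)
```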

Installing TensorFlow on the Xavier board took some time, but I have recorded the installation steps for future reference.

I am also collecting the larger point dataset, which covers a bigger room than the current 3×3 grid of bins.

Deliverables next week

Next week, I will apply some of the SVM tuning suggestions Marios gave us and train the models on the larger point dataset. I will also work on a trigger for the dataset.

Schedule

On schedule.

Sean’s Status Update for 04/11 (Week 9)

Progress

Midpoint-Demo

This week, I presented the work I’ve done with the robot so far. The presentation included a video of the Roomba mapping the room and snapshots of the generated maps. I spent some time recording the video and data so that it is clear how the robot maps the room. Overall, I think the presentation went well.

Path-finding

I also spent some time developing the path-finding algorithm for the robot. It helps that the map already provides a discretized grid version of the space, so we can fairly easily implement an 8-point connectivity representation of the Roomba's motion. I plan to implement the A* (probably weighted A*) algorithm to traverse the grid. However, this introduces another concern: the odometry is particularly susceptible to error when the Roomba turns. If we use 8-point connectivity, we need to measure the turning angles carefully. Hopefully the error doesn't accumulate to a significant value, but we will have to see.
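As a starting point, here is a rough sketch of what weighted A* over an 8-connected occupancy grid could look like. The grid format, cost model, and heuristic here are placeholders, not the final implementation.

```python
import heapq
import math

# 8-connected moves: (d_row, d_col, step cost); diagonals cost sqrt(2)
MOVES = [(dr, dc, math.hypot(dr, dc))
         for dr in (-1, 0, 1) for dc in (-1, 0, 1)
         if (dr, dc) != (0, 0)]

def astar(grid, start, goal, weight=1.0):
    """Weighted A* on a 2D occupancy grid.
    grid[r][c] == 0 means the cell is free; start/goal are (row, col) tuples."""
    def h(cell):
        # Octile distance: admissible/consistent heuristic for 8-connectivity
        dr, dc = abs(cell[0] - goal[0]), abs(cell[1] - goal[1])
        return max(dr, dc) + (math.sqrt(2) - 1) * min(dr, dc)

    open_set = [(weight * h(start), 0.0, start)]
    came_from, g = {}, {start: 0.0}
    while open_set:
        _, cost, cur = heapq.heappop(open_set)
        if cur == goal:
            # Reconstruct the path by walking back through came_from
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g.get(cur, float("inf")):
            continue  # stale queue entry
        for dr, dc, step in MOVES:
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] != 0:
                continue  # occupied cell
            new_cost = g[cur] + step
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                came_from[nxt] = cur
                heapq.heappush(open_set, (new_cost + weight * h(nxt), new_cost, nxt))
    return None  # no path found
```

Setting weight above 1 inflates the heuristic, which trades optimality for faster searches; that is the usual appeal of weighted A* on small grids like ours.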

Deliverables next week

Next week, I plan to complete the implementation of path-finding and the consequent driving.

Schedule

On schedule.

Team Status Update for 04/04 (Week 8)

Progress

We made good progress on pointing, mapping, and 2D to 3D mapping for our demo this week.


Gestures

Pointing is coming along. It worked on a toy dataset of 2700 poses, and now we want to build more robust models and are collecting a larger dataset to do so.

Mapping

The 2D mapping is finally complete. It generates a grid map (and a text-file representation of it) that can be used by other functionalities. This will be demonstrated at the interim demo in the coming week.

Deliverables next week

We will show demo videos during the midpoint demo.

Schedule

On schedule.

Rama’s Status Update for 04/04 (Week 8)

Progress

I wrote a script to identify the positions of the user and the robot in the camera view. The user identification uses OpenPose, and it is easy to recognize when the user is not present in the image. The robot identification was inaccurate because of the color of the robot, but after Sean changed the color to red the identification became a lot more accurate. The position overlap of the user and robot needs more work to recognize when they are on the same map coordinates, and the code for the 2D to 3D mapping between image location and map location needs more work as well. It is difficult to line up positions in video frames with the robot's movement coordinates, and it might be necessary to modify the data output from the robot's encoders.
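For reference, the color-based part of the detection can be sketched roughly as below. The HSV thresholds and the contour-based centroid are illustrative assumptions, not the exact script.

```python
import cv2

def find_red_robot(frame_bgr):
    """Rough sketch of color-based robot detection; thresholds are assumed values."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two hue ranges
    mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None  # robot not visible in this frame
    largest = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(largest)
    return (x + w // 2, y + h // 2)  # approximate image-space center of the robot
```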

Deliverables next week

I will fix the overlap inaccuracies and work on a visualization for the demo.

Schedule

On schedule.

Jerry’s Status Update for 04/04 (Week 8)

Progress

This week, I experimented with more methods for the point. I wanted to get another webcam, but surge pricing due to the pandemic has caused webcams to go from $17 to $100. So, I am exploring using my phone or laptop webcam instead.

Pointing to 8 squares in a room, I was able to achieve 98% test accuracy and 98% validation accuracy with an SVM OVR model trained on 2700 poses. However, there were still errors when running it on a test video. I believe this is because I calculated the validation metrics from a held-out percentage of frames from the original training video. So, I collected separate videos for training and validation. I also wanted the new point data to include pointing with both the left and right hands.

The SVMs currently do not capture the fact that bins closer together are more similar, so I hope to try other model architectures, such as one neural net that predicts the x and y coordinates of the bin separately. I could not have done this with the old dataset because there was not enough data.

I also built a visualization for the point to show in the demo.

Deliverables next week

I will train some new models with the dataset that I collected, and hope to get better results for the point on the test videos.

Schedule

On schedule.

Sean’s Status Update for 04/04 (Week 8)

Progress

2D Mapping

This week, I finished developing the 2D mapping algorithm. It is a combination of edge-following and lawn-mowing, which effectively maps every position the robot can reach in the room. The robot records its XY coordinates throughout the phase, and at the end of the exploration a map is generated from that record. The algorithm itself isn't overly complicated, but there were many real-world issues (unstable serial connection, inaccurate encoder values, different light/floor conditions, etc.) that made making it robust very difficult. As I mentioned before, doing edge-following in both directions is necessary to map every corner of the room. Similarly, it turns out that doing the lawn-mowing pattern in both the horizontal and vertical directions is also necessary. The full mapping takes about 5 minutes for a small room.

It was also necessary to carefully iterate through the resulting list of coordinates to generate a useful map. The map is basically a discretized grid representation of the room. Currently, the cell size of the map is set to 5 cm, i.e., one cell in the map represents a 5 cm × 5 cm square area of the room. If the robot was able to travel through a cell multiple times, it is safe to conclude the cell is clear, especially with a small enough cell size. In addition, if an unexplored cell is within range (the radius of the robot) of two or more already-explored safe cells, it is reasonable to conclude that the unexplored cell is safe as well.
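A minimal sketch of the map-generation step is below, assuming the coordinate log is in centimeters and using the 5 cm cell size described above. The array layout is an assumption, and the rule for inferring unexplored-but-safe cells is omitted for brevity.

```python
import numpy as np

CELL_SIZE = 5.0  # cm per grid cell, matching the resolution described above

def build_grid(coords, width_cm, height_cm):
    """Turn a log of robot (x, y) positions (in cm) into a visit-count grid.
    Cells visited at least once are treated as explored free space."""
    rows = int(height_cm // CELL_SIZE) + 1
    cols = int(width_cm // CELL_SIZE) + 1
    visits = np.zeros((rows, cols), dtype=int)
    for x, y in coords:
        r, c = int(y // CELL_SIZE), int(x // CELL_SIZE)
        if 0 <= r < rows and 0 <= c < cols:
            visits[r, c] += 1
    # visits > 0 marks cells the robot actually occupied; a second pass could
    # also mark unexplored cells surrounded by enough safe cells, as described above
    return visits
```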

Driving to Home

I also implemented a simple version of the driving-to-home function. It records the “home” location at the beginning of the mapping. This location is different from the actual charging station: to use the Roomba's “dock” functionality, the robot has to be a certain distance straight out from the charging station; otherwise, the IR sensor will fail to locate the station and the robot will start travelling in the wrong direction. Thus, the robot initially moves straight back for a certain period of time and records that position as “home,” so it can try to dock from there afterwards. Currently, the robot simply turns toward the “home” location and drives straight until it thinks it is within a tolerable error range of the goal. Obviously, this doesn't take into account the map of the room or any obstacles on the path, so I will improve this functionality in the coming week.
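For reference, the turn-then-drive-straight math can be sketched as below. The pose format and tolerance are assumptions for illustration.

```python
import math

TOLERANCE_CM = 10.0  # hypothetical acceptable distance from the recorded "home"

def heading_to_home(pose, home):
    """Compute the turn (radians) and remaining distance to the recorded home.
    pose = (x, y, theta) from odometry; home = (x, y) recorded at start-up."""
    dx, dy = home[0] - pose[0], home[1] - pose[1]
    target = math.atan2(dy, dx)
    # Normalize the turn to [-pi, pi] so the robot takes the shorter rotation
    turn = (target - pose[2] + math.pi) % (2 * math.pi) - math.pi
    return turn, math.hypot(dx, dy)

# The robot would turn by `turn`, then drive straight until the returned
# distance drops below TOLERANCE_CM.
```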

Deliverables next week

Next week, I will improve the driving-to-home functionality and implement a path-finding algorithm that can be used in the driving-to-user functionality.

Schedule

On schedule.

Team Status Update for 03/28 (Week 7)

Progress

Gesture Recognition

We made good progress, making sure everything we had before still works with the remote setup. We transitioned to using videos instead of camera streams, optimized the output stream of gestures, and deployed SVM methods for gesture recognition.

In addition, we experimented with different ways to recognize the point.

Risk Management Plan

Most of our risks around parts not arriving have been resolved, as the parts arrived this week.

For gesture recognition, being able to work on videos removes the risk of not having enough time to run everything remotely. The major risk left for gesture recognition is not being able to detect the point, but that can potentially be resolved with additional hardware (another camera perspective) and by limiting the point problem through the size of the bins we choose.

————————–

Our risk management is pretty much the same; the only difference is that we will each have to work on risk management individually.

(From the design proposal doc:)

The largest risk for our project is localization of the robot. Our tasks of going back to home, going to the user, and going to the pointed location all require the robot to know where it is on the map. We are trying to mitigate this risk by using multiple methods to localize the robot. We are using data from our motor encoders to know where the robot has traveled on the 2D map. Additionally, we are going to use the camera view and our 3D to 2D mapping to get the location of the robot in the room. By having two methods to localize the robot, we maximize the chances of successful localization.

Classifying the gestures incorrectly is also a risk. OpenPose can give us incorrect keypoints, which would cause an error in the classification process. To address this, we are using multiple cameras to capture the user from multiple angles in the room, so we have backup views of the user for classifying gestures. Running OpenPose on more cameras decreases our total FPS, so our system can have at most 3 cameras. We chose 3 cameras to balance the performance of our system and the cost of the required hardware against the accuracy we can get in gesture classification. In addition, we will have backup heuristics to classify the gestures if our system cannot confidently recognize a gesture.

 

Schedule

Gantt chart for the revised schedule:

Rama’s Status Update for 03/28 (Week 7)

Progress

I received the Xavier board this week and set it up to work on my home network, and also gave Jerry remote access to it. We don't have a static IP, but as long as our router stays on, the IP shouldn't change and nothing will break. I started work on using OpenCV to recognize the Roomba, since we already have OpenPose to recognize the user's position. There is still some setup to be done before I can get videos of the Roomba moving around the room for the mapping to begin.

Deliverables next week

I will finish the Roomba recognition and connect it with the user recognition in preparation for mapping.

Schedule

On schedule.

Jerry’s Status Update for 03/28 (Week 7)

Progress

I finally received my parts and can now access the Xavier board via SSH.

Streaming

I first wanted to try streaming video from my local webcam to the Xavier board, in the hope that we could run in real time. I encoded each image into bytes and sent them over websockets between my computer and the server. However, this resulted in a very slow 2 FPS. So, I decided it would be best to use recorded videos to train and test the gesture recognition. RIP real time.
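The streaming client looked roughly like the sketch below; the server address is a placeholder and the JPEG encoding is an assumption about the byte format.

```python
import asyncio
import cv2
import websockets

SERVER_URI = "ws://<xavier-ip>:8765"  # hypothetical address of the board

async def stream_webcam():
    """JPEG-encode each webcam frame and push the bytes over a websocket."""
    cap = cv2.VideoCapture(0)
    try:
        async with websockets.connect(SERVER_URI) as ws:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break  # webcam closed or failed
                ok, buf = cv2.imencode(".jpg", frame)
                if ok:
                    await ws.send(buf.tobytes())
    finally:
        cap.release()

# asyncio.run(stream_webcam())
```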

Setting up remote environment

Without streaming, we had to work with videos. I changed the OpenPose scripts (gesture recognition and feature extraction) to use videos instead of the webcam.

To get videos, I wrote a Python script to record them with the webcam and OpenCV, with features like automatically starting to record after a wait time and stopping once a certain length is reached. This was helpful for gathering data later on, and I shared the script with the rest of the team.
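A rough sketch of that kind of recording loop is below; the wait time, clip length, and codec are example values, not necessarily what the script uses.

```python
import time
import cv2

WAIT_SEC = 3      # delay before recording starts (assumed value)
MAX_LEN_SEC = 30  # stop recording after this many seconds (assumed value)

def record_video(path, fps=30):
    """Record a fixed-length clip from the default webcam."""
    cap = cv2.VideoCapture(0)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    time.sleep(WAIT_SEC)  # give the user time to get into position
    start = time.time()
    while time.time() - start < MAX_LEN_SEC:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)
    cap.release()
    writer.release()

# record_video("gesture_clip.mp4")
```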

I built a bash script so I could send videos to the server, execute code remotely, and copy the results back to my local computer. In addition, I set up the webcam in my room. I tested the existing gestures (to me, go home, stop, and the teleop commands) in my room and they worked great. It was nice to see that the teleop gesture data I collected at CMU generalized to both my room and the video recorded in Sean's room.

Gesture recognition

I cleaned up a lot of our gesture recognition logic to make it more understandable, since we were using multiple methods of detection (model and heuristics). Running gesture recognition on video also exposed some errors in the gesture stream output. The gesture stream should only return a gesture when the gesture changes. However, OpenPose is noisy and can return bad results for a frame, causing the gesture recognition to think the gesture changed. For example, if an arm is raised for 4 frames, there may be noise on frame 3, so the gesture stream may be [arm raised at frame 1, no gesture at frame 3, arm raised at frame 4]. However, we only want one arm-raised output for the action, or else the gesture will trigger on the robot twice.

To resolve this, we use a gesture buffer. Commands like “no gesture” or the teleop gestures need to be seen for 10 frames (1/3 sec) before they count as a detected gesture. Gestures like left arm raised, right arm raised, and both arms raised are detected immediately, but only if the previously detected gesture was no gesture. This removes noise between gesture transitions and gives us the desired gesture output stream.
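A simplified sketch of this kind of gesture debouncing is below; the gesture names and exact state handling are illustrative, not the actual code.

```python
from collections import deque

BUFFER_FRAMES = 10  # ~1/3 s at 30 FPS, as described above
IMMEDIATE = {"left_arm_raised", "right_arm_raised", "both_arms_raised"}

class GestureStream:
    """Emit a gesture only when it genuinely changes, filtering per-frame noise."""

    def __init__(self):
        self.recent = deque(maxlen=BUFFER_FRAMES)
        self.last_emitted = "no_gesture"

    def update(self, gesture):
        """Feed one per-frame prediction; return a gesture to emit, or None."""
        self.recent.append(gesture)
        if gesture in IMMEDIATE:
            # Arm-raise gestures fire immediately, but only out of a "no gesture" state
            if self.last_emitted == "no_gesture":
                self.last_emitted = gesture
                return gesture
            return None
        # Everything else (no_gesture, teleop) must persist for the full buffer
        if len(self.recent) == BUFFER_FRAMES and all(g == gesture for g in self.recent):
            if gesture != self.last_emitted:
                self.last_emitted = gesture
                return gesture
        return None
```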

One vs all (OVR) SVM methods

As mentioned in the week 4 post, I did work earlier to set up both multiclass and OVR SVM methods. I found that they did not make much of a difference for teleop gesture classification (3 classes), but OVR improved the model on the test point data (pointing along 1 horizontal row of the room, with 6 classes). I added the ability to run both OVR and multiclass models in gesture recognition. I also experimented with the class weights parameter in SVM training to compensate for the imbalance between positive and negative data.

Teleop SVM methods beating out the heuristics for recognizing the straight teleop gesture.
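For reference, a minimal sketch of how multiclass and OVR SVMs with class weights can be set up in scikit-learn; the kernel and weighting choices here are assumptions.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Multiclass SVM (scikit-learn uses one-vs-one internally for SVC)
multiclass_svm = SVC(kernel="rbf", class_weight="balanced")

# Explicit one-vs-rest (OVR) wrapper; "balanced" class weights offset the
# positive/negative imbalance each binary classifier sees
ovr_svm = OneVsRestClassifier(SVC(kernel="rbf", class_weight="balanced"))

# ovr_svm.fit(train_features, train_labels)
# predictions = ovr_svm.predict(val_features)
```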

Point

I also wanted to do more work on gesture recognition for detecting pointing in a room. Previously, we wanted to predict which bin in a room a user was pointing to, without taking into account the position of the user. However, this is hard to generalize to a different room, so I wanted to explore predicting the pointed-to bin relative to the user. So, I collected data pointing to 7 bins to the left, right, and front of the user. On early datasets this method achieved around 0.9 mAP and 0.975 test accuracy (with the test set randomly sampled from the same video as the training data), but it is still iffy on the test videos. I want to automate evaluation of the system on different videos and collect more training data. The system can easily detect whether a point is to the left, right, or center (x dimension), but has trouble estimating how far in front of the user the point is (y dimension). This is because the data for how far in front you are pointing looks very similar from a front-view camera. This could potentially be solved with another camera on the side.

Early point data, detecting that I'm pointing to a bin 2 ft to the right and 2 ft in front of me.

Deliverables next week

I want to continue gathering data for the point problem and work on different models. Additionally, I want to have separate videos to train and test on. I also want to work with the side camera data if the camera arrives next week.

I also want to start on the visualization for the point recognition.

Risk management

Most of our risks around parts not arriving have been resolved, as the parts arrived this week. Also, being able to work on videos removes the risk of not having enough time to run everything remotely.

The major risk left is not being able to detect the point, but that can potentially be resolved with additional hardware (another camera perspective) and by limiting the point problem through the size of the bins we choose.

Schedule

On schedule. Slightly ahead of the previously planned schedule, as the parts arrived this week. However, more time needs to be spent on the point. I have attached the updated Gantt chart.

Sean’s Status Update for 03/28 (Week 7)

Progress

I finally got the Roomba, as well as the webcam and connecting cables, delivered to me.

Env-Setup & Data collection

I set up the environment in my basement, with the webcam covering the corner of the room. I tested basic maneuvers (driving, rotating, etc.) and realized the odometry is a bit off; I suspect the carpeted floor is the cause. It is still within the requirement (1 ft of error after a 10 m run), but I might need to do some fine-tuning to improve the accuracy. I was also able to collect some video data for the other teammates to use.

Mapping

The edge-following part of the mapping is working fine (with some fine-tuning tweaks due to the new environment). Initially, I planned to use a multi-goal A* algorithm for mapping. However, it turns out to be unnecessarily complicated for a limited indoor environment, so I am pivoting to a lawn-mowing pattern, which works well in a confined space with a few obstacles. The mapping will be done early next week.

Deliverables next week

I will complete the 2-D mapping with some limited implementation of path-finding.

Schedule

On schedule.