Jerry’s Status Update for 03/07 (Week 4)

Progress

This week, I explored machine learning methods for gesture recognition. We originally used heuristics to recognize gestures, but found that they often fail when the user is not directly facing the camera or is standing near the left or right edge of the camera view. For example, an arm extended straight forward at the right edge of the frame has a much larger difference between shoulder X and wrist X than the same arm extended straight forward at the center of the frame.

To collect more training and testing data, I improved our data collection pipeline: features are computed by taking the x and y distances from each keypoint to the chest and normalizing those distances, giving a 22-dimensional feature vector. The system has a warm-up period of 200 frames before it starts recording data, and then records the keypoints of every 5th frame as training data.
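
Roughly, the feature extraction looks like the sketch below (the choice of 11 upper-body keypoints, the keypoint indices, and the shoulder-width normalization are assumptions consistent with the 22 features, not necessarily our exact implementation):

```python
import numpy as np

# Hypothetical sketch: 11 (x, y) keypoints are reduced to x/y offsets from the
# chest keypoint, normalized by shoulder width so features are roughly
# scale-invariant regardless of the user's distance from the camera.
CHEST = 1                      # assumed index of the chest/neck keypoint
L_SHOULDER, R_SHOULDER = 2, 5  # assumed shoulder indices

def keypoints_to_features(keypoints):
    """keypoints: (11, 2) array of (x, y) pixel coordinates from OpenPose."""
    keypoints = np.asarray(keypoints, dtype=float)
    offsets = keypoints - keypoints[CHEST]            # x and y distances to chest
    scale = np.linalg.norm(keypoints[L_SHOULDER] - keypoints[R_SHOULDER])
    offsets /= max(scale, 1e-6)                       # normalize for body size
    return offsets.flatten()                          # 11 keypoints * 2 = 22 features
```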

Because the amount of data we had was small (~1000 examples), I chose SVM (support vector machine) models, since they tend to converge well with small amounts of training data. I built infrastructure for multiclass SVMs that takes in a config file of hyperparameters and feature names, trains multiple models, and measures their average performance. Online research suggested that multiclass SVMs were not always reliable, so I also implemented infrastructure for training and running one-versus-rest (OVR) SVM models.
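
The training infrastructure is shaped roughly like this sketch (scikit-learn assumed; the config fields, split size, and function name are illustrative rather than our exact code):

```python
import json
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_from_config(config_path, X, y, runs=5):
    """Train a multiclass or one-vs-rest SVM from a JSON config and report
    test accuracy averaged over several random train/test splits."""
    with open(config_path) as f:
        cfg = json.load(f)   # e.g. {"kernel": "poly", "C": 10, "ovr": false}

    accs = []
    for seed in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        svm = SVC(kernel=cfg["kernel"], C=cfg["C"])
        model = OneVsRestClassifier(svm) if cfg.get("ovr") else svm
        model.fit(X_tr, y_tr)
        accs.append(model.score(X_te, y_te))
    return float(np.mean(accs))
```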

Because the model only recognizes teleop commands, it is only used once heuristics detect that an arm is raised horizontally. I first tried a multiclass SVM to classify the keypoint vector as left, straight, or right.

I tried the following hyperparameter combinations, with each row representing one run.

The multiclass SVM worked well for the teleop commands, with an average test accuracy of 0.9573 over 5 runs. The best parameters were a polynomial kernel with regularization constant C = 10 (the regularization strength is inversely proportional to C, with a squared L2 penalty). When testing on live gestures, it performed reasonably well and handled many of the edge cases of not directly facing the camera and standing at the edge of the room.

I also tested the OVR SVM method on teleop commands, building one SVM each for detecting left, straight, and right. Test accuracy and mean average precision were averaged over 5 runs. I saw slightly higher accuracy with a polynomial kernel and C = 100, reaching a test accuracy of 0.961. It was interesting to see a dropoff in performance on the straight data; there may have been some errors in collecting that data, so I may collect more.

SVMs worked well for teleop data, so I wanted to try them for pointing data as well. The idea is to divide the room into 1 ft by 1 ft bins and use a model to predict which bin the user is pointing to based on the user's keypoints. I didn't have enough time to collect data for the entire room, so I only collected data for one row of the room. Here are the results of the experiments I tried:

The results for pointing were also good, at 0.95 test accuracy, but the spatial data is only along one dimension; it would be interesting to see the performance across two dimensions. It is also interesting that the gap between OVR and multiclass becomes more noticeable with more classes, since it is harder for a single SVM to optimize many decision boundaries.

I also worked on documentation for how OpenPose, the Xavier board, and the Roomba communicate.

Deliverables next week

I want to try point detection with more than one row of the room. I'm not sure whether SVM methods will scale to cover the entire room or whether a different model, such as a neural network, is needed.

Higher quality data is also needed, so I will continue to gather data. I want to explore other feature engineering techniques, such as normalizing the points while still letting the model know where the user is in the frame. I also want to experiment with using keypoints from multiple camera perspectives to classify gestures.

Schedule

On Schedule. More time needs to be allocated to gesture recognition, but that is fine because we need to wait for the mapping to be finalized before proceeding.

Sean’s Status Update for 03/07 (Week 4)

Progress

This week, I worked on implementing the mapping algorithm. Initially, I thought it would be an easy task, given that 2D mapping simply means scanning the room autonomously. However, it turns out to require much more fine-tuning than that. There are two main phases to my mapping algorithm: edge-following and filling in the rest.

Edge-Following

First, the robot performs "edge-following" to get the dimensions of the room. The robot moves forward until it detects a wall. Then it rotates in place to the right (or left) until the wall is no longer visible. Once this is done, the robot moves forward in an arc, steering to the left (or right). The reason for moving in an arc is to guide the robot back toward the wall; if it moved straight instead, it could drift away from the wall. Edge-following is "done" when the robot returns to the position where it detected the first wall. This pass is performed twice, once steering to the left and once to the right, since running it in both directions covers area that might be missed in a single run. Meanwhile, the robot's XY coordinates are recorded to generate the map.
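
In rough pseudocode, one edge-following pass looks like the sketch below (every robot method here is a placeholder for our Roomba wrapper, not an actual API, and the stopping test is simplified):

```python
def follow_edge(robot, steer="left"):
    """One edge-following pass: hug the wall until returning to the start.
    All robot methods are illustrative placeholders."""
    # 1. Drive straight until the first wall is detected.
    while not robot.bump_detected():
        robot.drive_straight()
    start = robot.position()          # remember where edge-following began

    while True:
        if robot.bump_detected():
            # 2. Rotate in place away from the wall until it is no longer visible.
            robot.rotate_in_place(direction="right" if steer == "left" else "left")
        else:
            # 3. Arc back toward the wall so the robot never drifts away from it.
            robot.drive_arc(steer=steer)
        robot.record_position()       # XY log used later to build the map

        if robot.near(start):         # back at the first wall: this pass is done
            break
```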

Filling in the Rest

Once the boundary of the room is set, the robot must traverse the rest of the room to find any potential obstacles, such as furniture. This is a bit trickier. First, it is hard to define an end condition that guarantees enough samples to generate a map. In addition, it is difficult to define behavior that works universally regardless of the shape of the room. It turns out that iRobot programmed the Roomba to move "randomly" (drive straight ahead, rotate a random amount when it bumps into something, and repeat) to maximize the area covered while cleaning. This works well when the main function of the robot is to clean the room: there is essentially no limit on the time it takes, and it doesn't matter much if some area is not inspected. However, for generating a 2D map of the room, this causes problems. Moving for a long time, especially with many rotations, introduces more and more error into the odometry, and to use the map for path planning it is important that the map be as accurate and detailed as possible. This algorithm also doesn't guarantee completeness within finite time; the robot might not cover enough area within a reasonable period.

Thus, I decided to implement a more formalized behavior. The robot first moves back and forth parallel to the first wall it detects, which lets it cover the room more efficiently. Then it does the same thing perpendicular to the first wall, which helps it avoid getting trapped in one area of the room. More testing will be necessary to check the validity of this algorithm.
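
A rough sketch of this fill behavior, again with placeholder helpers rather than the real interface (the end condition in particular is still an open question):

```python
def fill_interior(robot, wall_heading):
    """Sweep the interior in back-and-forth passes: first parallel to the
    first wall, then perpendicular to it. Helper methods are illustrative."""
    for heading in (wall_heading, wall_heading + 90):    # degrees
        robot.face(heading % 360)
        while not robot.coverage_complete():             # end condition still TBD
            # Drive until a wall or obstacle, logging odometry for the map.
            while not robot.bump_detected():
                robot.drive_straight()
                robot.record_position()
            # Shift over one robot-width and sweep back the other way.
            robot.shift_lane()
            robot.face((robot.heading() + 180) % 360)
```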

Deliverables next week

When I get back from the break, I plan to complete the 2D mapping algorithm.

Schedule

On Schedule.

Team Status Update for 02/29 (Week 3)

Progress

We finished the first version of our MVP! We integrated all of our systems for teleop control and docking at the charging point. It was great to see all the components come together. The gesture recognition is still shaky and the home docking system still needs to use our mapping system, but we will work on those next.

Software

We experimented with different cameras and are working on pipelines to collect data for training ML models that recognize gestures from keypoints. In addition, the sockets we used for our webserver were unstable, so we did work to greatly reduce crashes.

Hardware

It was good to see the hardware components coming together. We were able to control the Roomba via a headless RPi, which was our MVP. Additionally, we began building the mapping algorithm for the robot. It will require a decent amount of testing and fixing, but we hope to finish it by spring break. 2D mapping is essential for the robot's additional tasks, so we have to make sure the algorithm works correctly before starting to use the generated map.

Deliverables next week

We still have a few fixes left to finish our MVP, and we want to start using the 2D map generated by the Roomba.

Rama’s Status Update for 02/29 (Week 3)

Progress

I worked on the WebSocket communication between the gestures and the robot. There were some issues with the gesture system crashing when the connection dropped, so I wrote a connection wrapper in Python for both the gestures and the robot that attempts to maintain a long-term connection. I'm also changing the message format to JSON to lay the groundwork for passing more structured data around (e.g. mapping information).
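
Conceptually the wrapper behaves like the sketch below (this uses the websocket-client package as a stand-in and illustrative message fields; it is not the exact code):

```python
import json
import time
import websocket   # websocket-client package, assumed here as a stand-in

class PersistentSocket:
    """Maintain a long-lived WebSocket connection, reconnecting on failure,
    and send JSON-encoded messages. Field names are illustrative."""
    def __init__(self, url, retry_delay=1.0):
        self.url = url
        self.retry_delay = retry_delay
        self.ws = None

    def _connect(self):
        while self.ws is None:
            try:
                self.ws = websocket.create_connection(self.url)
            except Exception:
                time.sleep(self.retry_delay)   # back off and try again

    def send(self, msg_type, payload):
        self._connect()
        message = json.dumps({"type": msg_type, "data": payload})
        try:
            self.ws.send(message)
        except Exception:
            self.ws = None                     # drop the dead socket and retry once
            self._connect()
            self.ws.send(message)
```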

Deliverables next week

1. Testing the WebSocket again to make sure there are no crashes

2. Planning WebSocket API spec

3. Start working on the dashboard

Schedule

On Schedule.

Jerry’s Status Update for 02/29 (Week 3)

Progress

I started this week by finishing up the design presentation. I hooked up the camera system, image processing system, webserver, and robot to finish the first version of our MVP. We are now able to control the robot using teleop and also gesture for the robot to go home.

However, there are some issues with recognizing gestures using heuristics when the user is not facing the camera directly. So, I built a data collection pipeline to gather training data for an SVM classifier that determines whether a teleop gesture means the person is pointing left, straight, or right. The features are the normalized distances from each keypoint to the person's chest, and the labels are the gestures. We have collected around 350 datapoints so far and will test models next week.

There are also some issues with the Roomba's built-in docking command, so we will have to override it: we will use our mapping to navigate close to home before activating the Roomba command that slowly connects to the charging port.

In addition, I installed our second camera. Processing two camera streams halves our FPS, since we have to run OpenPose on twice as many frames. So, we are rethinking the multi-camera setup and seeing what we can do with two cameras. Ideally, we can still have a side camera so there is no angle from which a gesture is perpendicular to every camera in the room. I will keep experimenting with different angles.

I have also been working on the design report.

Deliverables next week

1. First pass of the gesture classification models

2. Finalize camera positioning in the room

Schedule

On Schedule.

Sean’s Status Update for 02/29 (Week 3)

Progress

This week, I worked on building a mapping algorithm and improving the Roomba control protocols. For the mapping algorithm, I initially intended to use the Roomba's internal "clean" function to scan the room. However, it turns out that I cannot directly control the robot or get sensor readings once it is in clean mode, so it was necessary to build our own mapping algorithm. The first method I plan to test is to have the robot 1) follow the wall and 2) move back and forth in the area in between; this seems to be the method iRobot uses as well. This would be done autonomously. During this process, it became clear that some kind of threading would be necessary: since the data packets are streamed constantly, a thread dedicated to processing the odometry is needed. I was able to control the Roomba using threading from my local computer, but haven't tested it on the RPi yet.
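
The threading setup is roughly like the following sketch (pyserial assumed; the port name, baud rate, and packet handling are placeholders, and the real packet parsing is omitted):

```python
import threading
import serial   # pyserial

class OdometryReader(threading.Thread):
    """Background thread that consumes the Roomba's constant sensor stream
    so the main thread can focus on sending drive commands."""
    def __init__(self, port="/dev/ttyUSB0"):          # placeholder port
        super().__init__(daemon=True)
        self.ser = serial.Serial(port, baudrate=115200, timeout=1)
        self.lock = threading.Lock()
        self.latest = None

    def run(self):
        while True:
            packet = self.ser.read(64)                # raw stream bytes
            with self.lock:
                self.latest = packet                  # replace with decoded odometry

    def get_latest(self):
        with self.lock:
            return self.latest
```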

Deliverables next week

Next week I will focus on 1) completing the mapping algorithm and 2) making sure the control protocol is robust.

Schedule

On Schedule.

Team Status Update for 02/22 (Week 2)

We all put in a ton of work this week (~60 hrs) towards our MVP. Most of the individual components are completed, and we will integrate our system next week.

OpenPose and Xavier board

We were able to successfully run OpenPose on the Xavier board at 30 FPS after a few optimizations. Additionally, we set up the Xavier board on the CMU network, so we can SSH into the board to work on it anytime.

Gesture Recognition

We were able to start classifying our 3 easiest gestures (left hand up, right hand up, and both hands up) with heuristics, using images and video examples. We set up the camera we purchased and can now classify gestures live at 30 FPS. We also developed heuristics for teleop commands, but are going to try model-based approaches next week.
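
The heuristics are along these lines; the sketch below assumes keypoints arrive as named (x, y) image coordinates with y growing downward, and the rule (wrist above nose) is illustrative rather than our exact threshold:

```python
def classify_hands_up(kp):
    """kp: dict mapping keypoint name -> (x, y) image coordinates.
    A wrist higher in the image than the nose counts as a raised hand."""
    left_up = kp["left_wrist"][1] < kp["nose"][1]
    right_up = kp["right_wrist"][1] < kp["nose"][1]
    if left_up and right_up:
        return "both_hands_up"
    if left_up:
        return "left_hand_up"
    if right_up:
        return "right_hand_up"
    return "none"
```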

Webserver

After our tests showed that OpenPose used less than 20% of the CPU on the board, we decided to host our webserver on the Xavier board as well, which reduces the communication latency compared to going through AWS. We also tested Python Flask and Node.js Express servers and decided on Node.js because it is asynchronous and handles concurrent requests better. We measured the latency to the server at around 70 ms, so we are still on track to stay under our 1.9 s response time requirement.

Raspberry Pi

We successfully set up the Raspberry Pi for the robot module. Raspbian was installed, which should be helpful for programming the robot in the early stage before we move on to the headless protocol. Rama also set up a web server on the RPi so we can control it without a physical connection.

Roomba

We set up the Roomba and are able to communicate with it via pyserial. Sean experimented with the Roomba Open Interface to send motor commands and read sensor values.

Design Review Presentation

We all worked on the design review presentation for next week. Preparing the presentation made us discuss our solution approaches to different problems. Now that we have tested many of our individual components, we have a much better perspective on how to tackle them.

Drive-to-Point

It is still early to think about this functionality since it is beyond our MVP, but we spent some time this week discussing potential solution approaches. We will try out both methods below and determine which is the right way to proceed.

    • Method 1: Use multiple images from different perspectives to draw a line from the user’s arm to the ground.
      • Use trigonometry with the angles from core to arm and arm to shoulder, plus the feet position, to determine where the line meets the ground.
    • Method 2: Use a neural network classifier to predict the position in the room from the keypoints (see the sketch after this list).
      • Collect training data of keypoints and the correct bin.
      • Treat every 1 ft x 1 ft square in the room as a bin in a grid.
      • A regression model could instead predict the x and y coordinates in the grid.
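
For Method 2, the class labels would come from a discretization of the floor, roughly like this sketch (the room dimensions, bin size, and function name are placeholders, not a finalized design):

```python
# Hypothetical discretization of the floor into 1 ft x 1 ft bins for Method 2.
ROOM_WIDTH_FT = 12   # placeholder room dimensions
ROOM_DEPTH_FT = 10
BIN_SIZE_FT = 1

def point_to_bin(x_ft, y_ft):
    """Map a floor position (in feet from one corner) to a single bin label."""
    cols = ROOM_WIDTH_FT // BIN_SIZE_FT
    rows = ROOM_DEPTH_FT // BIN_SIZE_FT
    col = min(int(x_ft // BIN_SIZE_FT), cols - 1)
    row = min(int(y_ft // BIN_SIZE_FT), rows - 1)
    return row * cols + col   # integer class id for the classifier
```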

Deliverables next week

We are on track to complete our MVP, and we hope to finish teleop control next week!

Rama’s Status Update for 02/22 (Week 2)

Progress

We got OpenPose running on the Xavier board with the USB camera, and got a static IP for the Xavier board so we can develop on it remotely as needed. I started work on the webserver that the Xavier board will be running to bridge the gap between the gesture recognition and Roomba control. We also nailed down a lot of the implementation specifics for gestures.

Jerry’s Status Update for 02/22 (Week 2)

Progress

Rama and I got OpenPose running on the Xavier board this week.  This allowed me to start playing around with the results OpenPose provides.

Gestures:

After first running OpenPose on images and videos we took manually, I was able to classify our 3 easiest gestures with heuristics: left hand up, right hand up, and both hands up. After we received our USB webcam, installed it, and got it working with the Xavier board (which required a reinstall of OpenCV), I started working on classifying teleop drive gestures (right hand pointing forward, left, and right). I implemented a base version using heuristics, but the results can be iffy if the user is not facing the camera directly. I hope to collect some data next week and build an SVM classifier for these more complex gestures.

OpenPose Performance on Xavier board:

Rama and I worked on optimizing the performance of OpenPose on the Xavier board. With the default configuration, we were getting around 5 FPS. We were able to activate 4 more CPU cores on the board, which brought us to 10 FPS. We also found an experimental tracking feature that limits the number of people to 1 but brought us to 30 FPS (0.033 s per image). We had anticipated OpenPose would take the longest amount of time, but the optimizations made it much faster than we expected. This should keep us well below our 1.9 s response time requirement.

OpenPose running with the classified gesture: “Both hands raised”

Deliverables next week

1. Help the team finish teleop control of the robot with gestures. Integrate the gestures with the webserver.

2. Collect data for a model based method of gesture classification.

3. Experiment with models for gestures classification.

Schedule

On schedule to finish the MVP!

Sean’s Status Update for 02/22 (Week 2)

Progress

I had a chance to actually work and play around with the Roomba this week. We purchased a Roomba 671, which can be controlled over serial communication with the help of iRobot's Open Interface.

Understanding Roomba Open Interface:

After inspecting the Open Interface documentation, I was able to figure out the basic serial communication protocol the Roomba expects. There are 3 modes the Roomba can be in: passive, safe, and full. Each mode places different limits on how much control the user has. For instance, the robot is in passive mode by default; in this mode, we cannot interrupt the movement of the robot (i.e. we have no control over the motor system). Once we switch to safe or full mode, we are able to directly drive the robot.

Motor control:

To drive the motors, we send a drive command followed by two 16-bit values: one for the velocity and one for the radius of rotation. It is interesting that iRobot decided to design their drive method this way instead of the more conventional left/right wheel velocity control.
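
As a reference for what this looks like over pyserial, here is a minimal sketch based on my reading of the Open Interface spec (Start/Safe/Drive opcodes 128/131/137; the serial port is a placeholder and values should be double-checked against the spec):

```python
import struct
import serial   # pyserial

def drive(ser, velocity_mm_s, radius_mm):
    """Open Interface Drive command (opcode 137): a signed 16-bit velocity
    followed by a signed 16-bit turning radius, both big-endian."""
    ser.write(struct.pack(">Bhh", 137, velocity_mm_s, radius_mm))

ser = serial.Serial("/dev/ttyUSB0", baudrate=115200)  # placeholder port
ser.write(bytes([128, 131]))   # Start, then Safe mode, so drive commands are accepted
drive(ser, 200, 500)           # 200 mm/s forward along a 500 mm radius arc
```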

Issue:

Currently, the sensor reading is unstable. The first issue I encountered was that the sensor readings become unreliable when the mode changes. For instance, the robot returns a reasonable encoder value in safe mode, but once it switches to another mode (e.g. the "Dock" command automatically switches the mode to passive), the encoder value jumps to an unexpected value, making the data unusable. I suspected that some internal routine resets the encoder/sensor readings when the mode changes, so I worked around it by requesting a constant stream of data packets from the Roomba. Luckily, this got rid of the first issue. However, there seems to be some data corruption during streaming: sometimes the Roomba returns an unreadable packet with an incorrect header and checksum byte. I first tried ignoring the corrupted data, but the corrupted packets make up a considerable portion of the returned data. I will look into this problem further in the following week.
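
One way to filter out the corrupted packets is to validate the stream checksum. Based on my reading of the Open Interface streaming format (header byte 19, a length byte counting the data bytes, the data, then a checksum that makes the byte sum zero mod 256), a validity check could look like this sketch:

```python
def stream_packet_is_valid(packet):
    """Check one Roomba sensor-stream packet: header 19, length byte,
    data bytes, then a checksum chosen so the sum of every byte from the
    header through the checksum is 0 mod 256 (per my reading of the spec)."""
    if len(packet) < 3 or packet[0] != 19:
        return False
    if packet[1] != len(packet) - 3:     # length byte counts only the data bytes
        return False
    return sum(packet) & 0xFF == 0
```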

Deliverables next week

We were able to accomplish some of the robot's basic functionality. Now we must integrate the whole system and test it. My goals for next week are to:

1. Finish tele-op control
2. RPi headless control of Roomba
3. Integrate Gesture recognition with Roomba control

Schedule

On Schedule.