Team Status Report for 11/13

This week, the team finished a successful Interim Demo! We were able to show off a large chunk of our system’s functionality, namely mouse movement and left mouse clicking/dragging, and received great feedback and suggestions on how to proceed. With this deadline out of the way and the final presentation and final report ahead, the biggest risk in our path is testing and verification. Now that we have a working system, we can start gathering testing metrics and preparing user stories for future testing.

Our system has no design changes, although we are all now making small changes to improve its smoothness and functionality. We did, however, discover a bug in our schedule thanks to the professors! Here is an updated and fixed version of our schedule.

Schedule

Brian Lane’s Status Update for 11/13

We had interim demos this week. My group demoed the functional portions of our project to the course TAs on Monday and the professors on Wednesday, and received positive feedback as well as suggestions for UI additions and potential training and dataset augmentations.

Personally, I spent this week improving the gesture recognition model, adding robustness to its predictions by introducing random rotations to the training data. This is done by sampling a random angle theta and constructing the 2×2 rotation matrix

[ cos(theta)  -sin(theta) ]
[ sin(theta)   cos(theta) ]

This matrix is then multiplied by the 2×21 matrix containing the landmark coordinates. These transforms resulted in much better accuracy recognizing the click gesture when users’ hands were not held perfectly vertical.
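
A rough sketch of this augmentation, assuming the landmarks are stored as a 2×21 NumPy array already centered on the palm (the ±15 degree range is an illustrative choice, not the exact value used in training):

import numpy as np

def augment_with_rotation(landmarks, max_angle_deg=15.0):
    """Rotate a 2x21 array of (x, y) hand landmarks by a random angle.

    The landmarks are assumed to already be centered on the palm
    (origin at 0, 0), so rotating about the origin turns the whole hand
    in place. The +/-15 degree range is an illustrative choice.
    """
    theta = np.radians(np.random.uniform(-max_angle_deg, max_angle_deg))
    rotation = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta),  np.cos(theta)],
    ])
    # (2x2) @ (2x21) -> rotated 2x21 landmark matrix
    return rotation @ landmarks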

I will spend next week further refining the model. Our current implementation is very accurate at classifying open hands and closed fists, but still struggles somewhat with hand representations of numbers and the other hand signs.

Andrew’s Status Report 11/16

This week, I worked more on polishing up the code for the pose estimation. We began integration this week and met several times to discuss the demo as well as piece together the system. I’m working on a smoothing algorithm for the on-screen position estimation, since right now, when we update the mouse location with the pose data, we get slightly noisy mouse movement when trying to be precise. While we’re still able to click on small, pixel-level targets with relative ease, a big part of our project, as we’ve stressed, is user experience, and thus getting smoother cursor movement should be somewhat of a priority.

I’m currently thinking about averaging the pixel motion over a short window. Similar to how a moving average smooths a signal by acting as a low-pass filter, I’m going to test whether the same applies here. Since our camera refresh rate is pretty high, it’s safe to assume we have some noise in the hand detection even when the hand is stationary, so we’ll see if this averaging smooths that mild jitter out. To stress again, this module isn’t a top priority right now since we’re able to perform within our initial specs, but it would be nice to have. Right now, integration of the system takes top priority. I’m on schedule with everything else I have outlined in the Gantt chart.
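
A minimal sketch of what I have in mind, assuming a new cursor position estimate arrives every frame (the window size of 5 is just an illustrative starting point to test):

from collections import deque

class CursorSmoother:
    """Average the last few cursor positions to reduce jitter.

    A moving average acts as a low-pass filter: high-frequency noise in
    the per-frame hand position is attenuated while slower, intentional
    motion passes through. The window size of 5 is an assumed value that
    trades smoothness against added latency.
    """

    def __init__(self, window_size=5):
        self.positions = deque(maxlen=window_size)

    def update(self, x, y):
        # Record the newest estimate and return the windowed average.
        self.positions.append((x, y))
        n = len(self.positions)
        avg_x = sum(p[0] for p in self.positions) / n
        avg_y = sum(p[1] for p in self.positions) / n
        return avg_x, avg_y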

Team Status Report for 11/6

This week, the whole team continued to work on the deliverables that we plan to show during the Interim Demo. There was more collaboration and discussion between team members this week as we started to integrate our components. The integration of pose estimation and mouse movement is already functional but can still be fine-tuned. Training of the gesture recognition model using pose estimation data has also begun and is progressing smoothly. At this point, the biggest risk is whether the implementation we have committed to can meet the requirements and quantitative metrics that we set for ourselves. Hopefully, through the Interim Demo we can receive feedback on whether any aspect of our project needs to be rescoped or whether there are other considerations we have to make. Currently, the system design and schedule are unchanged, and we are working towards a successful Interim Demo.

Alan’s Status Report for 11/6

This week, I continued to develop the mouse movement module and worked on a calibration module to prepare for the Interim Demo. The mouse movement module was updated to allow different tracking sensitivities for users at different distances from our input camera. Additionally, if the computer vision fails to detect the hand at any time, the mouse now stays in place and continues movement from the same location instead of jumping around once the hand is detected in a different location. This update video shows the new mouse movement at a larger distance. As seen in the video, even from across my room, which is around 8-10 feet from the camera, the mouse movement is still able to precisely navigate over the minimize, resize, and exit buttons in VSCode. The cursor also stays in place whenever the hand is not detected and continues moving relative to the new location of hand detection.

Even with the poor webcam quality, which causes some missed hand detections, the motion of the cursor is still fairly smooth and does not jump around in an unwanted manner.

For the demo, I will also have a calibration module ready that will automatically adjust the sensitivity for users based on their maximum range of motion within the camera’s field of view. Currently, I am on schedule and should be ready to show everything that we have planned to show for the Interim Demo.
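
A rough sketch of how such a calibration pass might turn the recorded range of motion into a sensitivity value; the function name and the normalized-coordinate assumption here are illustrative, not the module’s actual interface:

import pyautogui

def calibrate_sensitivity(hand_xs, hand_ys):
    """Derive per-axis sensitivity from a user's observed range of motion.

    hand_xs / hand_ys are normalized hand positions recorded while the
    user sweeps a hand across the camera's field of view during
    calibration; the sensitivity maps that comfortable range onto the
    full screen.
    """
    screen_w, screen_h = pyautogui.size()
    range_x = max(hand_xs) - min(hand_xs)
    range_y = max(hand_ys) - min(hand_ys)
    # Guard against a degenerate calibration where the hand barely moved.
    sens_x = screen_w / range_x if range_x > 0 else 1.0
    sens_y = screen_h / range_y if range_y > 0 else 1.0
    return sens_x, sens_y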

Brian Lane’s Status Report 11/6

I spent this week experimenting with various model architectures and training them with our formatted data.

Each model takes 42 input features, an x and y coordinate for each of the 21 landmarks, and assigns one of 26 possible labels, each corresponding to one class or variant of gesture.

In a paper titled “Gesture Recognition Based on 3D Human Pose Estimation and Body Part Segmentation for RGB Data Input,” various architectures for a problem similar to ours were tested, with much of the success coming from architectures structured as stacks of ‘hourglasses.’ An ‘hourglass’ is a series of fully connected linear layers that decrease in width and then increase back out again. The heuristic behind this is that the reduction in nodes, and thus the compression of the information describing a gesture, reveals the important features of the gesture, which are then expanded back out, with the pattern repeating in each stacked hourglass.
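
A minimal PyTorch sketch of this structure, with illustrative layer widths rather than the exact configuration we trained (42 landmark features in, 26 gesture classes out):

import torch.nn as nn

class Hourglass(nn.Module):
    """One 'hourglass' block: fully connected layers that narrow, then widen.

    The widths (42 -> 32 -> 16 -> 32 -> 42) are illustrative, not the
    exact configuration we settled on.
    """

    def __init__(self, widths=(42, 32, 16, 32, 42)):
        super().__init__()
        layers = []
        for in_w, out_w in zip(widths[:-1], widths[1:]):
            layers += [nn.Linear(in_w, out_w), nn.ReLU()]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class GestureClassifier(nn.Module):
    """Stack of hourglasses over the 42 landmark features, 26 gesture classes."""

    def __init__(self, num_hourglasses=2, num_classes=26):
        super().__init__()
        self.hourglasses = nn.Sequential(
            *[Hourglass() for _ in range(num_hourglasses)]
        )
        self.head = nn.Linear(42, num_classes)

    def forward(self, x):
        return self.head(self.hourglasses(x))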

Using this stacked hourglass design, I experimented with several architecture variants, including ‘skip’ connections that add the inputs of compression layers (where the output features are fewer than the input features) to the corresponding decompression layers, so that information lost during compression remains available if needed.

I also experimented with the number of hourglasses that were stacked, the length of each hourglass, and the factor of compression between layers. Along with this, multiple loss and optimization functions were considered, with the most accurate combination being cross-entropy loss and Adagrad.
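
A bare-bones training loop showing that loss/optimizer combination, reusing the GestureClassifier sketch above (the learning rate and the stand-in random data are assumptions for illustration):

import torch
import torch.nn as nn

# Loss/optimizer pairing described above; the hyperparameters and the
# random stand-in data are placeholders, not our real training setup.
model = GestureClassifier()  # from the sketch above
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)

features = torch.randn(1024, 42)        # stand-in landmark features
labels = torch.randint(0, 26, (1024,))  # stand-in gesture labels

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()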

The most effective model found thus far uses only two hourglasses of the structure pictured above, achieving a validation accuracy of 86% after being trained over 1000 epochs. The training loss over 350 epochs is pictured below.

I will spend the rest of this weekend preparing for the interim demo next week, and next week will be spent presenting that demo. Alongside this, further model experimentation and training will be performed.

Brian Lane’s Status Report 10/30

I spent this week preparing to begin training our gesture recognition model next week. For these preparations, I needed to apply some data transformations to the HANDS dataset that we are using, which contains a couple hundred images of various hand gestures.

The dataset supplies images of 5 subjects, 4 male and 1 female, in various positions and lighting conditions, as well as annotations containing bounding boxes for each gesture being performed in each image. These annotations are stored in massive text files in CSV format, with a default value of [0 0 0 0] for a bounding box if the gesture does not appear in the image. For example:

image_name,left_fist,right_fist,left_one,right_one,…
./001_color.png,[0 0 0 0],[0 0 0 0],[143 76 50 50],[259 76 50 50],

The above lines would represent a subject holding out the number one on both hands within the specified areas of the image.

This format cannot be used to train our model directly, as we are now using the hand landmark coordinates as the features we will train and run inference on. I spent some time writing a script that takes in the annotations, finds the hand within the image, and applies the hand pose estimation implemented by Andrew.

For example, we would start with the image below.

Then the bounding box would be used to crop out the two gestures so they can be handled separately, and the pose estimation would be applied to find the coordinates of the hand landmarks.

The palm landmark is then treated as the origin (coordinates 0, 0), and all other landmarks have their locations expressed relative to it and saved to a new CSV file with their corresponding label.
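
A rough sketch of this conversion step, using MediaPipe Hands as a stand-in for Andrew’s pose estimation module and assuming the dataset’s [x y w h] bounding-box format:

import cv2
import mediapipe as mp

# MediaPipe Hands stands in here for Andrew's pose estimation module.
mp_hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def annotation_to_row(image_path, bbox, label, writer):
    """Crop one gesture with its bounding box, run pose estimation, and
    write palm-relative landmark coordinates plus the label as a CSV row.

    bbox is assumed to be the dataset's [x, y, w, h] box for one hand,
    and writer is an open csv.writer.
    """
    x, y, w, h = bbox
    image = cv2.imread(image_path)
    crop = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)

    result = mp_hands.process(crop)
    if not result.multi_hand_landmarks:
        return  # skip crops where no hand was detected

    landmarks = result.multi_hand_landmarks[0].landmark
    palm = landmarks[0]  # the palm/wrist landmark becomes the origin
    row = []
    for lm in landmarks:
        row += [lm.x - palm.x, lm.y - palm.y]
    writer.writerow(row + [label])

The real script loops over the annotation file and calls a step like this once for every non-default bounding box.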

Andrew’s Status Report 10/30

This week, I worked on finishing up the code for the hand detector class file. I’m almost done writing the class for the cursor update object, which continuously stores the cursor location and keeps it stationary in the event the camera does not detect a hand in some individual frames. We also started integrating my code with Alan’s, and we’re able to get some form of cursor movement from the video camera. Once I finish up the cursor class, we’ll have more streamlined movement of the cursor on screen.

As a personal update, I did not have too much time to work on the project this week due to catching up on a lot of other work and matters related to my injury. I am almost fully recovered, so I am looking to get back on track this week. I’ve run into one minor issue with using the webcam (which I met up with Tao over the week to try to fix): some sort of permission issue on my laptop is preventing me from accessing it. Even so, I am more or less on schedule, although lagging behind a bit. The webcam issue is not too big a deal, since the webcam works on other computers and I can always use my laptop’s built-in webcam to test.

Alan’s Status Report for 10/30

This week, I first created an OS Interface document to help plan for the OS Interface implementation while waiting on progress with other modules.

Here is the OS Interface document.

Andrew finished enough of the pose estimation code for me to start developing the mouse movement portion of the OS Interface with actual camera input data. I started feeding the pose estimation data into my mouse movement test code to drive the cursor.

One problem I found, which I am currently working on fixing, is that if the pose estimation stops detecting the hand and then detects it again at a different location, the cursor can jump drastically. Instead, the cursor should stay in the same location and resume relative movement once the hand is detected again, rather than moving from the previous location to the location of the new detection. I hope to get this fixed, and possibly also improve the smoothness of cursor movement, before this coming Monday. I am on pace with our new schedule and hope to have a functioning mouse module, with movement tracking, hand detection handling, and the calibration feature, ready for the Interim Demo.
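
A minimal sketch of the intended fix, using pyautogui for relative movement; the sensitivity constant and the normalized-coordinate input are illustrative assumptions:

import pyautogui

class RelativeMouseMover:
    """Move the cursor by hand-position deltas, tolerating detection gaps.

    When the hand is lost and later re-detected, the new detection becomes
    the reference point, so the cursor resumes from where it stopped
    instead of jumping. The sensitivity constant is an assumed scale from
    normalized hand motion to pixels.
    """

    def __init__(self, sensitivity=1500.0):
        self.sensitivity = sensitivity
        self.last_hand_pos = None

    def update(self, hand_pos):
        # hand_pos is an (x, y) pair in normalized camera coordinates,
        # or None when pose estimation found no hand this frame.
        if hand_pos is None:
            self.last_hand_pos = None  # freeze the cursor until re-detection
            return
        if self.last_hand_pos is not None:
            dx = (hand_pos[0] - self.last_hand_pos[0]) * self.sensitivity
            dy = (hand_pos[1] - self.last_hand_pos[1]) * self.sensitivity
            pyautogui.moveRel(dx, dy)
        self.last_hand_pos = hand_pos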

Team Status Report for 10/30

This week, the team came together to discuss our individual progress and to make plans going forward towards future deadlines, especially the Interim Demo. As mentioned last week, now that the team has had time to work on the actual implementation of the project, we decided to update our schedule to more accurately reflect the tasks and timeframes.

Here is the Updated Schedule.

Additionally, we also decided on a design change for our project. Originally, we planned on feeding full image data directly from our input camera into the gesture recognition model. However, since our approach for hand detection involves pose estimation, which places landmark coordinates on the detected hand in each image, we decided to use these landmark coordinates to train our machine learning gesture recognition model instead. All of the image data in the model dataset would first be put through our pose estimation module to obtain landmark coordinates for the hands in each image, and these coordinates would be passed into the model for training and testing. This should allow for a simpler model that can be trained more quickly and produce more accurate results, since a set of landmark coordinates is much simpler than pure image data. This updated design choice is reflected in our schedule with an earlier pose estimation integration task that we are all involved in.

As we near the end of our project, integration does not seem to be as daunting a risk, and instead we need to plan ahead for how we will carry out our testing and verification. The bigger risk now is figuring out how to measure the metrics we outlined in the project requirements. For now, we will focus on finishing our planned product for the Interim Demo and start testing on this interim product as we continue towards our final product.