Team Status Report for 4/10/22

This week, the most significant risk is processing the data for the dynamic models, since it will take some time to format all the data correctly and evaluate how well the model performs. To manage this, we are dedicating more time to the dynamic data and making it our primary focus for the week. As for contingency plans, if we continue having issues, then given that we only have three weeks left and that static signs were our MVP, we will most likely leave out the testing portion for dynamic signs and include them only as educational material on our platform.

A secondary risk is that the fist model (covering signs like a, m, n, s, t, etc.) is not performing as expected and is not correctly distinguishing between its signs. To manage this, we will investigate the training data for the fist model this week to figure out why it is underperforming; currently we think the issue is due to varying hand positions, but we will confirm after looking into it further. As for contingency plans, if we are unable to resolve the fist model issue, we will redistribute the fist signs among the other models so that each model only has to distinguish between dissimilar signs.

Our design has not changed over the past week nor has our schedule seen any significant changes.

Hinna’s Status Report for 4/2/22

This week, I personally worked on my portion of the ASL testing/training data, adjusted the instructional material (videos and text) so that it matches how our model reads hand data (i.e., making sure hands are angled so that all or most fingers are visible in frame), and examined the web app and ML models locally.

In regard to examining the web app, I have been brainstorming with Valeria about some of the suggestions given to us at last week’s meeting; we are trying to decide the best way to store user statistics, how to include a random quiz mode that lets users select which categories to be tested on, and how to format and display user profile information. As for the machine learning models, I have been working with them locally to see which is performing best (currently the 1-finger model is) and to find holes in how they have been trained (e.g., the sign for ‘d’ requires a certain tilt for it to be accurately detected). After identifying some of these signs, I have been working with Aishwarya to figure out solutions for these issues in the models.

We are technically on schedule but in danger of falling behind, especially because my other two group members tested positive for COVID this past week. To account for this, we are pushing some tasks back a week on our schedule (such as integration) and doing our best to collaborate virtually.

In the next week, we plan to successfully execute an interim demo with most if not all of the static sign models working, along with a webapp that can teach users the ASL terms we have chosen after they make an account on the platform.

Aishwarya’s Status Report for 4/2/22

I integrated the model execution with the web app, so that the user’s input is parsed and passed to the model to generate a prediction. I also parsed all of the new data (which we collected in order to replace incorrect signs in the original training dataset) by extracting image frames from a series of videos our group made and then extracting landmarks from each frame. I retrained all the models with this newly formatted data.
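
As a rough sketch of the landmark-extraction half of that parsing step (we use MediaPipe Hands for landmark detection; the exact function, paths, and output format here are illustrative, not our actual script):

import cv2
import mediapipe as mp

# One detector for static images; static signs are one-handed.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def frame_to_landmarks(image_path):
    """Return the 21 hand landmarks of a frame as a flat [x, y, z, ...] list, or None."""
    image = cv2.imread(image_path)
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # MediaPipe found no hand in this frame
    landmarks = results.multi_hand_landmarks[0].landmark
    return [value for lm in landmarks for value in (lm.x, lm.y, lm.z)]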

My progress is mildly hindered because I had COVID this past week, so I haven’t been able to tune the models as much as I would like. The models in general have slight trouble identifying unknown signs, and the fist sign category model in particular seems to have the most difficulty identifying letters such as A and S. I hope that after recovering this next week, I can tune the models further to deal with these issues. I will have to experiment with the number of training epochs and with the model structure itself (increasing/decreasing the number of layers and the nodes within each layer).
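
As a sketch of what that experimentation could look like, assuming a Keras-style dense classifier over the landmark features (the framework choice, layer sizes, and epoch counts below are illustrative assumptions rather than our final configuration):

from tensorflow import keras

NUM_LANDMARK_FEATURES = 21 * 3   # 21 hand landmarks, (x, y, z) each
NUM_CLASSES = 6                  # e.g., the letters in one model group

def build_model(num_layers=2, nodes_per_layer=64):
    # Small dense classifier; layer count and width are the tunable knobs.
    model = keras.Sequential([keras.Input(shape=(NUM_LANDMARK_FEATURES,))])
    for _ in range(num_layers):
        model.add(keras.layers.Dense(nodes_per_layer, activation="relu"))
    model.add(keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def sweep(X_train, y_train, X_val, y_val):
    # Try a few structures and epoch counts, keep the best validation accuracy.
    best = None
    for num_layers in (1, 2, 3):
        for nodes in (32, 64, 128):
            for epochs in (20, 50):
                model = build_model(num_layers, nodes)
                model.fit(X_train, y_train, epochs=epochs, verbose=0)
                _, acc = model.evaluate(X_val, y_val, verbose=0)
                if best is None or acc > best[0]:
                    best = (acc, num_layers, nodes, epochs)
    return best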

Next week, I hope to fix some of the prediction issues currently observed with the models. I also want to make the web app integrate more smoothly with the model execution service: currently it requires downloading the user’s video input locally, and it would be better to cut out this middle step to improve latency.

Team Status Report for 4/2/22

The most significant risk that could currently jeopardize the success of the project is model accuracy. At the moment, our model is very sensitive to slight hand tilts and little nuances in signs, so even when a user makes a technically correct sign, the model may be unable to identify it as correct. To manage this risk, we are planning to alter some of the layers of our model, the number of epochs used to train, and the number of nodes, to see if these adjustments result in more robust detection. Additionally, in the next week or so, we plan to consult with Professor Gormley about our model to see if he has any recommendations for improving the detection. As for contingency plans, if we are unable to make the model more flexible in its predictions, we will adjust our instructional materials to better reflect the training data, so that users sign in a way that the model recognizes as correct.

There have not been any changes to the design, but after meeting with our advisor and TA we are considering adding some features to the webapp, such as tracking user statistics. This change will mainly involve the user model we currently have, with an extra field in each profile for letters that the user frequently gets wrong. We are making this change to personalize the learning experience, so that our platform can reinforce signs/terms that a user consistently gets incorrect through additional tests. Note that such changes will not be a priority until after the interim demo, and more specifically, until after we have addressed all the feedback we get from the demo.
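
A minimal sketch of what that extra field could look like, assuming a Django-style user model (the framework, field names, and helper method are illustrative assumptions, not our actual schema):

from django.contrib.auth.models import User
from django.db import models

class Profile(models.Model):
    """Per-user learning statistics for the ASL platform."""
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    # Maps a sign label (e.g., "d") to how many times the user has missed it,
    # so quizzes can reinforce frequently missed signs.
    missed_sign_counts = models.JSONField(default=dict)

    def record_miss(self, sign_label):
        self.missed_sign_counts[sign_label] = (
            self.missed_sign_counts.get(sign_label, 0) + 1
        )
        self.save(update_fields=["missed_sign_counts"])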

Our schedule mostly remains the same; however, two of our three group members are currently sick with COVID, so we may have a much slower week in terms of progress. As a result, we may have to adjust our schedule and push some tasks to later weeks.

Hinna’s Status Report for 3/26/22

This week, I worked with Aishwarya to test the initial models we have for static signs (1-finger, 2-finger, 3-finger, fist, etc.) and discovered some discrepancies in the training data for the following signs: 3, e, f, m, n, q, t. As a result, I (along with my other two group members) created some additional training data for these signs in order to retrain the models to detect the correct version of them.

Additionally, I worked on the normal testing data that I was assigned for this week (letters a-m, numbers 0-4), in accordance with the group schedule. I also began brainstorming ways to choose the highest model prediction: the probabilities across all of the model’s classes sum to 100%, rather than each class being scored independently out of 100. This means we cannot specify a fixed range of prediction values for deciding the best one, as we previously thought; instead, we will group any unrecognized movements by the user into an “other” class to ensure that the near real-time prediction is accurate.
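
A minimal sketch of the selection logic this implies (the label set and probabilities below are illustrative):

import numpy as np

# "other" absorbs unrecognized movements rather than relying on per-class score ranges.
LABELS = ["a", "m", "n", "s", "t", "other"]

def choose_prediction(probabilities):
    """probabilities: softmax output over LABELS, summing to 1; return the best label."""
    return LABELS[int(np.argmax(probabilities))]

# Example: a frame that matches no trained sign should score highest on "other".
print(choose_prediction(np.array([0.05, 0.10, 0.05, 0.10, 0.10, 0.60])))  # -> "other"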

Furthermore, for the interim demo, we are brainstorming some final aspects of the webapp, such as the most intuitive way to display feedback to the user and easy-to-understand instructional materials. As part of the instructional materials, I created text blurbs for all 51 signs that specify how to do each sign (along with the video), as well as certain facts about the sign (e.g., “help” is a directional sign, where directing it outwards indicates giving help while directing it towards yourself indicates needing/receiving help).

At the moment, our project is on schedule, with the exceptions that we are beginning integration a week early and that we have to account for some extra time to create this week’s training data.

As for next week, I plan to continue making testing/training data, work with Valeria to integrate the instructional materials into the webapp, and prepare for the interim demo with the rest of my group.

Aishwarya’s Status Report for 3/26/22

I trained models for 4 of our model groups (1-finger, 2-finger, 3-finger, and fist-shaped). In testing these, we noticed some unexpected behavior, particularly with the 3-finger model, and realized that the training dataset had incorrect samples for letters such as M and N. I, along with my other group members, recorded videos to create new data to replace these samples. I wrote a script to extract frames from these videos and store them as jpg images, allowing us to generate a few thousand images for the labels whose samples needed to be replaced. Due to these issues with the datasets, I will need to reformat the training data and retrain some of the models with the newly created samples.
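
A minimal sketch of that frame-extraction step, assuming OpenCV (the actual script may differ in naming and sampling details):

import os
import cv2

def extract_frames(video_path, output_dir, label):
    """Save every frame of video_path as a .jpg under output_dir; return the frame count."""
    os.makedirs(output_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    count = 0
    while True:
        success, frame = capture.read()
        if not success:
            break
        cv2.imwrite(os.path.join(output_dir, f"{label}_{count:05d}.jpg"), frame)
        count += 1
    capture.release()
    return count

# e.g., extract_frames("recordings/letter_m.mp4", "data/m", "m")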

Our progress is on schedule. During this next week, I hope to integrate the web app video input with the model execution code in preparation for our interim demo. I will also complete re-parsing the data with our new samples for training and retrain the models.

The video linked is a mini-demonstration of one of my models performing real-time predictions.

Team Status Report for 3/26/22

The most significant risk that could currently jeopardize the success of our project is the integration of the machine learning model with the webapp: we want to make sure the user’s video input is accurately fed to the model and that the model’s prediction is accurately displayed in the webapp. Currently, this risk is being managed by starting integration a week earlier than planned, as we want to make sure it is resolved by the interim demo. As for a contingency plan, we would have to consider alternative methods of analyzing the user input with our model, where a simpler approach may trade performance for easier integration.

As for changes to our project, while the design has remained largely the same, we realized that some of the data for certain ASL letters and numbers in the training dataset looks different from traditional ASL, to the point where the model was not able to recognize us doing certain signs. As the goal of our project is to teach ASL to beginners, we want to make sure our model accurately detects the correct way to sign letters and numbers. Thus, we handpicked the signs that were most inaccurate in the training dataset and created our own training data by recording ourselves doing the various letters and extracting frames from those videos. The specific letters/numbers were: 3, e, f, m, n, q, t. While the cost of this change was the extra time to make the training data, it will help the accuracy of our model in the long run. Additionally, since we plan to do external user tests, the fact that we are partially creating the training data should not affect the results of our tests, as we will have different users signing for the model.

Our schedule remains mostly the same, except that we will be starting our ML/webapp integration a week earlier and that we have added tasks this week to create some training data.

Aishwarya’s Status Report for 3/19/22

I completed the code to parse the data for images and videos, passing them through MediaPipe and extracting and formatting the landmark coordinate data. The rough table below shows my initial findings for training and testing accuracy using a dataset for letters D, I, L, and X with 30 images per letter class. Varying parameters to see how they affected the testing accuracy, the best test accuracy I could achieve was 80.56%. Overall, this seems to be an issue with overfitting (especially since this initial dataset is small).

Another dataset was found with 3000 images per letter class (though many of these fail to have landmark data extracted by MediaPipe). Using this dataset, overfitting still seemed to be an issue, though the model seems to perform well when testing in real time (I made signs in front of my web camera and found that it identified them fairly accurately). During this real-time evaluation, I found that it worked for my left hand, which means I will need to mirror the images in order to train the models for both right-handed and left-handed signs.
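
A minimal sketch of that mirroring step, assuming OpenCV (paths and naming are illustrative):

import cv2

def add_mirrored_copy(image_path, mirrored_path):
    """Write a horizontally flipped copy of image_path to mirrored_path."""
    image = cv2.imread(image_path)
    mirrored = cv2.flip(image, 1)  # flipCode=1 flips around the vertical axis
    cv2.imwrite(mirrored_path, mirrored)

# e.g., add_mirrored_copy("data/d/d_00001.jpg", "data/d/d_00001_mirrored.jpg")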

My progress is on schedule. To combat issues with overfitting during the next week, I will continue trying to train with a larger dataset, varying parameters, and modifying the model structure. By the end of next week, I hope to have the models trained for each ASL grouping.

 

Hinna’s Status Report for 3/19/22

This week, I personally worked on making 30 iterations of each of our 15 dynamic, communicative signs. I also went through the WLASL database for dynamic signs and gathered all of the training video clips for the 15 signs. In doing this, I realized that a lot of the videos listed in the dataset no longer exist, meaning that we will have to both augment the existing videos to get more data and potentially use the testing data I have made as training data. In addition to working with this data, I have been researching how to work with the AWS EC2 instance, image classification after landmark identification through MediaPipe, and methods for augmenting data.
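
As one example of the augmentation methods being considered, here is a minimal sketch that produces a sped-up copy of a clip by keeping every n-th frame, assuming OpenCV (paths, codec, and the speed factor are illustrative):

import cv2

def speed_up_video(input_path, output_path, keep_every=2):
    """Write a faster copy of a sign video by keeping every `keep_every`-th frame."""
    capture = cv2.VideoCapture(input_path)
    fps = capture.get(cv2.CAP_PROP_FPS)
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    index = 0
    while True:
        success, frame = capture.read()
        if not success:
            break
        if index % keep_every == 0:
            writer.write(frame)
        index += 1
    capture.release()
    writer.release()

# e.g., speed_up_video("wlasl/help_001.mp4", "augmented/help_001_fast.mp4")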

My progress is currently on schedule; however, in deciding that we will also need to create training data for the dynamic signs, we have added some new tasks, which I will be primarily responsible for. In order to keep up, I will be putting my testing data creation on hold to prioritize the dynamic sign training data.

In the next week, I plan to have 50 training videos for each of the 15 dynamic signs, where the 50 will be a combination of data I have created, data from WLASL, and augmented videos. Additionally, I plan to help Aishwarya with model training and work on the instructional web application materials.

Team Status Report for 3/19/22

Currently, the most significant risks to our project are the machine learning models for the different groups of signs. Specifically, some of the datasets we found for training data are not being picked up well by MediaPipe or are not of good enough quality, so we are running into some issues with training the models. To mitigate these risks, we are looking for new datasets, particularly for the letter and number signs, and will potentially be making our own training data for the dynamic signs, as these are the ones with the fewest datasets available online. As for contingency plans, if we are unable to find a good enough dataset that works well with MediaPipe, we might forgo MediaPipe and create our own CNN for processing the image/video data.

There have not really been any changes to our system design over this past week. One potential change we have been discussing is how signs are grouped across the various neural networks, where we might now separate static and dynamic signs rather than dividing signs by hand shape. This is partially because our static signs are one-handed with image training data, whereas many of our dynamic signs are two-handed with video training data. This change would make classification for static signs easier, as we can limit the number of hands detected in frame. There are not really any costs incurred by this change, as we had not yet built models separated by hand shape.
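
A minimal sketch of how that limit could be applied with MediaPipe Hands (the parameter values are illustrative):

import mediapipe as mp

def make_hand_detector(dynamic):
    # Static signs are one-handed images; dynamic signs may use two hands in video.
    return mp.solutions.hands.Hands(
        static_image_mode=not dynamic,
        max_num_hands=2 if dynamic else 1,
        min_detection_confidence=0.5,
    )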

Our schedule has also not really changed, but we will be allocating some extra time to make the dynamic sign training data, since we initially did not anticipate needing to do this.