Team Status Report for 4/30/22

The most significant risk to the success of this project is model tuning: that we do not reach the accuracy we aimed for before the final demo. To mitigate this risk, we are continuing to train our models with additional training data. As for contingency plans, we are going to leave the models as they are since there is only a week until the demo. Also, when we spoke with Professor Gormley, a machine learning professor at CMU, he suggested that we not change our neural network structures given the time constraints.

There have been no changes to the existing design of the system and to our schedule.

Team Status Report for 4/23/22

The most significant risk that could currently jeopardize the success of our project is model accuracy. Over the past week, we have been looking at accuracy tradeoffs and have started conducting user tests, and we can see that when users sign the dynamic signs quickly, our model is not able to detect them accurately. To fix this, we are considering recording training data of the signs performed more quickly, so that the model can be trained on faster iterations of each sign. As a contingency plan, we will instruct users to sign slightly slower, or simply keep the models as they are since we are nearing the end of the semester.
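
One lighter-weight variant of this idea is to simulate faster signing by resampling the landmark sequences we already have instead of recording new clips. Below is a minimal sketch of that augmentation; the array shapes are assumptions, not our actual pipeline.

```python
import numpy as np

def speed_up(sequence: np.ndarray, factor: float = 1.5) -> np.ndarray:
    """Simulate a faster signing speed by dropping frames from a landmark
    sequence of shape (num_frames, num_features)."""
    idx = np.round(np.arange(0, len(sequence), factor)).astype(int)
    return sequence[idx[idx < len(sequence)]]

# Example: a 30-frame clip becomes a ~20-frame "fast" version.
clip = np.random.rand(30, 63)  # 21 landmarks * 3 coords = 63 features per frame
fast_clip = speed_up(clip, factor=1.5)
```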

There haven’t been any changes to our system design. As for our schedule, we are going to extend our user testing weeks all the way up to the demo since we were not able to get enough users to sign up over this past week. Additionally, we plan to collect survey results at the live demo to get more user feedback to add to the final report. Also, because we have the webapp and ML models fully integrated, we are shortening the integration task on our schedule by two weeks.

Team Status Report for 4/16/22

The most significant risks that would jeopardize the success of our project are the accuracy of the static model predictions, the accuracy of the dynamic models (which is a post-MVP concern), and minor bugs such as our start-stop recording not timing out correctly.

In regard to the static model accuracy, we are managing this risk by examining each model to determine which signs are having issues and experimenting with different epoch and prediction-threshold values to see if any combination improves accuracy. Given that there are only two weeks left in the semester, if the accuracy does not improve we will simply note that in the report but keep the models as they are in the platform, since they are performing well overall.
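
For concreteness, the threshold experiments look roughly like the sketch below: sweep a few candidate confidence thresholds over held-out predictions and compare accuracies. The data here is a random stand-in for our validation set.

```python
import numpy as np

def accuracy_at_threshold(probs, labels, threshold):
    """Count a prediction as correct only when the top softmax probability
    clears the threshold; low-confidence predictions count as wrong."""
    top = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return float(np.mean(confident & (top == labels)))

probs = np.random.dirichlet(np.ones(24), size=100)  # stand-in for model.predict(X_val)
labels = np.random.randint(0, 24, size=100)         # stand-in for y_val
for t in (0.5, 0.7, 0.9):
    print(f"threshold {t}: accuracy {accuracy_at_threshold(probs, labels, t):.2f}")
```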

As for the dynamic models, this is a post-MVP portion of our project, and currently many of the two-handed moving signs are not being detected accurately. To manage this risk, based on feedback from Tamal and Isabel, we are checking the number of hands present in frame for two-handed tests and immediately marking the attempt as incorrect if only one hand is present. Additionally, we are looking into the flickering of MediaPipe landmarks that occurs when the hands blur in motion in the middle of a sign; we are considering padding or removing those blurred frames. Again, as a contingency plan, if we cannot improve the dynamic model we will most likely keep it in our platform as is, given that there are only two weeks left, and address the inaccuracies in our report.
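
The hand-count guard itself is simple with the MediaPipe Hands Python API; a sketch follows, with the webapp-side handler left hypothetical.

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=2)

def hands_in_frame(bgr_frame) -> int:
    """Return how many hands MediaPipe detects in one video frame."""
    results = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    return len(results.multi_hand_landmarks or [])

# For a two-handed sign, fail fast before running the dynamic model:
# if hands_in_frame(frame) < 2:
#     mark_attempt_incorrect()  # hypothetical webapp-side handler
```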

In regard to the minor bugs, like the start-stop timeout, we are using logs and online forums to try to figure out why our recording is not actually stopping. If the issue persists, we will reach out on Slack for help with the error. It should be noted that this is not a major problem, as the main purpose of our platform (having users sign and get predictions on whether their signing is correct) is not affected by it. However, if we cannot fix the issue, we will simply instruct users on how to work around it (i.e., to repeat a test they have to refresh the page instead of pressing start again).

There have not been changes to our system design or schedule.

Team Status Report for 4/10/22

This week the most significant risk is processing the data for the dynamic models, because it will take some time to format all the data correctly and see how well the model does. To manage this, we are dedicating more time to working with the dynamic data and making it our primary focus for the week. As for contingency plans, if we continue having issues, given that we only have three weeks left and that static signs were our MVP, we will most likely leave out the testing portion for dynamic signs and only include them with educational materials in our platform.
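
The formatting step amounts to giving every clip's landmark sequence a uniform shape so the clips can be batched; a sketch is below, where the target sequence length and feature count are assumptions.

```python
import numpy as np

SEQ_LEN = 30        # assumed fixed number of frames per dynamic sign
NUM_FEATURES = 126  # 2 hands * 21 landmarks * 3 coords

def fix_length(sequence: np.ndarray) -> np.ndarray:
    """Pad with zeros or truncate a (frames, features) array to SEQ_LEN frames."""
    if len(sequence) >= SEQ_LEN:
        return sequence[:SEQ_LEN]
    padding = np.zeros((SEQ_LEN - len(sequence), sequence.shape[1]))
    return np.vstack([sequence, padding])

# Stack the formatted clips into one (num_clips, SEQ_LEN, NUM_FEATURES) tensor.
clips = [np.random.rand(np.random.randint(20, 45), NUM_FEATURES) for _ in range(8)]
X = np.stack([fix_length(c) for c in clips])
```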

A secondary risk is that the fist model (covering signs like a, m, n, s, and t) is not performing as expected: it is not correctly distinguishing between the signs. To manage this, we will investigate the fist model's training data this week to figure out why it is performing poorly. Currently, we think the issue is due to varying hand positions, but we will confirm after looking into it more. As for contingency plans, if we are unable to figure out the fist model issue, we will distribute the fist signs among the other models, so that each model only has to distinguish between dissimilar signs.

Our design has not changed over the past week nor has our schedule seen any significant changes.

Team Status Report for 4/2/22

The most significant risk that could currently jeopardize the success of the project is model accuracy. At the moment, our model is very sensitive to slight hand tilts and little nuances in signs, so even when a user makes a technically correct sign, the model is unable to identify it as correct. To manage this risk, we plan to alter some of the layers of our model, the number of epochs used to train, and the number of nodes to see if these adjustments result in more robust detection. Additionally, in the next week or so, we plan to consult Professor Gormley about our model to see if he has any recommendations for improving detection. As for contingency plans, if we are unable to make the model more flexible in its predictions, we will adjust our instructional materials to better reflect the training data, so that users sign in a way the model recognizes as correct.
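
The knobs we mean are the ones in a model definition like the following. This is a generic Keras LSTM sketch rather than our exact architecture; the layer sizes and epoch count are the values we are experimenting with.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes, units=64, seq_len=30, features=63):
    """A small LSTM classifier; `units` (nodes) and the layer list are the
    hyperparameters under adjustment."""
    model = keras.Sequential([
        layers.Input(shape=(seq_len, features)),
        layers.LSTM(units, return_sequences=True),
        layers.LSTM(units),
        layers.Dense(units, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_model(num_classes=24)
# model.fit(X_train, y_train, epochs=50)  # epochs is the other knob we vary
```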

There have not been any changes to the design, but after meeting with our advisor and TA we are thinking of adding some features to the webapp, such as tracking user statistics. This change will mainly involve the user model that we currently have, with an extra field in each profile for letters the user frequently gets wrong. We are making this change to make the learning experience more personalized, so that our platform can reinforce signs/terms that users consistently get incorrect through additional tests. Note that such changes will not be a priority until after the interim demo, and more specifically, after we have addressed all the feedback we get from the demo.
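
We have not settled on an exact schema, but the idea is roughly the sketch below, assuming a Django-style profile model; the field and method names are hypothetical.

```python
from django.contrib.auth.models import User
from django.db import models

class Profile(models.Model):
    """Per-user statistics: missed_counts maps a sign (e.g. "m") to how many
    times the user has gotten it wrong."""
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    missed_counts = models.JSONField(default=dict)

    def record_miss(self, sign):
        self.missed_counts[sign] = self.missed_counts.get(sign, 0) + 1
        self.save()

    def weakest_signs(self, n=5):
        """The signs to reinforce with additional tests."""
        return sorted(self.missed_counts, key=self.missed_counts.get,
                      reverse=True)[:n]
```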

Our schedule mostly remains the same; however, two of our three group members are currently sick with COVID, so we may have a much slower week in terms of progress. As a result, we may have to adjust our schedule and push some tasks to later weeks.

Team Status Report for 3/26/22

The most significant risk that could currently jeopardize the success of our project is the integration of the machine learning model with the webapp: we want to make sure the user's video input is accurately fed to the model and that the model's prediction is accurately displayed in the webapp. Currently, this risk is being managed by starting integration a week earlier than planned, as we want to make sure this is resolved by the interim demo. As for a contingency plan, we would have to consider alternative methods of analyzing the user input with our model, where a simpler approach may trade performance for easier integration.
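
The data path we are trying to pin down looks roughly like the hypothetical endpoint below: the browser posts extracted landmarks, the server runs the model, and the webapp displays the returned label. The framework, route, and JSON shape here are all assumptions, not our final integration.

```python
import numpy as np
from flask import Flask, jsonify, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("static_signs.h5")  # hypothetical saved model
LABELS = ["a", "b", "c"]                            # stand-in class names

@app.route("/predict", methods=["POST"])
def predict():
    """Receive one frame of hand landmarks from the webapp, run the model,
    and send the predicted sign back for display."""
    landmarks = np.array(request.json["landmarks"], dtype=np.float32)
    probs = model.predict(landmarks[np.newaxis, :])[0]
    return jsonify({"sign": LABELS[int(probs.argmax())],
                    "confidence": float(probs.max())})
```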

As for changes to our project, while the design has remained largely the same, we realized that some of the training data for certain ASL letters and numbers looks different from traditional ASL, to the point where the model was not able to recognize us doing certain signs. As the goal of our project is to teach ASL to beginners, we want to make sure our model detects the correct way to sign letters and numbers. Thus, we handpicked the signs that were most inaccurate in the training dataset and created our own training data by recording ourselves doing the various letters and extracting frames from those videos. The specific letters/numbers were: 3, e, f, m, n, q, and t. While the cost of this change was the extra time needed to create training data, it will help the accuracy of our model in the long run. Additionally, since we plan to run external user tests, the fact that we are partially creating the training data should not affect our test results, as different users will be signing for the model.
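
The frame-extraction step is straightforward with OpenCV; a sketch is below, with the sampling rate and paths as assumptions.

```python
import cv2

def extract_frames(video_path, out_dir, every_n=5):
    """Save every `every_n`-th frame of a recorded sign video as a training image."""
    cap = cv2.VideoCapture(video_path)
    saved = frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{frame_idx:04d}.jpg", frame)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved

# extract_frames("our_recordings/letter_m.mp4", "training_data/m")
```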

Our schedule remains mostly the same, except that we will be starting our ML/webapp integration a week earlier and that this week we have tasks to create some training data.

Team Status Report for 3/19/22

Currently, the most significant risks to our project are the machine learning models for the different groups of signs. Specifically, some of the datasets we found for training data are not being picked up well by MediaPipe or are not of good enough quality, so we are running into issues training the models. To mitigate these risks, we are looking for new datasets – particularly for the letter and number signs – and will potentially be making our own training data for the dynamic signs, as these have the fewest datasets available online. As for contingency plans, if we are unable to find a good enough dataset that works well with MediaPipe, we might forgo MediaPipe and create our own CNN for processing the image/video data.
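
To quantify "not being picked up well," a screen like the following is useful: run MediaPipe over a sample of a candidate dataset and measure the hand-detection rate. The paths are illustrative.

```python
import glob
import cv2
import mediapipe as mp

def detection_rate(image_dir):
    """Fraction of a dataset's images in which MediaPipe finds at least one hand."""
    detected = total = 0
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        for path in glob.glob(f"{image_dir}/*.jpg"):
            image = cv2.imread(path)
            if image is None:
                continue
            results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            detected += results.multi_hand_landmarks is not None
            total += 1
    return detected / total if total else 0.0

# print(detection_rate("candidate_dataset/letters/a"))
```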

There have not really been any changes to our system design over the past week. One potential change we have been discussing is the grouping of signs across the various neural networks: we might now separate static and dynamic signs rather than dividing signs by hand shape. This is partially because our static signs are one-handed with image training data, whereas many of our dynamic signs are two-handed with video training data. This change would make classification for static signs easier, as we can limit the number of hands detected in frame. There aren't really any costs incurred by this change, as we had not yet built models separated by hand shape.
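
Concretely, the split would let us configure the two MediaPipe pipelines differently, along these lines:

```python
import mediapipe as mp

# Static signs: one-handed, classified from single images.
static_hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

# Dynamic signs: often two-handed, tracked across video frames.
dynamic_hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=2)
```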

Our schedule has also not really changed, but we will be allocating some extra time to make the dynamic sign training data, since we initially did not anticipate needing to do this.

Team Status Report for 2/26/22

This week we had the Design Presentation and began working on our Design Report. We received feedback from the presentation mainly regarding our design choice to use an LSTM with MediaPipe, where our advisor was a little wary about how well this would work. After discussing it as a group and doing some more research, we are confident that our choice will fit the use-case.

Currently, the most significant risk that could jeopardize the success of our project is the semester's time constraints. Given that the design review and midterms have taken a lot of time over the past few weeks, and that spring break is coming up, we have not had much time to work on our actual implementation. This is especially concerning given how long it generally takes to train an ML model and the amount of data we need to both create and process. To manage this, we will prioritize training the models based on our neural network groupings, where the networks with fewer signs will hopefully be quicker to train. Additionally, we will have more frequent group meetings and internal deadlines so that we can meet all the milestones in the remaining time. As for contingency plans, if training takes too long we will cut down the number of signs included in the platform for quicker training, while still keeping the set of signs provided to users useful.

In terms of changes to the existing design, we realized that using both hand landmarks and face landmarks presented compatibility problems and too much complexity given our current expertise and remaining time. Thus, we removed all signs that involve contact with the face/head and replaced them with other signs (that still involve motion). Because this change was made during our design phase, there are no real costs associated with it: our chosen signs are still in our chosen datasets and maintain the same level of communicativeness for the user.

Our schedule is mostly the same as before, but we plan to make testing data for the model in the weeks after spring break, and internally we plan to devote more effort to training the model.

Team Status Report for 2/19/22

This week, our team finalized the type of neural network we want to use for generating ASL predictions. We gathered more research about tools to help us with model training (e.g. training in an EC2 instance) and planned out the website UI more. We also worked on creating our database of ASL test data and on the design report.

The most significant risks right now are that our RNN may not meet the requirements for prediction accuracy and execution time. In addition, the RNN will require a large amount of time and data for training; if we increase the number of layers or neurons in an effort to improve prediction accuracy, this could increase training time further. Another risk is doing feature extraction without enough efficiency. This is critical because we have a large amount of data to format before it can be fed into the neural network.

To manage these risks, we have come up with a contingency plan to use a CNN (which can be fed frames directly). For now, we are not using a CNN because its performance may be much slower than that of an RNN. For feature extraction, we are considering doing this in an EC2 instance so that our personal computers' resources are not overwhelmed.

A design change we made was to the groupings of our signs (each of which has its own RNN). Before, we grouped simply by category (number, letter, etc.), but now we are grouping by similarity. This will allow us to more effectively distinguish whether the user is doing a sign correctly and to detect the minute details that affect correctness.

There have been no changes to our schedule thus far.

Team Status Report for 2/12/22

This week, our team gave the proposal presentation. We met twice beforehand to practice what we were going to say. As part of our preparation, we also created a solution block diagram to visualize the main components needed for our project, along with a visualization of the different modes for our web application (training vs. testing).

After the proposal presentation, we met on Friday to discuss how to design our machine learning model. We researched which types of neural networks are best for labeling both images and videos with a correct prediction, discussed the limitations of convolutional networks, and looked more into recurrent neural networks. We also discussed how we might approach feature extraction (modifying the coordinate points from the hands into a more useful set of distance data). Distance data may give us greater prediction accuracy than raw image inputs, which can suffer interference from background pixels.

Currently, our most significant risk is choosing the wrong neural network, as well as having our models not be accurate enough for users. Another potential risk is incorrectly processing our images during feature extraction, leading to latency and incorrect predictions. Our current mitigation is to keep researching the best neural network, but we have decided that, worst case, we would choose a convolutional neural network, which would allow us to simply feed in the images themselves, at the cost of lower accuracy and more latency. Lastly, a potential worry is that we need to start training soon while our design is still in progress, so we have firm time constraints to keep in mind.
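
The feature extraction we discussed amounts to something like the sketch below: turn the 21 MediaPipe hand landmarks into pairwise distances, which do not depend on where the hand sits in the frame. Normalizing by the wrist-to-middle-knuckle distance (landmarks 0 and 9) to remove overall hand scale is an assumption on our part.

```python
import numpy as np
from itertools import combinations

def distance_features(landmarks: np.ndarray) -> np.ndarray:
    """Convert 21 (x, y, z) hand landmarks into 210 pairwise distances,
    normalized so that absolute position and hand size stop mattering."""
    dists = np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                      for i, j in combinations(range(21), 2)])
    scale = np.linalg.norm(landmarks[0] - landmarks[9]) or 1.0
    return dists / scale

features = distance_features(np.random.rand(21, 3))  # shape (210,)
```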