Hinna’s Status Report for 3/26/22

This week, I worked with Aishwarya to test the initial model we have for static signs (1-finger, 2-finger, 3-finger, fist, etc.) and discovered some discrepancies in the training data for the following signs: 3, e, f, m, n, q, t. As a result, I, along with my other two group members, created additional training data for these signs in order to retrain the model to detect the correct version of them.

Additionally, I worked on the regular testing data assigned to me for this week (letters a-m, numbers 0-4), in accordance with the group schedule. I also began brainstorming ways to choose the highest model prediction, as the prediction values across all of the model’s possible signs sum to 100% rather than each sign being scored out of 100 individually. This means that we cannot specify a fixed range of prediction values for deciding the best one, as we previously thought; instead, we will group any unrecognized movements by the user into an “other” class to ensure that the near real-time prediction is accurate.
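
As a rough sketch of this selection logic (assuming a softmax-style output whose class probabilities sum to 1, with placeholder class names, since our final label groupings are still in flux):

    import numpy as np

    # Hypothetical label list; the real model covers our sign groupings plus "other".
    CLASS_NAMES = ["fist", "one_finger", "two_finger", "other"]

    def best_prediction(probabilities: np.ndarray) -> str:
        """Return the class with the highest predicted probability.

        The outputs sum to 1 across classes, so a fixed per-class cutoff
        does not apply; unrecognized movements should instead fall into
        the explicit "other" class the model is trained on.
        """
        return CLASS_NAMES[int(np.argmax(probabilities))]

    print(best_prediction(np.array([0.05, 0.10, 0.15, 0.70])))  # -> "other"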

Furthermore, for the interim demo, we are brainstorming some final aspects of the webapp, such as the most intuitive way to display feedback to the user and how to provide easy-to-understand instructional materials. As part of the instructional materials, I created text blurbs for all 51 signs that specify how to perform each sign (alongside its video) as well as certain facts about the sign (e.g., “help” is a directional sign, where directing it outwards indicates giving help while directing it towards yourself indicates needing/receiving help).

At the moment, our project is on schedule, with the exceptions that we are beginning integration a week early and that we have to account for some extra time to create the training data from this week.

As for next week, I plan to continue creating testing and training data, work with Valeria to integrate the instructional materials into the webapp, and prepare for the interim demo with the rest of my group.

Team Status Report for 3/26/22

The most significant risk that could currently jeopardize the success of our project is the integration of the machine learning model with the webapp: we want to make sure the user’s video input is accurately fed to the model and that the model’s prediction is accurately displayed in the webapp. Currently, this risk is being managed by starting integration a week earlier than planned, as we want to make sure it is resolved by the interim demo. As a contingency plan for this risk, we will consider alternative methods of analyzing the user input with our model, where a simpler approach may trade performance for easier integration.
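
As a minimal sketch of the direction we are considering (the Flask-style endpoint, route name, and run_model helper here are all illustrative assumptions, not our settled design): the browser posts a captured frame, the server runs it through the model, and the prediction comes back as JSON for the webapp to display.

    import cv2
    import numpy as np
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def run_model(frame):
        """Hypothetical stand-in for the real landmark + classification pipeline."""
        return "hello", 0.92

    @app.route("/predict", methods=["POST"])
    def predict():
        # Decode the frame bytes the browser uploaded into an OpenCV image.
        data = np.frombuffer(request.files["frame"].read(), dtype=np.uint8)
        frame = cv2.imdecode(data, cv2.IMREAD_COLOR)
        label, confidence = run_model(frame)
        return jsonify({"sign": label, "confidence": confidence})

    if __name__ == "__main__":
        app.run()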

As for changes in our project, while the design has remained largely the same, we realized that some of the data for certain ASL letters and numbers in the training dataset looks different from traditional ASL, to the point where the model was not able to recognize us doing certain signs. As the goal of our project is to teach ASL to beginners, we want to make sure our model accurately detects the correct way to sign letters and numbers. Thus, we handpicked the signs that were most inaccurate in the training dataset and created our own training data by recording ourselves doing each sign and extracting frames from that video. The specific letters/numbers were: 3, e, f, m, n, q, t. While the cost of this change was the increased time to make the training data, it will help the accuracy of our model in the long run. Additionally, since we plan to do external user tests, the fact that we are partially creating the training data ourselves should not affect the results of those tests, as we will have different users signing into the model.
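
For reference, extracting frames from our recordings can be done in a few lines with OpenCV; the sketch below uses placeholder file names rather than our exact script:

    import cv2

    def extract_frames(video_path: str, out_prefix: str, every_nth: int = 5) -> int:
        """Save every nth frame of a recorded sign video as a JPEG; return the count."""
        capture = cv2.VideoCapture(video_path)
        saved = index = 0
        while True:
            ok, frame = capture.read()
            if not ok:  # end of video
                break
            if index % every_nth == 0:
                cv2.imwrite(f"{out_prefix}_{saved:04d}.jpg", frame)
                saved += 1
            index += 1
        capture.release()
        return saved

    extract_frames("sign_e_recording.mp4", "training_data/e/e")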

Our schedule remains mostly the same, except that we will be starting our ML/webapp integration a week earlier and that we have added tasks this week to create some training data.

Hinna’s Status Report for 3/19/22

This week, I personally worked on making 30 iterations of each of our 15 dynamic, communicative signs. I also went through the WLASL database for dynamic signs and gathered all the video clips of training data for the 15 signs. In doing this, I realized that many of the videos listed in the dataset no longer exist, meaning that we will have to both augment the existing videos to get more data and potentially use the testing data I have made as training data. In addition to working with this data, I have been researching how to work with the AWS EC2 instance, image classification after landmark identification through MediaPipe, and methods for augmenting data.
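
For the MediaPipe piece, extracting hand landmarks from a single frame before classification looks roughly like the sketch below (this uses the standard mediapipe Python Hands solution; the single-hand setting is an assumption):

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands

    def landmarks_for_frame(image_bgr):
        """Return a flat list of (x, y, z) hand-landmark coordinates, or None."""
        # static_image_mode treats each frame independently; for video we would
        # likely keep one Hands instance open across frames instead.
        with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
            results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            return None  # no hand detected in this frame
        hand = results.multi_hand_landmarks[0]
        return [(lm.x, lm.y, lm.z) for lm in hand.landmark]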

My progress is currently on schedule; however, in deciding that we will also need to create training data for the dynamic signs, we have some new tasks to add, which I will be primarily responsible for. To keep up with these, I will be putting my testing data creation on hold to prioritize the dynamic sign training data.

In the next week, I plan to have 50 videos of training data for each of the 15 dynamic signs, where the 50 will be a combination of data I have created, data from WLASL, and augmented videos. Additionally, I plan to help Aishwarya with model training and work on the instructional web application materials.
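
The augmentations I have in mind are simple per-frame transforms like the sketch below (the parameters are guesses we would tune, and mirroring in particular needs a per-sign check since it swaps the signing hand):

    import cv2
    import numpy as np

    def augment_frame(frame: np.ndarray) -> list:
        """Return simple variants of one video frame for augmentation."""
        mirrored = cv2.flip(frame, 1)  # swaps signing hand; check validity per sign
        brighter = cv2.convertScaleAbs(frame, alpha=1.0, beta=30)
        darker = cv2.convertScaleAbs(frame, alpha=1.0, beta=-30)
        return [mirrored, brighter, darker]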

Hinna’s Status Report for 2/26/22

This week, I worked with my team on the design review, with the main deliverables being the presentation and the report. I personally worked on creating 15 more iterations of testing data for the 15 communicative, dynamic signs that I was assigned. I also helped create diagrams for the design presentation, specifically for the microservice architecture we used to describe our neural network separation.

Currently, our project is on schedule, but we are definitely feeling a bit of the time pressure. We have not yet begun training our ML model because we only finalized our neural network type during the design review this week. Additionally, all of us are very busy with midterms and writing the design report, so we haven’t done as much work as we wanted on the project implementation itself. To account for this, we plan to meet more frequently as a team and extend some tasks past spring break in our schedule (such as creating testing data).

Next week, I hope to work with my team to complete the design report, where I am primarily responsible for the Introduction, Use-Case Requirements, Testing, and Project Management sections.

Team Status Report for 2/26/22

This week we had the Design Presentation and began working on our Design Report. We received feedback from the presentation mainly regarding our design choice to use an LSTM with MediaPipe, as our advisor was a little wary about how well this would work. After discussing it as a group and doing some more research, we are confident that our choice will fit the use case.

Currently, the most significant risk that could jeopardize the success of our project is the semester’s time constraints. Given that the design review and midterms have taken a lot of time over the past few weeks, and that spring break is coming up, we have not had much time to work on our actual implementation. This is especially concerning given the amount of time it generally takes to train an ML model and the amount of data we need to both create and process. To manage this, we will prioritize training the model based on our neural network groupings, where the networks with fewer signs will hopefully be quicker to train. Additionally, we will have more frequent group meetings and internal deadlines, so that we can meet all the milestones in the remaining time we have. As for contingency plans, if training the model takes too long, we will cut down the number of signs we include in the platform for quicker training while still maintaining the usefulness of the signs provided to the users.

In terms of changes to the existing design, we realized that utilizing both hand landmarks and face landmarks presented some compatibility problems and too much complexity given our current expertise and remaining time. Thus, we removed all signs that involve contact with the face/head and replaced them with other signs (that still involve motion). Because this change was made during our design phase, there are no real costs associated with it, as our chosen signs are still in our chosen datasets and maintain the same level of communicativeness for the user.

Our schedule is mostly the same as before, but we plan to make testing data for the model in the weeks after spring break, and internally, we plan to devote more effort to training the model.

Hinna’s Status Report for 2/19/22

This past week, my focus was mostly on the components of our project related to the design presentation and report.

For my individual accomplishments, I first created some testing data for our machine learning model for 15 of our 51 signs, with 5 versions of each of the 15 signs. In these different versions, I varied the lighting and the angle at which I signed to allow for more robust testing when we begin testing our model. I also researched the benefits of an RNN (our chosen neural network) versus a CNN (our contingency plan) to help my team make a more informed choice on how to structure our solution.

Additionally, I finalized the microarchitecture of our different neural networks, meaning I figured out how to sort our 51 signs into different models based on similarity. The purpose of this sorting is to ensure that when users sign on the platform, our models will be trained against similar signs in order to more definitively decide whether the user is correctly signing one of our terms. The 5 different neural networks are roughly sorted into fist signs, 1-finger signs, 2-finger signs, 3-finger signs, and 4-5-finger/open-handed signs. Note that these neural networks will still have similar structures (RNN, hyperbolic tangent (tanh) activation, same number of layers, etc.) but will differ in the signs they are trained to detect.
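
To make the shared structure concrete, one per-group network might look like the Keras sketch below (the layer sizes, sequence length, and landmark feature count are placeholders rather than finalized hyperparameters):

    import tensorflow as tf

    SEQUENCE_LENGTH = 30  # assumed frames per sample
    FEATURES = 21 * 3     # 21 MediaPipe hand landmarks x (x, y, z)

    def build_group_model(num_signs: int) -> tf.keras.Model:
        """One network per similarity group: same shape, different sign labels."""
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(SEQUENCE_LENGTH, FEATURES)),
            tf.keras.layers.LSTM(64, activation="tanh", return_sequences=True),
            tf.keras.layers.LSTM(32, activation="tanh"),
            tf.keras.layers.Dense(num_signs, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    fist_model = build_group_model(num_signs=10)  # e.g., the fist-signs group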

Our project is currently on schedule, as long as we are able to start training the model in the next week or so. Now that we have a more definitive system design, our timeline seems more attainable than it did last week (when we weren’t sure which neural network to use to implement our solution).

As for deliverables next week, I plan to create more iterations of my 15 assigned signs to contribute to our testing data. I also plan to work with my team on our design report and to begin training our model.