Valeria’s Status Report for 4/2/22

This week I created and inserted the instructional videos for the signs. There was a minor setback when doing this since we found that Chrome does not play .MOV files, so I also had to convert all of the videos from .MOV to .MP4 so that they would show up in Chrome. Apart from that, I am also saving the user's hand dominance after they register and log in. Originally, I thought I could infer the user's hand dominance from MediaPipe data. However, after discussing it further with the rest of the team, we concluded that having the user explicitly state their hand dominance the first time they visit the website would be easier. I wasn't able to do much else this week since I had a midterm exam for another class and I also contracted COVID.
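
For reference, the conversion can be scripted; below is a minimal sketch of how the batch .MOV-to-.MP4 conversion could look, assuming ffmpeg is installed and the clips sit in a flat directory. The directory names and codec settings here are illustrative, not the exact command I used.

```python
# batch_convert.py -- sketch of the .MOV -> .MP4 conversion.
# Assumes ffmpeg is on the PATH; paths and codec flags are illustrative.
import subprocess
from pathlib import Path

SRC_DIR = Path("static/sign_videos_mov")   # hypothetical location of the .MOV clips
DST_DIR = Path("static/sign_videos_mp4")   # where the Chrome-friendly copies go
DST_DIR.mkdir(parents=True, exist_ok=True)

for mov in SRC_DIR.glob("*.MOV"):
    mp4 = DST_DIR / (mov.stem + ".mp4")
    # Re-encode to H.264 video + AAC audio, which Chrome's <video> tag plays natively.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(mov),
         "-c:v", "libx264", "-c:a", "aac", "-movflags", "+faststart", str(mp4)],
        check=True,
    )
    print(f"converted {mov.name} -> {mp4.name}")
```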

Because I caught COVID, my progress is slightly behind what I anticipated. Originally, I had planned this week to figure out how to notify users whether they did a sign correctly or incorrectly, to help with integration, and to help with testing the model accuracy. Given the lost time, I am making the user UI less of a priority for now. For next week, my main priority is to test the model accuracy over the weekend and continue helping with integration. If I'm able to catch up next week, another thing I hope to do is add all of the tips that Hinna wrote to guide users when they are trying to make a sign.

Aishwarya’s Status Report for 4/2/22

I integrated the model execution with the web app, such that the user's input is parsed and passed to the model to generate a prediction. I also parsed all of the new data we collected to replace incorrect signs in the original training dataset: I extracted image frames from a series of videos our group made and then extracted landmarks from each frame. Finally, I retrained all the models with this newly formatted data.
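
For context, the reformatting pipeline is roughly the shape of the sketch below, which walks a directory of our recorded videos, samples frames with OpenCV, and pulls hand landmarks with MediaPipe Hands. The paths, sampling interval, and CSV output format are assumptions for illustration rather than our exact script.

```python
# extract_landmarks.py -- rough sketch of turning recorded sign videos into
# landmark training rows. Directory layout and sampling rate are illustrative.
import csv
from pathlib import Path

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def video_to_rows(video_path: Path, label: str, every_n: int = 5):
    """Return (label, 63 landmark values) rows for every n-th frame of a video."""
    rows = []
    cap = cv2.VideoCapture(str(video_path))
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % every_n == 0:
                result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if result.multi_hand_landmarks:
                    lm = result.multi_hand_landmarks[0].landmark
                    coords = [v for p in lm for v in (p.x, p.y, p.z)]  # 21 points * (x, y, z)
                    rows.append([label] + coords)
            idx += 1
    cap.release()
    return rows

if __name__ == "__main__":
    with open("retraining_data.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # e.g. new_sign_videos/e/take1.mp4 -> label "e" (hypothetical layout)
        for video in Path("new_sign_videos").glob("*/*.mp4"):
            writer.writerows(video_to_rows(video, label=video.parent.name))
```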

My progress is mildly hindered due to having COVID this past week, so I haven't been able to tune the models as much as I would like. The models in general have slight trouble identifying unknown signs, and the fist sign category model in particular seems to have the most difficulty identifying letters such as A and S. I hope that after recovering this next week, I can tune the models further to deal with these issues. I will have to experiment with the number of training epochs and with the model structure itself (increasing/decreasing the number of layers and the number of nodes within each layer).
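
The experiments I have in mind look roughly like the sweep below: a small feed-forward classifier over per-frame landmark features, retrained across a few epoch counts and layer widths. This is written with TensorFlow/Keras purely for illustration, and the random placeholder arrays, class count, and candidate settings are stand-ins, not our actual data or final hyperparameters.

```python
# tune_models.py -- hypothetical sweep over epochs and layer widths for one of the
# static-sign classifiers. The random arrays below stand in for the real landmark
# features (63 values per frame) and one-hot labels; swap in the actual data.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 6  # placeholder: e.g. the fist-category letters plus an "other" class
x_train = np.random.rand(1000, 63).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(NUM_CLASSES, size=1000), NUM_CLASSES)
x_val = np.random.rand(200, 63).astype("float32")
y_val = tf.keras.utils.to_categorical(np.random.randint(NUM_CLASSES, size=200), NUM_CLASSES)

def build_model(hidden_layers):
    """Feed-forward classifier with a configurable number of layers and nodes."""
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(63,))])
    for width in hidden_layers:
        model.add(tf.keras.layers.Dense(width, activation="relu"))
    model.add(tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Compare a few depths/widths and epoch counts; confusions like A vs. S should
# show up as lower validation accuracy for the weaker settings.
for hidden_layers in [(64,), (128, 64), (256, 128, 64)]:
    for epochs in (20, 40, 80):
        history = build_model(hidden_layers).fit(
            x_train, y_train, epochs=epochs, validation_data=(x_val, y_val), verbose=0)
        print(hidden_layers, epochs, round(max(history.history["val_accuracy"]), 3))
```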

Next week, I hope to fix some of these prediction issues currently observed with the models. I also want to make the web app integrate more smoothly with the model execution service. Currently, the user's video input has to be downloaded locally before the model can run on it; cutting out this middle step should improve latency.
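
One way to cut out that middle step would be something like the following view sketch, which assumes a Django-based web app, spools the uploaded recording into a short-lived temporary file, and hands it straight to the prediction code. The view name, the "video" form field, and the predict_from_video helper are hypothetical placeholders for our actual integration code.

```python
# views.py (sketch) -- pass the uploaded recording straight to the model service
# via a short-lived temp file instead of keeping a local copy of every clip.
import tempfile

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

from .model_service import predict_from_video  # hypothetical wrapper around the model code

@csrf_exempt
def check_sign(request):
    if request.method != "POST" or "video" not in request.FILES:
        return JsonResponse({"error": "expected a POSTed 'video' file"}, status=400)

    upload = request.FILES["video"]
    # Spool the blob to a temp file only for as long as the prediction takes.
    with tempfile.NamedTemporaryFile(suffix=".mp4") as tmp:
        for chunk in upload.chunks():
            tmp.write(chunk)
        tmp.flush()
        label, confidence = predict_from_video(tmp.name)

    return JsonResponse({"prediction": label, "confidence": confidence})
```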

Team Status Report for 4/2/22

The most significant risk that could currently jeopardize the success of the project is the model accuracy. At the moment, our model is very sensitive to slight hand tilts and small nuances in signs, so even when a user makes a technically correct sign, the model may fail to identify it as correct. To manage this risk, we are planning to alter some of the layers of our model, the number of epochs used for training, and the number of nodes per layer to see if these adjustments result in more robust detection. Additionally, in the next week or so, we plan to consult Professor Gormley about our model to see if he has any recommendations for improving the detection. As for contingency plans, if we are unable to make the model more flexible in its predictions, we will adjust our instructional materials to better reflect the training data, so that users sign in a way the model recognizes as correct.

There have not been any changes to the design, but after meeting with our advisor and TA we are considering adding some features to the webapp, such as tracking user statistics. This change mainly involves the user model we already have, adding an extra profile field for the letters a user frequently gets wrong. We are making this change to make the learning experience more personalized, so that our platform can reinforce signs/terms a user consistently gets incorrect through additional tests. Note that these changes will not be a priority until after the interim demo, and more specifically, until after we have addressed the feedback we receive from the demo.
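
If we move forward with this, the change would likely look something like the sketch below, which assumes a Django-style profile model and adds a per-user tally of missed signs alongside the hand-dominance field. The field and method names are placeholders, not a finalized schema.

```python
# models.py (sketch) -- possible shape of the per-user statistics field.
# Model, field, and method names here are placeholders, not our final schema.
from django.contrib.auth.models import User
from django.db import models

class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    hand_dominance = models.CharField(
        max_length=5, choices=[("left", "Left"), ("right", "Right")])
    # Running tally of how often each letter/sign is missed, e.g. {"a": 3, "s": 5};
    # review/quiz pages could sort by these counts to pick reinforcement signs.
    missed_sign_counts = models.JSONField(default=dict, blank=True)

    def record_result(self, sign: str, correct: bool) -> None:
        """Increment the miss counter for a sign the user got wrong."""
        if not correct:
            self.missed_sign_counts[sign] = self.missed_sign_counts.get(sign, 0) + 1
            self.save(update_fields=["missed_sign_counts"])
```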

Our schedule mostly remains the same; however, two of our three group members are currently sick with COVID, so this may be a much slower week in terms of progress. As a result, we may have to adjust our schedule and push some tasks to later weeks.

Hinna’s Status Report for 3/26/22

This week, I worked with Aishwarya to test the initial model we have for static signs (1-finger, 2-finger, 3-finger, fist, etc.) and discovered some discrepancies in the training data for the following signs: 3, e, f, m, n, q, t. As a result, I (along with my other two group members) created some additional training data for these signs in order to retrain the model to detect the correct versions of them.

Additionally, I worked on the regular testing data that I had been assigned for this week (letters a-m, numbers 0-4), in accordance with the group schedule. I also began brainstorming ways to choose the highest model prediction, since the prediction values across all of the classes in our model sum to 100% rather than each sign being scored out of 100 on its own. This means that we cannot specify a fixed range of prediction values for deciding the best one, as we previously thought; instead, we will group any unrecognized movements by the user into an "other" class to ensure that the near real-time prediction is accurate.
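
Concretely, the selection logic we are converging on looks something like the sketch below: pick the top class from the softmax output (whose entries sum to 1, i.e. 100%) and fall back to the "other" class when no prediction is confident enough. The label set and the 0.6 cutoff are placeholder assumptions that still need to be validated.

```python
# choose_prediction.py -- sketch of picking the best class from a softmax output.
# The 0.6 threshold is a placeholder we still need to tune, and "other" is the
# catch-all class for unrecognized movements.
import numpy as np

LABELS = ["a", "b", "c", "other"]   # illustrative label set for one model
CONFIDENCE_THRESHOLD = 0.6          # placeholder cutoff

def choose_prediction(probabilities: np.ndarray) -> str:
    """Return the top label, falling back to 'other' when no class is confident."""
    best = int(np.argmax(probabilities))
    if probabilities[best] < CONFIDENCE_THRESHOLD:
        return "other"
    return LABELS[best]

print(choose_prediction(np.array([0.05, 0.85, 0.05, 0.05])))  # -> "b"
print(choose_prediction(np.array([0.30, 0.25, 0.25, 0.20])))  # -> "other"
```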

Furthermore, for the interim demo, we are brainstorming some final aspects of the webapp, such as the most intuitive way to display feedback to the user and easy-to-understand instructional materials. As part of the instructional materials, I created text blurbs for all 51 signs that specify how to do each sign (along with the video) as well as certain facts about the sign (e.g., "help" is a directional sign where directing it outwards indicates giving help, while directing it towards yourself indicates needing/receiving help).

At the moment, our project is on schedule, with the exceptions that we are beginning integration a week early and that we have to account for some extra time to make this week's additional training data.

As for next week, I plan to continue making testing/training data, work with Valeria to integrate the instructional materials into the webapp, and prepare for the interim demo with the rest of my group.

Team Status Report for 3/26/22

The most significant risk that could currently jeopardize the success of our project is the integration of the machine learning model with the webapp, where we want to make sure the user's video input is accurately fed to the model and that the model's prediction is accurately displayed in the webapp. Currently, this risk is being managed by starting integration a week earlier than planned, as we want to make sure this is resolved by the interim demo. As for a contingency plan, we will have to consider alternative methods of analyzing the user's input with our model, where a simpler approach may trade performance for easier integration.

As for changes in our project, while the design has remained relatively the same, we realized that some of the data for certain ASL letters and numbers in the training dataset looks different from traditional ASL, to the point where the model was not able to recognize us doing certain signs. As the goal of our project is to teach ASL to beginners, we want to make sure our model accurately detects the correct way to sign letters and numbers. Thus, we handpicked the signs that were most inaccurate in the training dataset and created our own training data by recording ourselves doing those letters and extracting frames from the videos. The specific letters/numbers were: 3, e, f, m, n, q, t. While the cost of this change was the extra time needed to make the training data, it will help the accuracy of our model in the long run. Additionally, since we plan to do external user tests, the fact that we are partially creating the training data should not affect the results of our tests, as we will have different users signing to the model.

Our schedule remains mostly the same, except that we will be starting our ML/webapp integration a week earlier and that we have added tasks this week to create some training data.

Valeria’s Status Report for 3/26/22

This week I was able to make the website automatically stop recording the video after 5 seconds have elapsed. I connected all the pages together and can now move from page to page cohesively. Here is the link to watch a walkthrough of our website. This week I also recorded 10 videos for each of the dynamic signs, i.e., the conversation and learning categories. Furthermore, I researched how we can send the Blob object that we create for the video to our machine learning model, to help with our integration stage.

From my research, there is the possibility of sending the Blob itself to the machine learning model and having it turned into an object URL on the model's side. Another idea we found was to automatically store the video locally and have the machine learning model access it from there. While this would work, it would not be efficient enough for what we want to accomplish. However, we realized that, given our time constraints, this might be our fallback plan.

As of right now, my progress is on schedule. For next week, I hope to get the integration between the machine learning model and the website working. I also hope to create another HTML template, with its associated AJAX actions, to calibrate the user's hands for MediaPipe and to get the user's hand-dominance preference. Apart from that, I want to get the instructional videos done for the alphabet page.

Valeria’s Status Report for 3/19/22

This week I was able to finish all of the HTML templates for the web application. Currently, only a couple of the URLs are working for moving around the pages and checking the templates, meaning that only the alphabet page and the letter A's learn/test mode are linked with URLs. Furthermore, I have linked real-time video feedback into the web page and have the user download whatever video clip they record of themselves. The website starts capturing video once the user presses the "Start Recording" button; once the user finishes doing the sign, they currently need to press the "Stop Recording" button for the video to be saved. Here is the link to a pdf showing the HTML templates that we currently have. Apart from that, this week I have also been helping a little with the machine learning models by helping Aishwarya test them and figure out where they were going wrong. As for my current testing database, I have added 10 more images for each of the signs that I have been in charge of for the past few weeks, bringing the total to 30 images each for the signs N to Z and 5 to 9.

Currently, my progress is on schedule since I was able to catch up during spring break. My goal for next week is to link the remaining pages, i.e., numbers, conversation, and learning. I also hope to have the program automatically stop recording 5 seconds after the "Start Recording" button is pressed. Apart from that, I also hope to add 10 images for each of the new signs that I have been assigned, i.e., all of the conversational and learning dynamic signs.

Valeria’s Status Report for 2/26/22

This week I focused more on the web application since we are running slightly behind schedule on it. I looked into the differences between Material UI and Bootstrap to decide which front-end framework to use for the project. I ended up choosing Bootstrap because it's the one we, as a team, have more experience with, and its components are easy to build with. With that decided, I started working on the HTML templates for our web application. I was only able to complete how the home page and the course page are going to look, and here is an image for reference. Most of the data in the HTML templates is dummy data that I plan to replace during the week of spring break with our actual lesson plans. Apart from this, I have also been working on our design report. As a team, we chose to split the report into parts and assign them to each other, so this week I worked on the design requirements and the system implementation sections for the web application. Lastly, I expanded our testing dataset for letters N-Z and numbers 5-9 by adding 15 images for each sign. I took these pictures under varying lighting conditions so that we can see whether our neural network still predicts the correct labels.

As of now, my progress is slightly behind schedule. I was hoping to have all of the templates ready this week so that when we came back from spring break we would have something to start with; however, I was only able to get one template done. Since next week I have four midterms to study for, I am not going to have much time to finish all of the templates. Because of this, I am going to continue working on the HTML templates during the week of spring break. For next week, I hope to get one HTML template done, preferably the training template, i.e., the page that shows our instructional video and real-time video feedback, and to finish our design report since that is a major part of our grade.

Team Status Report for 2/19/22

This week, our team finalized the type of neural network we want to use for generating ASL predictions. We gathered more research on tools to help us with model training (e.g., training in an EC2 instance) and planned out the website UI further. We also worked on creating our database of ASL test data and on the design report.

The most significant risk right now is that our RNN may not meet the requirements for prediction accuracy and execution time. In addition, the RNN will require a large amount of time and data for training; if we increase the number of layers or neurons in an effort to improve prediction accuracy, this could increase training time. Another risk is that feature extraction may not be efficient enough. This is critical because we have a large amount of data to format before it can be fed into the neural network.

To manage these risks, we have come up with a contingency plan to use a CNN (which can be fed frames directly). For now, we are not using a CNN because its performance may be much slower than that of an RNN. For feature extraction, we're considering doing this in an EC2 instance so that our personal computers' resources are not overwhelmed.

A design change we made was the grouping of our signs (to have separate RNNs for each group). Before, we grouped simply by category (number, letter, etc.), but now we are grouping by similarity. This will allow us to more effectively distinguish whether the user is doing a sign correctly and to detect the minute details that may affect its correctness.
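
For reference, the kind of per-group RNN we have in mind is roughly the minimal sketch below: an LSTM over sequences of per-frame hand landmarks, written with TensorFlow/Keras purely for illustration. The sequence length, layer sizes, and class count are placeholders that will change as we finalize the similarity groups.

```python
# rnn_sketch.py -- minimal illustration of a per-group sign classifier: an LSTM
# over sequences of per-frame hand landmarks. All dimensions below are
# placeholders, not final design values.
import tensorflow as tf

SEQ_LEN = 30        # frames sampled from one recorded clip (placeholder)
NUM_FEATURES = 63   # 21 MediaPipe hand landmarks * (x, y, z)
NUM_CLASSES = 8     # signs in one similarity group, plus an "other" class (placeholder)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```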

There have been no changes to our schedule thus far.

Valeria’s Status Report for 2/19/22

This week I worked on some Figma frames to help represent how our web app is going to look and to get a clear idea of what we want to put in the HTML templates. I also created another GitHub repository to hold all of our images and created folders for each sign. Since we needed to get started this week on building our testing database, I started taking pictures of signs for letters N to Z and numbers 5 to 9. I took 5 pictures for each sign, and you can see a sample of what the pictures looked like here. The main idea was to photograph each sign from different angles so that the neural network learns to recognize a sign at any angle. Apart from that, I looked into the possibility of building a neural network inside an EC2 instance, since we found through research that training this network on our own computers could potentially make them crash. I did find that it is possible, but that we might need a GPU instance, which is something to consider. Lastly, I've spent the majority of this week, along with Aishwarya and Hinna, working on the design presentation.

Currently, our progress is mostly on schedule. However, we are slightly behind on the web app since we prioritized machine learning this week. Because of this, I'm planning to work on the web app next week and not focus as much on machine learning so that we have the HTML templates set up. Luckily, the Figma frames help immensely in deciding what elements to add to the pages, so it shouldn't take me more than a week to finish. For next week, I hope to finish the HTML templates and have all the pages set up with minimal actions, like moving from the home page to a course module. I also hope to continue building the testing database for my assigned signs (N-Z, 5-9) with at least 10 pictures per sign. Apart from that, I will also help finish up our design report.