Valeria’s Status Report for 3/26/22

This week I was able to make the website automatically stop recording the video once 5 seconds have elapsed. I also connected all the pages together and can now move from page to page cohesively. Here is the link to watch a walkthrough of our website. This week I also recorded 10 videos for each of the dynamic signs, i.e. the conversation and learning categories. Furthermore, I researched how we can send the video Blob object that we create to our machine learning model to help with our integration stage.

From the research I did, one possibility is sending the Blob itself to the machine learning model and turning it into an object URL on that side. Another idea we found was to automatically store the video locally and have the machine learning model read it from disk. While this would work, it would not be efficient enough for what we want to accomplish; however, we realized that given our time constraints this might be our fallback plan.
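To make the integration step concrete, here is a rough sketch of how the server side could receive the recorded Blob and fall back to writing it to a temporary file for the model to read. This assumes we keep a Django-style backend; the view name process_sign and the helper classify_video are hypothetical placeholders, not code we have written yet.

# views.py -- hypothetical sketch of receiving the recorded video Blob.
# Assumes the front end POSTs the Blob as multipart form data under the key "video".
import tempfile

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

from .model import classify_video  # placeholder for our model's entry point


@csrf_exempt
def process_sign(request):
    if request.method != "POST" or "video" not in request.FILES:
        return JsonResponse({"error": "expected a POST with a video file"}, status=400)

    blob = request.FILES["video"]
    # Fallback plan: write the Blob to a temporary .webm file so the model
    # can read it from disk like any other video.
    with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as tmp:
        for chunk in blob.chunks():
            tmp.write(chunk)
        video_path = tmp.name

    prediction = classify_video(video_path)
    return JsonResponse({"prediction": prediction})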

As of right now, my progress is on schedule. For next week, I hope to get the integration between the machine learning model and the website working. I also hope to create another HTML template, with its associated AJAX actions, to calibrate the user’s hands for MediaPipe and capture the user’s hand-dominance preference. Apart from that, I want to get the instructional videos done for the alphabet page.

Aishwarya’s Status Report for 3/19/22

I completed the code to parse the image and video data, passing it through MediaPipe and extracting and formatting the landmark coordinate data. The rough table below shows my initial findings for training and testing accuracy using a dataset for the letters D, I, L, and X with 30 images per letter class. Varying parameters to see how they affected testing accuracy, the best test accuracy I could achieve was 80.56%. Overall, this seems to be an issue with overfitting (especially since this initial dataset is small).
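As a rough illustration of the parsing step (a minimal sketch rather than our exact code; the file path handling and one-hand assumption are illustrative), extracting and flattening the landmark coordinates from a single image with MediaPipe looks roughly like this:

# Minimal sketch of extracting hand-landmark coordinates from one image.
# Not our exact parsing code; paths and array shapes are illustrative.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(image_path):
    """Return a flat (21 * 3,) array of x, y, z landmark coords, or None."""
    image = cv2.imread(image_path)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # MediaPipe failed to find a hand in this image
    hand = results.multi_hand_landmarks[0]
    return np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark]).flatten()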

Another dataset was found with 3000 images per letter class (though many of these fail to have landmark data extracted by MediaPipe). Using this dataset, overfitting still seemed to be an issue, though the model appears to perform well when testing in real time (I made signs in front of my web camera and found that it identified them fairly accurately). During this real-time evaluation, I found that it worked for my left hand, which means I will need to mirror the images the correct way to train the models for right-handed and left-handed signs.
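One simple way to produce the mirrored copies (a sketch; the directory layout and output naming are made up for illustration) is a horizontal flip with OpenCV:

# Sketch of mirroring training images so one dataset can serve both hands.
# The directory layout and file naming here are illustrative, not our actual setup.
import glob
import os

import cv2

def mirror_dataset(src_dir, dst_dir):
    os.makedirs(dst_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.jpg")):
        image = cv2.imread(path)
        mirrored = cv2.flip(image, 1)  # flip around the vertical axis
        out_path = os.path.join(dst_dir, "mirrored_" + os.path.basename(path))
        cv2.imwrite(out_path, mirrored)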

My progress is on schedule. To combat the overfitting issues, during the next week I will continue training with a larger dataset, varying parameters, and modifying the model structure. By the end of next week, I hope to have the models trained for each ASL grouping.


Valeria’s Status Report for 3/19/22

This week I was able to finish all of the HTML templates for the web application. Currently, only a couple of the URLs work for moving around the pages and checking the templates, meaning that only the alphabet page and the letter A’s learn/test mode are linked with URLs. Furthermore, I have linked real-time video feedback into the web page, and the user can download whatever video clip they record of themselves. The website starts capturing video once the user presses the “Start Recording” button; once the user finishes doing the sign, they currently need to press the “Stop Recording” button for the video to be saved. Here is the link to a pdf showing the HTML templates that we currently have. Apart from that, this week I have also been helping a little bit with the machine learning models by helping Aishwarya test them and figure out where they were going wrong. As for my current testing database, I have added 10 more images for each of the signs that I have been in charge of for the past few weeks, bringing the total to 30 images each for the signs N to Z and 5 to 9.

Currently, my progress is on schedule since I was able to catch up during spring break. My goal for next week is to link the remaining pages, i.e. numbers, conversation, and learning. I also hope to have the program automatically stop recording 5 seconds after the “Start Recording” button is pressed. Apart from that, I also hope to add 10 images for each of the new signs that I have been assigned, i.e. all of the conversational and learning dynamic signs.

Hinna’s Status Report for 3/19/22

This week, I personally worked on making 30 iterations of each of our 15 dynamic, communicative signs. I also went through the WLASL database for dynamic signs and collected all of the training video clips for the 15 signs. In doing this, I realized that a lot of the videos listed in the dataset no longer exist, meaning that we will have to both augment the existing videos to get more data (one possible approach is sketched below) and potentially use the testing data I have made as training data. In addition to working with this data, I have been researching how to work with the AWS EC2 instance, image classification after landmark identification through MediaPipe, and methods for augmenting data.
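As one possible approach to that augmentation (a sketch, not settled code; the brightness shift, output codec, and naming are arbitrary choices), each existing clip could be mirrored and brightness-shifted to yield additional training samples:

# Sketch of augmenting a dynamic-sign video clip by mirroring and brightness shift.
# Output codec/naming and the brightness factor are illustrative choices.
import cv2

def augment_video(src_path, dst_path, flip=True, brightness=30):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if flip:
            frame = cv2.flip(frame, 1)  # mirror horizontally
        frame = cv2.convertScaleAbs(frame, beta=brightness)  # brighten
        out.write(frame)

    cap.release()
    out.release()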

My progress is currently on schedule; however, in deciding that we will also need to create training data for the dynamic signs, we have some new tasks to add, which I will be primarily responsible for. In order to keep up with this, I will be putting my testing data creation on hold to prioritize the dynamic sign training data.

In the next week, I plan to have 50 videos of training data for each of the 15 dynamic signs, where the 50 will be a combination of data I have created, data from WLASL, and augmented videos. Additionally, I plan to help Aishwarya with model training and work on the instructional web application materials.

Team Status Report for 3/19/22

Currently, the most significant risks of our project lie in the machine learning models for the different groups of signs. Specifically, some of the datasets we found to use for training are not picked up well by MediaPipe or are not of good enough quality, so we are running into some issues with training the models. To mitigate these risks, we are looking for new datasets – particularly for the letter and number signs – and will potentially make our own training data for the dynamic signs, as these are the ones with the fewest datasets available online. As for contingency plans, if we are unable to find a good enough dataset that works well with MediaPipe, we might forgo MediaPipe and create our own CNN for processing the image/video data.

There have not really been any changes to our system design over this past week. One potential change we have been discussing is the grouping of signs across the various neural networks, where we might now separate static and dynamic signs rather than dividing signs by hand shape. This is partially because our static signs are one-handed with image training data, whereas a lot of our dynamic signs are two-handed with video training data. This change makes classification for static signs easier, as we can limit the number of hands detected in frame (see the sketch below). There aren’t really any costs incurred by this change, as we had not yet made models that were separated by hand shape.
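Concretely, the static/dynamic split would let us configure MediaPipe differently for each group; a small sketch (the variable names are just illustrative):

# Sketch: configuring MediaPipe separately for the two sign groups.
# Static signs in our set are one-handed; dynamic signs may use two hands.
import mediapipe as mp

mp_hands = mp.solutions.hands

# Static (image) pipeline: only ever look for one hand.
static_hands = mp_hands.Hands(static_image_mode=True, max_num_hands=1)

# Dynamic (video) pipeline: track up to two hands across frames.
dynamic_hands = mp_hands.Hands(static_image_mode=False, max_num_hands=2)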

Our schedule has also not really changed, but we will be allocating some extra time to make the dynamic sign training data since we initially did not anticipate needing to do this.

Hinna’s Status Report for 2/26/22

This week, I worked with my team on the design review, with the main deliverables being the presentation and the report. I personally worked on creating 15 more iterations of testing data for the 15 communicative, dynamic signs that I was assigned. I also helped create diagrams for the design presentation, specifically for the microservice architecture we used to describe our neural network separation.

Currently, our project is on schedule, but we definitely feel a little bit of time pressure. We have not yet begun training our ML models because we only finalized our neural network type during the design review this week. Additionally, all of us are very busy with midterms and writing the design report, so we haven’t done as much work as we wanted to on the project implementation itself. To account for this, we plan to meet more frequently as a team and extend some tasks past spring break in our schedule (such as creating testing data).

Next week, I hope to work with my team to complete the design report, where I am primarily responsible for the introduction, use-case requirements, testing, and project management sections.

Valeria’s Status Report for 2/26/22

This week I focused more on the web application since we are running slightly behind schedule on it. I looked into the differences between Material UI and Bootstrap to decide which of these front-end frameworks to use for the project. I ended up choosing Bootstrap because it is the one that we, as a team, have more experience with, and its components are easy to make. With that decided, I started working on the HTML templates for our web application. I was only able to complete how the home page and the course page are going to look, and here is an image for reference. Most of the data in the HTML templates is dummy data that I plan to replace during the week of spring break with our actual lesson plans. Apart from this, I have also been working on our design report. As a team, we chose to split the report into parts and assign them to each other, so this week I have been working on the design requirements and the system implementation for the web application. Lastly, I expanded our testing dataset for letters N-Z and numbers 5-9 by adding 15 images for each sign. I took these pictures under varying lighting conditions so that we can see whether our neural network still predicts the correct labels.

As of now, my progress is slightly behind schedule. I was hoping to have all of the templates ready this week so that when we come back from spring break we have something to start with; however, I was only able to get one template done. Since next week I have four midterms to study for, I will not have much time to finish all of the templates. Because of this, I am going to continue working on the HTML templates during the week of spring break. For next week, I hope to get one more HTML template done, preferably the training template, i.e. the page that shows our instructional video and real-time video feedback, and to finish our design report since that is a major part of our grade.

Aishwarya’s Status Report for 2/26/22

This week, I presented our intended design during the in-class design presentations. We received feedback on the feasibility of our ML models, specifically whether the feature-extraction data from MediaPipe is compatible with our models as input to LSTM nodes. I reviewed this component of our design during the past week in order to provide adequate justification for the necessity of LSTM cells in our network (to support temporal information as part of our model learning), as well as its feasibility (outlining the expected input and output data formatting/dimensionality at each layer and researching examples of MediaPipe data being used with LSTMs). I also worked more on our code for data formatting (converting landmark data from MediaPipe into numpy arrays that can be fed to our models). I now just need to add resampling of the video data to grab the necessary number of frames (a rough sketch of this step is below). We only received our AWS credits towards the end of this past week, so we have not been able to work much on feature extraction and model training within an EC2 instance. Although our schedule still indicates we have time for model training, I am a little concerned that we are slightly behind schedule on this front.
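For that resampling step, one straightforward approach (a sketch under the assumption that each clip is reduced to a fixed number of evenly spaced frames, e.g. 30) would be:

# Sketch: resample a video down to a fixed number of evenly spaced frames
# before landmark extraction. The target of 30 frames is an assumed value.
import cv2
import numpy as np

def resample_frames(video_path, num_frames=30):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = set(np.linspace(0, total - 1, num_frames, dtype=int))

    frames = []
    for i in range(total):
        ok, frame = cap.read()
        if not ok:
            break
        if i in indices:
            frames.append(frame)
    cap.release()
    return frames  # list of num_frames (or fewer) BGR frames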

In order to catch up, we will hopefully be able to spend more time on implementation once the design report is completed. So far, a good amount of our time has gone towards presentations and documentation rather than implementation. Once these deliverables are met, I hope to shift my attention more towards building up our solution.

Team Status Report for 2/26/22

This week we had the Design Presentation and began working on our Design Report. We received feedback from the presentation mainly regarding our design choice to use an LSTM with MediaPipe, where our advisor was a little wary about how well this would work. After discussing it as a group and doing some more research, we are confident that our choice will fit the use case.

Currently, the most significant risk that could jeopardize the success of our project is the semester’s time constraints. Given that the design review and midterms have taken a lot of time over the past few weeks, and that spring break is coming up, we have not had a lot of time to work on our actual implementation. This is especially concerning given the amount of time it generally takes to train an ML model and the amount of data we need to both create and process. To manage this, we will prioritize training the models based on our neural network groupings, where the networks with fewer signs will hopefully be quicker to train. Additionally, we will have more frequent group meetings and internal deadlines so that we can meet all the milestones in the remaining time we have. As for contingency plans, if training the models takes too long, we will cut down the number of signs included in the platform for quicker training while still maintaining the usefulness of the signs provided to users.

In terms of changes to the existing design, we realized that utilizing both hand landmarks and face landmarks presented some compatibility problems and too much complexity given our current expertise and remaining time. Thus, we removed all signs that involve contact with the face/head and replaced them with other signs (that still involve motion). Because this change was made during our design phase, there are no real costs associated with it, as the replacement signs are still in our chosen datasets and maintain the same level of communicativeness for the user.

Our schedule is mostly the same as before, but we plan to make testing data for the model in the weeks after spring break, and internally we plan to devote more effort to training the model.

Hinna’s Status Report for 2/19/22

This past week, my focus was mostly on the components of our project related to the design presentation and report.

For my individual accomplishments, I first created some testing data for our machine learning models for 15 of our 51 signs, with 5 versions of each of the 15 signs. In these different versions, I varied the lighting and the angle at which I signed to allow for more robust testing when we begin testing our models. I also researched the benefits of an RNN (our chosen neural network) versus a CNN (our contingency plan) to help my team make a more informed choice on how to structure our solution.

Additionally, I finalized the microarchitecture of our different neural networks, meaning I figured out how to sort our 51 signs into different models based on similarity. The purpose of this sorting is to ensure that when users sign on the platform, our models will have been trained against similar signs, so they can more definitively decide whether the user is correctly signing one of our terms. The 5 different neural networks are roughly sorted into fist signs, 1-finger signs, 2-finger signs, 3-finger signs, and 4-5-finger / open-handed signs. Note that these neural networks will still have similar structures (RNN, hyperbolic tangent (tanh) activation, same number of layers, etc.) but will differ in the signs they are trained to detect; a rough sketch of that shared structure follows.
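To illustrate the shared structure (a minimal Keras sketch of a recurrent network with tanh activation; the LSTM layer sizes, 30-frame sequence length, landmark feature size, and per-network class counts are assumptions rather than finalized design choices), each group’s network could be instantiated from one template:

# Sketch of one shared network template instantiated per sign group.
# Sequence length (30 frames), feature size (21 landmarks * 3 coords),
# and layer widths are assumed values, not our finalized design.
from tensorflow.keras import layers, models

def build_sign_model(num_classes, seq_len=30, num_features=63):
    model = models.Sequential([
        layers.Input(shape=(seq_len, num_features)),
        layers.LSTM(64, activation="tanh", return_sequences=True),
        layers.LSTM(32, activation="tanh"),
        layers.Dense(32, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One model per grouping, differing only in the signs it classifies.
fist_model = build_sign_model(num_classes=10)        # e.g. the fist-sign group
one_finger_model = build_sign_model(num_classes=12)  # e.g. the 1-finger group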

Our project is currently on schedule, as long as we are able to start training the models in the next week or so. Now that we have a more definitive system design, our timeline seems more attainable than last week (when we weren’t sure which neural network to use to implement our solution).

As for deliverables next week, I plan to do more iterations of my 15 assigned signs to contribute to our testing data. I also plan to work with my team on our design report and to begin training our model.