Hinna’s Status Report for 2/26/22

This week, I worked with my team on the design review, with the main deliverables being the presentation and the report. I personally worked on creating 15 more iterations of testing data for the 15 communicative and dynamic signs I was assigned. I also helped create diagrams for the design presentation, specifically for the microservice architecture we used to describe our neural network separation.

Currently, our project is on schedule, but we are definitely feeling some time pressure. We have not yet begun training our ML model because we only finalized our neural network type during the design review this week. Additionally, all of us are very busy with midterms and writing the design report, so we haven’t done as much work as we wanted on the project implementation itself. To account for this, we plan to meet more frequently as a team and extend some tasks (such as creating testing data) past spring break in our schedule.

Next week, I hope to work with my team to complete the design report; I am primarily responsible for the introduction, use-case requirements, testing, and project management sections of the report.

Aishwarya’s Status Report for 2/26/22

This week, I presented our intended design during the in-class design presentations. We received feedback on the feasibility of our ML models, specifically whether the feature extraction data from MediaPipe is compatible with our models as inputs to LSTM nodes. I reviewed this component of our design this past week in order to justify the necessity of LSTM cells in our network (to support temporal information as part of our model learning) as well as its feasibility (outlining the expected input and output data formatting/dimensionality at each layer, and researching examples of MediaPipe data being used with LSTMs). I also worked more on our code for data formatting (converting landmark data from MediaPipe into numpy arrays that can be fed to our models); I now just need to add resampling of the video data to grab the necessary number of frames. We only received AWS credits towards the end of this past week, so we have not been able to work much on feature extraction and model training within an EC2 instance yet. Although our schedule still indicates we have time for model training, I am a little concerned that we are slightly behind schedule on this front.
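As a rough illustration of this data-formatting step, here is a minimal sketch of flattening MediaPipe landmark lists into numpy arrays and uniformly resampling a video to a fixed number of frames. The frame count, helper names, and zero-fill behavior for missing hands are assumptions for illustration, not our final code.

```python
import numpy as np

NUM_FRAMES = 30      # assumed target sequence length per sign (not finalized)
NUM_LANDMARKS = 21   # MediaPipe's hand model returns 21 landmarks per hand

def landmarks_to_array(hand_landmarks):
    """Flatten one frame's MediaPipe hand landmarks into a (63,) numpy array.

    `hand_landmarks` is the `.landmark` list from a MediaPipe result; each
    entry has .x, .y, .z fields. Returns zeros if no hand was detected
    (an assumed placeholder policy).
    """
    if hand_landmarks is None:
        return np.zeros(NUM_LANDMARKS * 3)
    return np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks]).flatten()

def resample_frames(frame_features, target=NUM_FRAMES):
    """Uniformly resample a variable-length list of per-frame feature
    vectors to `target` frames so every example has the same shape."""
    idx = np.linspace(0, len(frame_features) - 1, target).astype(int)
    return np.stack([frame_features[i] for i in idx])

# Example: a 45-frame video becomes a (30, 63) array ready for an LSTM input.
video = [landmarks_to_array(None) for _ in range(45)]  # placeholder frames
print(resample_frames(video).shape)  # (30, 63)
```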

In order to catch up, we will hopefully be able to spend more time on implementation once the design report is completed. So far, a good amount of our time has gone towards presentations and documentation rather than implementation. Once these deliverables are met, I hope to shift my attention more towards building up our solution.

Team Status Report for 2/26/22

This week we had the Design Presentation and began working on our Design Report. We received feedback from the presentation mainly regarding our design choice to use an LSTM with MediaPipe, where our advisor was a little wary about how well this would work. After discussing it as a group and doing some more research, we are confident that our choice will fit the use-case.

Currently, the most significant risk that could jeopardize the success of our project is the semester’s time constraints. Given that the design review and midterms have taken a lot of time over the past few weeks, and that spring break is coming up, we have not had much time to work on our actual implementation. This is especially concerning given the amount of time it generally takes to train an ML model and the amount of data we need to both create and process. To manage this, we will prioritize training the models based on our neural network groupings, where the networks with fewer signs will hopefully be quicker to train. Additionally, we will hold more frequent group meetings and set internal deadlines so that we can meet all the milestones in the remaining time we have. As for contingency plans, if training the model takes too long, we will cut down the number of signs included in the platform for quicker training, while still maintaining the usefulness of the signs provided to users.

In terms of changes to the existing design, we realized that utilizing both hand landmarks and face landmarks presented some compatibility problems and too much complexity given our current expertise and remaining time. Thus, we removed all signs that involve contact with the face/head and replaced them with other signs that still involve motion. Because this change was made during our design phase, there are no real costs associated with it: the replacement signs are still in our chosen datasets and maintain the same level of communicativeness for the user.

Our schedule is mostly the same as before, but we now plan to create testing data for the model in the weeks after spring break, and internally we plan to devote more effort to training the model.

Hinna’s Status Report for 2/19/22

This past week, my focus was mostly on the components of our project related to the design presentation and report.

For my individual accomplishments, I first created some testing data for our machine learning model, covering 15 of our 51 signs with 5 versions of each sign. Across these versions, I varied the lighting and the angle at which I signed to allow for more robust testing when we begin testing our model. I also researched the benefits of an RNN (our chosen neural network) versus a CNN (our contingency plan) to help my team make a more informed choice on how to structure our solution.

Additionally, I finalized the microarchitecture of our different neural networks, meaning I figured out how to sort our 51 signs into different models based on similarity. The purpose of this sorting is to ensure that when users sign on the platform, our models will be trained against similar signs in order to more definitively decide whether the user is correctly signing one of our terms. The 5 neural networks are roughly sorted into fist signs, 1-finger signs, 2-finger signs, 3-finger signs, and 4-5-finger / open-handed signs. Note that these neural networks will still have similar structures (RNN, hyperbolic tangent (tanh) activation, same number of layers, etc.) but will differ in the signs they are trained to detect.
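For illustration only, a hypothetical sketch of this grouping and of routing a sign to its network might look like the following; the example signs listed here are placeholders, not our finalized assignments.

```python
# Placeholder groupings: the real sign lists come from our design documents.
SIGN_GROUPS = {
    "fist":         ["a", "s", "yes"],       # fist-shaped handshapes
    "one_finger":   ["d", "1", "where"],     # one extended finger
    "two_finger":   ["v", "2", "see"],
    "three_finger": ["w", "3"],
    "open_hand":    ["b", "5", "hello"],     # 4-5 fingers / open handshapes
}

def network_for_sign(sign):
    """Return the group name (and hence which neural network) a sign maps to."""
    for group, signs in SIGN_GROUPS.items():
        if sign in signs:
            return group
    raise KeyError(f"'{sign}' has not been assigned to a network yet")

print(network_for_sign("where"))  # one_finger
```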

Our project is currently on schedule, as long as we are able to start training the model in the next week or so. Now that we have a more definitive system design, our timeline seems more attainable than last week (when we weren’t sure which neural network to use to implement our solution).

As for deliverables next week, I plan to record more iterations of my 15 assigned signs to contribute to our testing data. I also plan to work with my team on our design report and to begin training our model.

Team Status Report for 2/19/22

This week, our team finalized the type of neural network we want to use for generating ASL predictions. We did more research on tools to help with model training (e.g., training in an EC2 instance) and planned out the website UI further. We also worked on creating our database of ASL test data and on the design report.

The most significant risks right now are that our RNN does not meet requirements for prediction accuracy and execution time. In addition, the RNN will require a large amount of time and data for training: if we increase the number of layers or neurons in an effort to improve prediction accuracy, this could increase training time. Another risk is doing feature extraction inefficiently, which is critical because we have a large amount of data to format before it can be fed into the neural network.

To manage these risks, we have come up with a contingency plan to use a CNN (which can be fed frames directly). For now, we are not using a CNN because its performance may be much slower than that of an RNN. For feature extraction, we are considering doing this in an EC2 instance, so that our personal computer resources are not overwhelmed.

A design change we made was to the groupings of our signs (we will have a separate RNN for each group). Before, we grouped simply by category (number, letter, etc.), but now we are grouping by similarity. This will allow us to more effectively distinguish whether the user is doing a sign correctly and to detect the minute details that may affect this correctness.

There have been no changes to our schedule thus far.

Aishwarya’s Status Report for 2/19/22

This week, I worked with TensorFlow to gain familiarity with how it allows us to instantiate a model, add layers to it, and train it. I also experimented with how we would need to format the data using numpy so that it can be fed into the model. Feeding dummy data in the form of numpy arrays to the model, I generated a timing report to see how long the model would take to generate a prediction for every 10 frames processed in MediaPipe (during real-time video processing), so that we could get an idea of how the model’s structure impacts execution time.
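To show the kind of experiment this was, here is a minimal sketch of instantiating a small TensorFlow model, feeding it dummy numpy data, and timing a prediction. The layer sizes, feature dimensionality, and class count are illustrative assumptions, not our final architecture.

```python
import time
import numpy as np
import tensorflow as tf

# Assumed shapes for illustration: a 10-frame prediction window, 63 features
# per frame (21 hand landmarks x 3 coordinates), and 10 output classes.
FRAMES, FEATURES, CLASSES = 10, 63, 10

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(FRAMES, FEATURES)),
    tf.keras.layers.Dense(CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Dummy input: one window of random landmark features.
dummy = np.random.rand(1, FRAMES, FEATURES).astype(np.float32)

# Time a single prediction, mirroring the every-10-frames cadence we expect
# during real-time video processing.
start = time.perf_counter()
model.predict(dummy, verbose=0)
print(f"prediction latency: {time.perf_counter() - start:.4f} s")
```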

Our team is also working on creating a test data set of ASL video/image data, so I recorded 5 videos for each of the signs for numbers 0-4 and letters a-m and uploaded them to the git repo where we are storing them.

The exact network structure that optimizes accuracy and execution time still needs to be determined through some trial and error. We will be using at least one LSTM layer followed by a dense layer, but the exact number of hidden layers and neurons per layer will become clearer once we can measure the performance of the initial model structure and optimize from there.

Our progress is on schedule. Next week, I hope to complete the feature extraction code with my partners (both for real-time video feed and for processing our training data acquired from external sources).

Team Status Report for 2/12/22

This week, our team gave the proposal presentation. We met twice before the presentation to practice what we were going to say. As part of our preparation, we also created a solution block diagram to visualize the main components needed for our project, as well as a visualization of the different modes for our web application (training vs. testing).

After the proposal presentation, we met on Friday to discuss how we were going to design our machine learning model. We researched the best types of neural networks for labeling both images and videos with a correct prediction, discussed the limitations of convolutional networks, and looked more into recurrent neural networks. We also discussed how we might want to approach feature extraction (modifying the coordinate points from the hands into a more useful set of distance data). Distance data may allow us to achieve greater prediction accuracy than raw image inputs, which can suffer interference from background pixels.

Currently, our most significant risk is choosing the wrong neural network, as well as having our models not be accurate enough for users. Another potential risk is incorrectly processing our images during feature extraction, leading to latency and incorrect predictions. Our current risk mitigation is to keep researching the best neural network, but we have decided that in the worst case we would choose a convolutional neural network, which would allow us to simply feed the images themselves as inputs, at the cost of lower accuracy and more latency. Lastly, a potential worry is that we need to start training soon while our design is still in progress, so we have firm time constraints to keep in mind.
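As a sketch of what the distance-based feature extraction discussed above could look like (an illustrative assumption, not a settled design), the hand’s coordinate points could be turned into a vector of pairwise distances between landmarks, which ignores background pixels entirely:

```python
import numpy as np

def pairwise_distance_features(landmarks):
    """Convert an (N, 2) array of hand landmark coordinates into a flat
    vector of distances between every unique pair of points.
    Illustrative sketch, not our finalized feature extraction."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]   # (N, N, 2)
    dists = np.linalg.norm(diffs, axis=-1)                  # (N, N)
    i, j = np.triu_indices(len(landmarks), k=1)             # unique pairs
    return dists[i, j]

# Example: 21 MediaPipe hand landmarks yield 21 * 20 / 2 = 210 distances.
features = pairwise_distance_features(np.random.rand(21, 2))
print(features.shape)  # (210,)
```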

Valeria’s Status Report for 2/12/22

This week I made the base for our web application. I set up a GitHub repository for the team to access the application, where all the Django files are located, as well as a separate folder for our HTML templates. I also researched which camera would be best for our computer vision and machine learning pipeline. From what I found, the best option within our budget is the Logitech C920, since it records at 30 fps; that frame rate will help when we are creating our neural network for the moving signs on our platform. Apart from that, I have also been researching neural networks, like the rest of the team, and trying to decide which one would be best for our project. From what I am finding so far, it seems that using two different neural networks, one for moving signs and one for static signs, could help us in the long run.

Our project is currently on schedule. For next week, I hope to finish my research on neural networks and finalize the design for the machine learning part of our project. Furthermore, I hope to get the HTML templates set up with a very basic layout of what the app is going to look like, and to start listing the functionality we might need from AJAX. Apart from that, I will also be helping finish up our design presentation and paper, so I want to have the design presentation slides done by next week.