Hinna’s Status Report for 4/30/22
This week, I finished working on the final presentation with my group, where I added some walkthroughs of the webapp (attached to this post). I also started working on the final poster where I wrote out our product pitch, some of the system description, and worked with my group members to do the rest. Additionally, I began writing the final report, making adjustments to the introduction, user requirements, and project management based on changes we made since the design review.
Our project is on schedule as of now; the main remaining work is solution tradeoffs, user testing, and final deliverables. Over the next week, I will work with my team to finish user testing, the final poster, and the final video, and to finalize plans for the final demo. I will also work on analyzing some of our tradeoffs (e.g. epochs vs. accuracy for each of the models).
Hinna’s Status Report for 4/23/22
Over this past week, I created tradeoff graphs based on metrics we found for model accuracy, plotting training and testing accuracy against the number of epochs. From these graphs, we identified that the dynamic models are performing very well (93%+ accuracy), most likely because we created our own training data for them. On the other hand, the 1-finger and open-hand models were performing fairly poorly (60-70% accuracy). So, along with my teammates, I made more training data for those models to see if that would help improve their accuracy.
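For reference, a graph like this takes only a few lines of matplotlib; the sketch below uses placeholder numbers (not our actual measurements) just to show the shape of the script:

```python
# Minimal sketch of an accuracy-vs-epochs tradeoff graph.
# The epoch counts and accuracy values are placeholders, not our real results.
import matplotlib.pyplot as plt

epochs = [10, 25, 50, 100, 200]             # hypothetical epoch counts
train_acc = [0.71, 0.80, 0.88, 0.93, 0.95]  # placeholder training accuracies
test_acc = [0.68, 0.76, 0.84, 0.90, 0.91]   # placeholder testing accuracies

plt.plot(epochs, train_acc, marker="o", label="Training accuracy")
plt.plot(epochs, test_acc, marker="s", label="Testing accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.title("Model accuracy vs. training epochs")
plt.legend()
plt.grid(True)
plt.savefig("tradeoff_accuracy_vs_epochs.png")
```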
Additionally, as the dynamic models are now integrated into the webapp, I examined how well they were doing, personally testing them at various angles, at various distances (within 2 feet), and with each hand to see how accurate they were. I found that when signing quickly (within one second) the predictions were inaccurate, but when signing more slowly the accuracy improved. This finding was also reflected in some of our user test results from the 2 users who tested the platform on Friday.
Finally, I have been working with my teammates on the final presentation: I updated our schedule and project management tasks, altered our Solution Approach diagram to reflect the number of neural networks we now have, adjusted our user requirements based on changes made since the design presentation (i.e. our distance requirement was lowered and our model accuracy requirement was increased), adjusted the testing/verification charts, and included the tradeoff curves for testing and training accuracy vs. the number of epochs.
Our project overall seems to be on schedule, with a few caveats. On one hand, we are ahead of schedule on integration, since we finished it last week; our initial plan had integration running until the very end of the semester. However, our model accuracy is not yet where it needs to be for every subset of signs, and with only about a week left, we might not be able to get every model to our desired accuracy of 97%, which makes it feel like we are a little behind. Additionally, we held user tests this past week and only 2 users signed up (our overall goal is 10 users), which means our testing is behind schedule.
As for next week, my main focuses will be getting more user tests done, re-generating the tradeoff curves if the additional training data improves our model accuracies, and working on the final report, demo, and video.
Team Status Report for 4/23/22
The most significant risk that could currently jeopardize the success of our project is model accuracy. Over the past week, we have been looking at accuracy tradeoffs and started conducting user tests, and we can see that when users perform the dynamic signs quickly, our model is not able to detect them accurately. To fix this, we are considering creating training data with the signs performed faster, so that the model is trained on quicker iterations of each sign. As a contingency plan, since we are nearing the end of the semester, we will either instruct users to sign slightly slower or keep the models as they are.
There haven’t been any changes to our system design. As for our schedule, we are going to extend our user testing weeks all the way up to the demo since we were not able to get enough users to sign up over this past week. Additionally, we plan to collect survey results at the live demo to get more user feedback to add to the final report. Also, because we have the webapp and ML models fully integrated, we are shortening the integration task on our schedule by two weeks.
Hinna’s Status Report for 4/16/22
This week, my main focus was user testing and also examining the accuracy of our static and dynamic models.
In regard to user testing, I made a Google Forms survey that asks our testers to rate different usability features of the website, as well as how helpful they felt it was in teaching them ASL. I also made a step-by-step guide for users to follow when we conduct the testing, which we will use to see how intuitive the steps are and to make sure testers try a variety of actions (e.g. making sure each user signs both correctly and incorrectly on the platform to see the results). Finally, as a TA for the ASL StuCo this semester, I reached out to students who are either semi-experienced or experienced in ASL to participate in our tests. We will also be reaching out to a few people who are brand new to ASL in order to get a beginner's perspective on our platform.
As for the models, I have been trying different combinations of training epochs and prediction threshold values (the model only outputs a prediction if its confidence exceeds a certain value, e.g. 90%) to determine the configuration that makes the model most accurate. In these tests, I identified certain signs that are consistently misclassified across the combinations, as well as some environmental factors, like contrasting backgrounds, that can influence the results. Based on this work and feedback during our weekly meeting, I will continue trying these combinations more systematically, recording accuracy as a function of epochs and/or threshold values so we can graph the tradeoffs associated with our system. The final accuracy data and graphs will be recorded at the end of next week, to account for any training changes we make this week based on the signs with consistently inaccurate predictions.
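To illustrate what the prediction threshold means in practice, here is a minimal sketch; the function and variable names are hypothetical, and it assumes a Keras-style model with softmax outputs, not our exact code:

```python
import numpy as np

PREDICTION_THRESHOLD = 0.90  # hypothetical threshold; one of the values being tested

def predict_sign(model, features, labels, threshold=PREDICTION_THRESHOLD):
    """Return the predicted sign, or None if the model is not confident enough."""
    probs = model.predict(features[np.newaxis, :])[0]  # softmax probabilities
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return None  # below threshold: report no prediction rather than a guess
    return labels[best]
```

Sweeping this threshold (and the epoch count at training time) is what produces the tradeoff data: a higher threshold suppresses low-confidence guesses but also rejects more correct signs.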
Our project is on schedule at this point; however, our model accuracy has not yet reached the 97% target we set at the beginning of the semester. Since we always planned to be adjusting and tuning the models up to the very end of the semester, this is not too big of a deal, but we are going to start shifting our focus primarily to testing as well as the final presentation/demo/report. Thus, while we are on schedule, our final implementation may not be as robust as we had planned.
Next week, I will be conducting user tests along with my teammates, focusing on factors such as hand dominance, hand size, lighting, distance from the camera, and potentially contrasting backgrounds. I will also be examining the dynamic models more in depth to identify signs with less successful detections. Additionally, I will be recording accuracy vs. threshold value and accuracy vs. training epochs, then using that information to make tradeoff curves that we can hopefully include in our final presentation.
Team Status Report for 4/16/22
The most significant risks that could jeopardize the success of our project are the accuracy of the static model predictions, the accuracy of the dynamic models (a post-MVP concern), and minor bugs such as our start-stop recording not timing out correctly.
In regard to the static model accuracy, we are managing this risk by examining each model to determine which signs are having issues and experimenting with different epoch/prediction-threshold combinations to see whether they improve the model accuracy. Given that there are only 2 weeks left in the semester, if the accuracy does not improve we will note that in the report but keep the models as they are in the platform, since they are performing well overall.
As for the dynamic models, this is a post-MVP portion of our project, and currently many of the 2-handed moving signs are not being accurately detected. To manage this risk, based on feedback from Tamal and Isabel, we are checking the number of hands present in frame for 2-handed tests and immediately marking the attempt as incorrect if only 1 hand is present (see the sketch below). Additionally, we are looking into the flickering of MediaPipe landmarks, which occurs when the hands blur in motion in the middle of a sign; we are thinking of padding or removing those blurred frames. Again, as a contingency plan, given that there are only 2 weeks left, if we can't improve the dynamic models we will most likely keep them in our platform as is and address the inaccuracies in our report.
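The two-hand check is conceptually simple; a minimal sketch using MediaPipe's Python Hands API looks roughly like the following (the frame source and function names are hypothetical; our actual check runs inside the webapp pipeline):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def count_hands(frame_bgr, hands_detector):
    """Return how many hands MediaPipe detects in a single video frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
    results = hands_detector.process(rgb)
    return len(results.multi_hand_landmarks) if results.multi_hand_landmarks else 0

with mp_hands.Hands(max_num_hands=2) as detector:
    frame = cv2.imread("sample_frame.png")  # hypothetical test frame
    if count_hands(frame, detector) < 2:
        print("Two-handed sign marked incorrect: fewer than 2 hands in frame")
```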
In regard to the minor bugs, like the start-stop timeout, we are using logs and online forums to try to figure out why our recording is not actually stopping. If the issue persists, we will reach out on Slack to get help with the error. It should be noted that this is not a major problem, as the main purpose of our platform (having users sign and get predictions on whether their signing is correct) is not affected by it. However, if we cannot fix the issue, we will simply instruct users on how to work around it (i.e. to repeat execution they have to refresh the page instead of pressing start again).
There have not been changes to our system design or schedule.
Hinna’s Status Report for 4/10/22
Throughout this week, I was feeling fairly sick, so I did not accomplish as much as I intended to. However, I was able to continue creating testing/training data (specifically for our 15 dynamic signs) and to keep working locally with our machine learning models to see which groupings of signs are performing the worst (currently the fist model). Additionally, we did a dry run of our interim demo virtually, since our whole group was sick, and based on TA/professor feedback, I started looking into normalizing the inputs to our model so that slight variations in distance or hand position don't affect the predictions (see the sketch below). I will continue this research with my group members this week to determine whether the normalization is feasible in the time we have left.
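The normalization we are researching would look roughly like this (a sketch of the idea, not a final implementation): translate the landmarks so the wrist sits at the origin, then divide by the hand's extent, so distance from the camera and position in frame no longer matter.

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Translate landmarks so the wrist is at the origin and scale by hand size.

    `landmarks` is assumed to be a (21, 3) array of MediaPipe hand landmarks,
    where landmark 0 is the wrist.
    """
    pts = np.asarray(landmarks, dtype=np.float32)
    pts = pts - pts[0]                            # translate: wrist becomes the origin
    scale = np.max(np.linalg.norm(pts, axis=1))   # largest distance from the wrist
    if scale > 0:
        pts /= scale                              # scale: hand size/distance cancels out
    return pts
```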
Additionally, given that there are only 3 weeks left in the semester, I began planning for user testing by researching the best webapp user satisfaction questions to ask, reaching out to potential testers (other students at CMU), and creating a testing schedule. Given that we initially planned to start user testing last week, this portion of the project is behind schedule.
Overall, we are slightly behind schedule due to our group-wide illnesses this past week and the fact that our models are not working for every letter, which led us to push back user testing. To get back on track, we have scheduled two additional meetings this week (outside our normal class time and previously set weekly meetings), so that we can dedicate more focused work time to the project.
Next week, we hope to have the static models fully done and the dynamic models significantly progressing (with at least 7/15 having 80%+ prediction rates). Additionally, we plan to start implementing a quiz/testing menu option on the webapp, where users can be quizzed on a random subset of signs. Finally, we plan to have a user testing schedule planned out and ready to go so we can get some feedback in these final weeks.
Team Status Report for 4/10/22
This week, the most significant risk is processing the data for the dynamic models, because it will take some time to format all the data correctly and see how well the model does (a sketch of the formatting idea follows below). To manage this, we are dedicating more time to working with the dynamic data and making it our primary focus for this week. As for contingency plans, if we continue having issues, given that we only have 3 weeks left and that static signs were our MVP, we will most likely leave out the testing portion for dynamic signs and only include them in our platform with educational materials.
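The core of the formatting work is turning each recorded clip into a fixed-length sequence of landmark frames that the model can consume. A minimal sketch of that idea, with hypothetical shapes (30 frames of 21 landmarks x 3 coordinates; the sequence length and helper name are assumptions, not our actual pipeline):

```python
import numpy as np

SEQ_LEN = 30  # hypothetical fixed number of frames per dynamic sign

def to_fixed_length(frames):
    """Pad or truncate a list of per-frame landmark arrays to SEQ_LEN frames.

    Each frame is a (21, 3) array of hand landmarks; assumes at least one
    frame. Short clips are padded by repeating the last frame so every
    training sample has the same shape.
    """
    frames = [np.asarray(f, dtype=np.float32) for f in frames]
    if len(frames) >= SEQ_LEN:
        frames = frames[:SEQ_LEN]
    else:
        frames += [frames[-1]] * (SEQ_LEN - len(frames))
    return np.stack(frames)  # shape: (SEQ_LEN, 21, 3)
```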
A secondary risk is that the fist model (covering signs like 'a', 'm', 'n', 's', and 't') is not performing as expected: it is not correctly distinguishing between the signs. To manage this, we will be investigating the fist model's training data this week to figure out why it is underperforming; currently we think the issue is due to varying hand positions, but we will confirm after looking into it more this week. As for contingency plans, if we are unable to figure out the fist model issue, we will distribute the fist signs among the other models, so that each model only has to distinguish between dissimilar signs.
Our design has not changed over the past week nor has our schedule seen any significant changes.
Hinna’s Status Report for 4/2/22
This week, I personally worked on my portions of creating the ASL testing/training data, adjusting the instructional material (videos and text) to make sure it fits with how our model reads hand data (i.e. making sure hands are angled so that all or most fingers are visible in frame), and locally examining the webapp and ML models.
In regard to examining the webapp, I have been brainstorming with Valeria about some of the suggestions given to us in last week's meeting: we are trying to decide the best way to store user statistics, include a random quiz mode that lets users select which categories to be tested on, and format/display user profile information. As for the machine learning models, I have been working with them locally to see which is performing best (currently the 1-finger model) and to find holes in how they have been trained (i.e. the sign for 'd' requires a certain tilt for it to be accurately detected). After identifying some of these signs, I have been working with Aishwarya to figure out solutions for these issues in the models.
Our project is technically on schedule but in danger of falling behind, especially because my other two group members tested positive for COVID this past week. To account for this, we are pushing some tasks back a week on our schedule (such as integration) and doing our best to collaborate virtually.
In the next week, we plan to successfully execute an interim demo with most if not all of the static sign models working, along with a webapp that can teach users the ASL terms we have chosen after they make an account on the platform.
Team Status Report for 4/2/22
The most significant risk that could currently jeopardize the success of the project is the model accuracy. At the moment, our model is very sensitive to slight hand tilts and little nuances in signs, so even when a user makes a technically correct sign, the model may be unable to identify it as correct. To manage this risk, we are planning to alter some of the layers of our model, the number of epochs used in training, and the number of nodes, to see if these adjustments result in more robust detection (a sketch of these tuning knobs is below). Additionally, in the next week or so, we plan to consult with Professor Gormley about our model to see if he has any recommendations for improving the detection. As for contingency plans, if we are unable to make the model more flexible in its predictions, we will adjust our instructional materials to better reflect the training data, so that users sign in a way the model recognizes as correct.
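To give a sense of those knobs, here is a minimal Keras sketch of the kind of dense classifier being tuned; the input size, class count, and layer sizes are illustrative assumptions, not our actual architecture:

```python
import tensorflow as tf

NUM_FEATURES = 63  # 21 landmarks x 3 coordinates (hypothetical input size)
NUM_CLASSES = 24   # hypothetical number of signs in one static model

def build_model(hidden_layers=2, nodes=64):
    """Build a small dense classifier; layer count and width are the tuning knobs."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(nodes, activation="relu",
                                    input_shape=(NUM_FEATURES,)))
    for _ in range(hidden_layers - 1):
        model.add(tf.keras.layers.Dense(nodes, activation="relu"))
    model.add(tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# The epoch count is the third knob, set at training time:
# model.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test))
```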
There have not been any changes to the design, but after meeting with our advisor and TA, we are thinking of adding some features to the webapp, such as tracking user statistics. This change mainly involves the user model we currently have, with an extra field in the profile for letters that the user frequently gets wrong (a rough sketch is below). We are making this change to personalize the learning experience, so our platform can reinforce signs/terms that users consistently get incorrect through additional tests. Note that such changes will not be prioritized until after the interim demo, and more specifically, until after we have addressed all the feedback we get from the demo.
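As a rough sketch of what that extra field could look like, assuming a Django-style profile model (the field and method names here are hypothetical, not our actual schema):

```python
from django.contrib.auth.models import User
from django.db import models

class Profile(models.Model):
    """Hypothetical user profile with per-sign mistake tracking."""
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    # Signs the user frequently gets wrong, e.g. {"d": 5, "m": 3}
    missed_signs = models.JSONField(default=dict)

    def record_miss(self, sign):
        """Increment the miss count for a sign so quizzes can reinforce it."""
        self.missed_signs[sign] = self.missed_signs.get(sign, 0) + 1
        self.save()
```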
Our schedule mostly remains the same, however, two of our three group members are currently sick with COVID so we may have a much slower week this week in terms of progress. As a result, we may have to adjust our schedule and push some tasks to later weeks.