I integrated the model execution with the web app, so that the user’s input is parsed and passed to the model to generate a prediction. I also processed all of the new data we collected to replace the incorrect signs in our original training dataset: I extracted image frames from a series of videos our group recorded, then extracted landmarks from each frame. Finally, I retrained all of the models on this newly formatted data.
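As a rough illustration of the parsing step, here is a minimal sketch of how sampled frames and their landmarks might be turned into training rows. It is not our actual pipeline: the function names `sample_frame_indices` and `landmarks_to_features` are hypothetical, each landmark is assumed to be an (x, y, z) tuple, and expressing coordinates relative to the first landmark is just one possible normalization.

```python
from typing import List, Sequence, Tuple

# Assumption: each landmark is an (x, y, z) coordinate tuple.
Landmark = Tuple[float, float, float]


def sample_frame_indices(total_frames: int, step: int) -> List[int]:
    """Return the indices of every `step`-th frame of a video."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(range(0, total_frames, step))


def landmarks_to_features(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten one frame's landmarks into a single feature row.

    Coordinates are expressed relative to the first landmark, so the row
    does not depend on where the hand sits in the image (an assumed
    normalization, not necessarily the one used in the project).
    """
    if not landmarks:
        raise ValueError("no landmarks detected in this frame")
    origin_x, origin_y, origin_z = landmarks[0]
    features: List[float] = []
    for x, y, z in landmarks:
        features.extend((x - origin_x, y - origin_y, z - origin_z))
    return features


if __name__ == "__main__":
    # Keep every 3rd frame of a 10-frame clip.
    print(sample_frame_indices(10, 3))
    # Two made-up landmarks, flattened into one row of six values.
    print(landmarks_to_features([(1.0, 2.0, 3.0), (2.0, 4.0, 6.0)]))
```

Sampling only every few frames keeps near-duplicate images out of the training set, and a flat numeric row per frame is a convenient input format for a small classifier.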
My progress was mildly hindered by having COVID this past week, so I haven’t been able to tune the models as much as I would like. The models in general have slight trouble identifying unknown signs. The fist sign category model in particular seems to have the most difficulty, especially with letters such as A and S. Once I recover next week, I hope to tune the models further to address these issues. I will need to experiment with the number of training epochs and with the model structure itself (increasing or decreasing the number of layers and the number of nodes within each layer).
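One way to organize this kind of tuning is a small grid search over epochs, layer counts, and layer widths. The sketch below only shows the bookkeeping: `build_configs` and `best_config` are hypothetical helpers, the candidate values are placeholders, and `evaluate` stands in for whatever routine trains a model with a given configuration and returns a validation score.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, int]


def build_configs(
    epochs: Sequence[int],
    num_layers: Sequence[int],
    nodes_per_layer: Sequence[int],
) -> List[Config]:
    """Enumerate every combination of the candidate hyperparameters."""
    return [
        {"epochs": e, "layers": l, "nodes": n}
        for e, l, n in product(epochs, num_layers, nodes_per_layer)
    ]


def best_config(
    configs: Sequence[Config],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Score each configuration once and return the highest-scoring one."""
    if not configs:
        raise ValueError("no configurations to evaluate")
    best, best_score = configs[0], evaluate(configs[0])
    for config in configs[1:]:
        score = evaluate(config)
        if score > best_score:
            best, best_score = config, score
    return best, best_score


if __name__ == "__main__":
    # Placeholder candidate values, not the ones used in the project.
    grid = build_configs(epochs=[10, 20], num_layers=[2, 3], nodes_per_layer=[32, 64])

    # Stand-in scoring function; a real one would train and validate a model.
    def fake_score(config: Config) -> float:
        return float(config["epochs"] + config["layers"])

    print(best_config(grid, fake_score))
```

In practice, `evaluate` would retrain the fist sign category model with each configuration and report accuracy on held-out A and S examples, so the comparison targets the letters that are currently misclassified.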
Next week, I hope to fix some of the prediction issues currently observed with the models. I also want to integrate the web app more smoothly with the model execution service. Currently, the user’s video input has to be downloaded locally before it reaches the model; cutting out this intermediate step should improve latency.