This week, my team and I finished the slides for the design review presentation, and my teammate presented them in class. Alongside that, I’ve been working on training the ML model beyond the basic alphabet. I gathered two datasets, how2sign and DSL-10, both of which contain many videos of users signing common words and phrases. For now, I am working only with the DSL-10 dataset, which covers ten common everyday vocabulary items. From this dataset, I took 12 input samples from 3 different dynamic phrases and trained a small model on them. I did this by first loading and preprocessing the data and randomly splitting it into training and testing sets. Then, I extracted features from MediaPipe’s hand and pose objects. I built an array of these landmarks and performed any necessary padding when the number of landmarks from the pose, right hand, or left hand was not the same. Next, I created a simple sequential model with two LSTM layers followed by a dense layer. After 10 epochs of training, this produced an initial model that I will expand further.
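As a rough illustration of the padding, splitting, and model steps described above, here is a minimal sketch rather than the exact code I used. The landmark counts (33 for pose, 21 per hand) match MediaPipe's standard outputs. Everything else is an assumption for illustration: the function names, using only x, y, z per landmark, zero-padding missing landmarks, the fixed sequence length, the 25% test fraction, and the layer sizes.

```python
import numpy as np

POSE_LANDMARKS = 33   # MediaPipe Pose landmark count
HAND_LANDMARKS = 21   # MediaPipe Hands landmark count (per hand)
COORDS = 3            # assumption: keep only x, y, z per landmark
FEATURES_PER_FRAME = (POSE_LANDMARKS + 2 * HAND_LANDMARKS) * COORDS  # 225


def pad_landmarks(points, expected_count):
    """Return an (expected_count, 3) array, zero-filled where landmarks are missing."""
    out = np.zeros((expected_count, COORDS), dtype=np.float32)
    if points is not None and len(points) > 0:
        arr = np.asarray(points, dtype=np.float32)[:expected_count]
        out[: len(arr)] = arr
    return out


def frame_features(pose, left_hand, right_hand):
    """Flatten pose and both hands into one fixed-length vector for a frame."""
    parts = [
        pad_landmarks(pose, POSE_LANDMARKS),
        pad_landmarks(left_hand, HAND_LANDMARKS),
        pad_landmarks(right_hand, HAND_LANDMARKS),
    ]
    return np.concatenate([p.ravel() for p in parts])


def pad_sequence(frames, max_frames):
    """Pad or truncate a list of per-frame vectors to (max_frames, FEATURES_PER_FRAME)."""
    out = np.zeros((max_frames, FEATURES_PER_FRAME), dtype=np.float32)
    clipped = list(frames)[:max_frames]
    if clipped:
        out[: len(clipped)] = np.stack(clipped)
    return out


def split_indices(n_samples, test_fraction=0.25, seed=0):
    """Randomly split sample indices into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return order[n_test:], order[:n_test]


def build_model(max_frames, num_classes):
    """Two LSTM layers followed by a dense layer (layer sizes are assumptions)."""
    import tensorflow as tf  # imported lazily so the helpers above need only NumPy

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_frames, FEATURES_PER_FRAME)),
        tf.keras.layers.Masking(mask_value=0.0),  # skip all-zero padded frames
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With helpers like these, each video would become one `(max_frames, 225)` array, and a call such as `build_model(max_frames, 3).fit(X_train, y_train, epochs=10)` would correspond to the 10-epoch run on the 3 phrases. Zero-padding keeps every frame vector the same length even when a hand is not detected.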
My progress is on schedule, but I am a little worried that the next steps will take longer than I expect. Since this dataset contains only 10 phrases, I will need to train on additional datasets to recognize more phrases. Because those datasets may come in different formats, I will also have to consider how to keep them compatible in the training process.
Next week, I hope to continue adding input samples from the DSL-10 dataset until I have sufficient data for all 10 phrases. I would also like to test the model iteratively to see whether any additional steps are needed, such as further feature extraction or preprocessing of frames. To do this, I will measure the accuracy of the trained model on the output of my teammate’s computer vision processing by testing out signs. The results will guide the next round of training. Additionally, my team and I will be working on the Design Review Report, which is due on Friday.