I completed the code to parse the image and video data, passing it through MediaPipe and extracting and formatting the landmark coordinate data. The rough table below shows my initial findings for training and testing accuracy using a dataset covering the letters D, I, L, and X with 30 images per letter class. After varying parameters to see how they affected the testing accuracy, the best test accuracy I could achieve was 80.56%. Overall, this seems to be an issue with overfitting (especially since this initial dataset is small).
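As a rough illustration of the landmark-extraction step, the sketch below uses the classic mediapipe.solutions.hands API with OpenCV for image loading; the function name, file-path input, and flattened (x, y, z) output format are assumptions for illustration, not the exact code in my pipeline.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_landmarks(image_path):
    """Return a flat list of 63 values (21 landmarks x (x, y, z)), or None."""
    image = cv2.imread(image_path)
    if image is None:
        return None
    with mp_hands.Hands(static_image_mode=True,
                        max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        # MediaPipe expects RGB, while OpenCV loads BGR
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # no hand detected, so this image is skipped
    hand = results.multi_hand_landmarks[0]
    return [coord for lm in hand.landmark for coord in (lm.x, lm.y, lm.z)]
```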
I also found another dataset with 3000 images per letter class (though MediaPipe fails to extract landmark data from many of them). Using this dataset, overfitting still seemed to be an issue, though the model performed well when tested in real time: I made signs in front of my web camera and found that it identified them fairly accurately. During this real-time evaluation, I found that it only worked for my left hand, which means I will need to mirror the images correctly so I can train the models for both right-handed and left-handed signs.
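A minimal sketch of the mirroring idea, assuming OpenCV (the helper name is hypothetical): flipping each training image across the vertical axis would let the same landmark extraction produce both a right-handed and a left-handed example from one source image.

```python
import cv2

def mirrored(image):
    """Flip the image horizontally (left hand <-> right hand)."""
    return cv2.flip(image, 1)  # flipCode=1 mirrors across the vertical axis
```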
My progress is on schedule. To combat the overfitting issues, during the next week I will continue trying to train with the larger dataset, varying parameters, and modifying the model structure. By the end of next week, I hope to have the models trained for each ASL grouping.
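One possible direction for the model-structure changes is sketched below, assuming a Keras-style dense classifier over the 63 landmark values with dropout and L2 regularization to reduce overfitting; the framework, layer sizes, and rates are placeholders rather than my actual model.

```python
import tensorflow as tf

def build_classifier(num_classes=4, num_features=63):
    """Small dense classifier with dropout and L2 penalties (illustrative only)."""
    reg = tf.keras.regularizers.l2(1e-3)
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(num_features,)),
        tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(32, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```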