This week, I finalized the fix for the gesture recording issue we identified earlier. I re-checked the dataset, re-identified the six gestures with the highest recognition success rates, and modified the CNN model architecture accordingly.
I also doubled the size of the dataset, making the model more robust. As a result, the wand now correctly identifies all six target gestures with noticeably better performance.
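As a rough illustration of the kind of augmentation used to scale the dataset (the exact method is not detailed here), the sketch below doubles a set of IMU gesture windows by appending Gaussian-jittered copies. The function name, noise level, and array shapes are hypothetical.

```python
import numpy as np

def jitter_augment(windows, labels, noise_std=0.05, seed=0):
    """Double a dataset of IMU windows by appending noise-jittered copies.

    windows: (num_samples, timesteps, channels) array of sensor readings
    labels:  (num_samples,) array of gesture class indices
    noise_std is a hypothetical value; it should be tuned to the
    sensor's actual noise floor.
    """
    rng = np.random.default_rng(seed)
    jittered = windows + rng.normal(0.0, noise_std, size=windows.shape)
    return (np.concatenate([windows, jittered], axis=0),
            np.concatenate([labels, labels], axis=0))

# Usage (hypothetical shapes): X is (N, 100, 6) for 100 timesteps
# of 6-axis IMU data; X_aug then holds 2N samples.
# X_aug, y_aug = jitter_augment(X, y)
```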
I remain on schedule as per the project timeline.
During development, I experimented with several model architectures to determine which best fits the task of classifying gestures from IMU data. I started with a simple RNN, which captured basic motion patterns but generalized poorly. I then tried an LSTM, which improved training accuracy but overfit quickly on our small dataset and required long training times. To balance spatial and temporal modeling, I combined Conv1D and LSTM layers, but this did not improve accuracy. Finally, I tested a Conv1D-based CNN with two small-kernel convolutional layers followed by flatten and dropout layers. This model achieved the highest validation accuracy throughout and stayed within the model size limit; a sketch of it appears below.
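The final architecture is described above only at a high level; the sketch below is one plausible Keras realization, assuming a 100-timestep window of 6-axis IMU data, filter counts of 16 and 32, and a dropout rate of 0.5 (all assumed values, not finalized figures from the project).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gesture_cnn(timesteps=100, channels=6, num_classes=6):
    """Conv1D classifier: two small-kernel conv layers, flatten, dropout.

    timesteps and channels are hypothetical placeholders for the IMU
    window shape; num_classes=6 matches the six target gestures.
    """
    model = models.Sequential([
        layers.Input(shape=(timesteps, channels)),
        layers.Conv1D(16, kernel_size=3, activation="relu"),  # small kernel keeps params low
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.Flatten(),
        layers.Dropout(0.5),  # counters overfitting on a small dataset
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Keeping the kernels small and flattening straight into the output layer holds the parameter count down, which is what lets the model stay within the size limit.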
Along the way, I learned new skills in time-series modeling, model optimization, and data augmentation. I watched YouTube tutorials to pick up key modeling techniques, consulted the TensorFlow documentation for implementation and tuning, and read blog posts and Stack Overflow discussions to troubleshoot overfitting and understand best practices for dropout rates and kernel sizing. These resources helped me iterate quickly and tailor the model to our system constraints.