This week we’ve been working diligently to improve our model’s performance and work on improving our projects overall insights and user experience. Within a day of our meeting earlier this week, I spent time retraining a fresh model with some slight modifications to which pre-trained feature extractor we were using as well as training hyper-parameters. This lead to a model with much better performance compared with what we were able to demo at our interim in both the character and word error rate metrics across all evaluation subsets and overall.
This is continued progress now being made in completing the training of a model with a slightly more nuanced architecture using both CTC and classification loss fused and with fused logits as well. Overall, the larger model should generally have the capability for greater knowledge retention and a finer ability to distinguish between mandarin and English by more explicitly focusing on separating the task of LID and ASR. We will continue work tomorrow to strategize about what we would like the final outcome of our project to be and what we would like to highlight during our final demo and report.