This week I focused mainly on improving the model’s performance. Following the approach outlined in a research paper, I started from a wav2vec2-large-xlsr-53 model that had been pretrained on several Mandarin datasets and fine-tuned it on ASCEND, a Mandarin-English code-switching dataset. The model achieved a character error rate (CER) of 24%, which is very close to the 23% CER reported in the paper. On closer inspection, I noticed that the model is good at recognizing when the speaker switches languages. It also performed extremely well on Mandarin inputs but was noticeably less accurate on English inputs, most likely because the model was initially pretrained on Mandarin.
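For reference, here is a minimal sketch of how a corpus-level CER like the one above can be computed: the total character-level edit distance divided by the total number of reference characters. The function names are illustrative, and stripping spaces before scoring is an assumption; the evaluation in the paper may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: spaces are ignored so Mandarin and English text are
        # scored on the same character-level footing.
        ref_chars = list(ref.replace(" ", ""))
        hyp_chars = list(hyp.replace(" ", ""))
        total_edits += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    return total_edits / total_chars


# Hypothetical code-switched utterance with one dropped English character.
refs = ["我想去 shopping"]
hyps = ["我想去 shoping"]
score = cer(refs, hyps)  # 1 edit over 11 reference characters
```

Running the same function separately on Mandarin-only and English-only utterances is one way to quantify the gap between the two languages.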
For next week, I plan to improve the model’s performance on English inputs through two approaches. The first is to add more English-only data points to the existing dataset. The second is to train the model on SEAME, a much larger and more comprehensive code-switching dataset.
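The first approach could be sketched as follows, assuming the training examples are held in plain Python lists. The helper name `mix_in_english` and the `english_fraction` parameter are hypothetical, and a suitable ratio would still need to be found experimentally.

```python
import random


def mix_in_english(code_switch_examples, english_examples, english_fraction, seed=0):
    """Return a shuffled list in which about `english_fraction` of items are English-only."""
    if not 0 <= english_fraction < 1:
        raise ValueError("english_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n / (len(code_switch) + n) = english_fraction for n.
    n_english = round(len(code_switch_examples) * english_fraction / (1 - english_fraction))
    # Cannot sample more English examples than are available.
    n_english = min(n_english, len(english_examples))
    mixed = list(code_switch_examples) + rng.sample(english_examples, n_english)
    rng.shuffle(mixed)
    return mixed


# Placeholder examples: 80 code-switched items and a pool of 100 English-only items.
code_switch = [("cs", i) for i in range(80)]
english = [("en", i) for i in range(100)]
mixed = mix_in_english(code_switch, english, english_fraction=0.2)
```

Fixing the seed keeps the mixture reproducible, so any change in English accuracy can be attributed to the ratio and not to a different sample.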