This week I focused primarily on the final presentation and on making the poster for next week's demonstration. In addition, I did some fine-tuning on the LID model. One problem I observed is that the system sometimes switches languages so quickly that a segment ends up shorter than the minimum input length for the ASR model. These short segments are often inaccurate, since they are shorter than an average utterance of 0.3 seconds. To address this issue, I set a minimum segment length of 0.3 seconds and merge shorter segments with nearby long segments. This approach improved CER by about 1%.
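The merging step can be illustrated with a minimal sketch. This is not the project's actual code: the `Segment` type, the `merge_short_segments` function, and the `lang_a`/`lang_b` labels are hypothetical names chosen for illustration. It also assumes that "nearby" means the preceding segment, or the following one when the short segment comes first. Only the 0.3 second minimum comes from the description above.

```python
from dataclasses import dataclass
from typing import List

MIN_SEGMENT_SEC = 0.3  # minimum segment length stated in the report


@dataclass
class Segment:
    """Hypothetical LID output: a time span tagged with one language label."""
    start: float
    end: float
    lang: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_short_segments(
    segments: List[Segment], min_dur: float = MIN_SEGMENT_SEC
) -> List[Segment]:
    """Fold segments shorter than min_dur into a neighbouring segment.

    Assumption: a short segment is absorbed by the segment before it and
    takes that segment's language; a short leading segment is absorbed by
    the segment after it.
    """
    merged: List[Segment] = []
    for seg in segments:
        if merged and seg.duration < min_dur:
            prev = merged[-1]
            # Extend the previous segment over the short one.
            merged[-1] = Segment(prev.start, seg.end, prev.lang)
        else:
            merged.append(Segment(seg.start, seg.end, seg.lang))

    # A short segment at the very start has no predecessor, so fold it forward.
    if len(merged) > 1 and merged[0].duration < min_dur:
        first = merged.pop(0)
        merged[0] = Segment(first.start, merged[0].end, merged[0].lang)

    # Absorbing a short segment can leave two neighbours with the same
    # language; join them so the ASR model receives one longer input.
    result: List[Segment] = []
    for seg in merged:
        if result and result[-1].lang == seg.lang:
            result[-1] = Segment(result[-1].start, seg.end, seg.lang)
        else:
            result.append(seg)
    return result


if __name__ == "__main__":
    demo = [
        Segment(0.0, 1.0, "lang_a"),
        Segment(1.0, 1.1, "lang_b"),  # 0.1 s blip, below the minimum
        Segment(1.1, 2.0, "lang_a"),
        Segment(2.0, 3.0, "lang_b"),
    ]
    for seg in merge_short_segments(demo):
        print(seg)
```

In this sketch the 0.1 second blip is absorbed by the segment before it, and the two `lang_a` spans on either side are then joined into a single segment.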
Next week I plan to fix some minor issues with the system, especially with silence detection. Currently, silence detection uses a constant decibel threshold, which could be problematic in a noisy environment where the average decibel level is higher. Finally, I will prepare the system for the final demonstration.
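One possible direction for the silence detection issue, sketched here only as an illustration and not as the planned fix, is to set the threshold relative to an estimated noise floor instead of using a fixed decibel value. The function names, the 6 dB margin, and the 10th-percentile noise floor estimate are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def frame_db(frame: Sequence[float], eps: float = 1e-10) -> float:
    """RMS level of one frame in dB, assuming samples scaled to [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, eps))


def adaptive_silence_mask(
    levels_db: Sequence[float],
    margin_db: float = 6.0,          # hypothetical margin above the noise floor
    floor_percentile: float = 0.1,   # hypothetical: 10th percentile as the floor
) -> List[bool]:
    """Mark frames as silent when they sit close to the estimated noise floor.

    The noise floor is taken as a low percentile of the observed frame
    levels, so the threshold rises automatically in a noisy environment.
    """
    if not levels_db:
        return []
    ordered = sorted(levels_db)
    noise_floor = ordered[int(floor_percentile * (len(ordered) - 1))]
    threshold = noise_floor + margin_db
    return [level < threshold for level in levels_db]


if __name__ == "__main__":
    # Frame levels from a noisy room: background near -30 dB, speech near -10 dB.
    levels = [-30.0, -31.0, -29.0, -10.0, -12.0, -30.0]
    print(adaptive_silence_mask(levels))
```

With these example levels, a constant threshold of -40 dB would never flag any frame as silent, while the relative threshold separates the background frames from the two louder ones.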