This week I accomplished much of what I set out to do in one last attempt to implement and train a complete jointly trained LID/ASR model. I overcame my earlier problem of creating logically correct English and Mandarin labels for the audio segments, which are needed to train the LID model. To do this, I took my previous top-performing combined model (which relied on a shaky heuristic for creating labels) and used it during preprocessing to segment the audio and label each segment English, Mandarin, or Blank, essentially distilling the LID information from that model. I then used these new labels to train a fresh model, with the hope that the ASR half of the model would no longer need to hold any LID information at all. Training has been successful so far, though I have not yet surpassed my previous best performance. The training procedure proceeds in stages because I decay certain hyperparameters over time, and I anticipate reaching my goal within the next 10-20 epochs. I'm on track with our planned schedule, and I will also be working heavily on our final report this week while this model trains in parallel.
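
To make the relabeling step concrete, here is a minimal sketch of how per-frame predictions from the earlier model could be collapsed into labeled segments during preprocessing. It is an illustration under assumptions, not my actual pipeline: the function name `frames_to_segments`, the integer class ids in `LABELS`, and the 20 ms default frame length are all hypothetical, and it assumes the earlier model emits one class id per frame.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical class ids; only the three label names come from the report.
LABELS = {0: "Blank", 1: "English", 2: "Mandarin"}


def frames_to_segments(
    frame_ids: Sequence[int], frame_sec: float = 0.02
) -> List[Tuple[str, float, float]]:
    """Collapse runs of identical per-frame class ids into segments.

    Returns a list of (label, start_time_sec, end_time_sec) tuples.
    frame_sec is an assumed frame length, not a value from the report.
    """
    segments = []
    start = 0  # index of the first frame in the current run
    for class_id, run in groupby(frame_ids):
        length = sum(1 for _ in run)  # number of frames in this run
        end = start + length
        segments.append((LABELS[class_id], start * frame_sec, end * frame_sec))
        start = end
    return segments
```

Calling `frames_to_segments` on the per-frame output for one utterance gives the segment boundaries and the English, Mandarin, or Blank label for each one, which can then be stored as training targets for the LID half of the new model.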
 
  
  
 
