LID work this week focused on continued training and on fleshing out the interfaces for interacting with the model. Pre-loading and forward-pass metrics were introduced, exposing the model's functionality through an importable class available from the LID’s GitHub. The model itself is loaded from a separate online repository (also hosted on GitHub), where improved versions of the model have been automatically loaded as training has progressed. Integration and development of the first demo will take up most of the work over the next couple of days, along with beginning to build out the software suite for performing the various tests we prescribed in our design and architecture documents. The model work could be about half a week further along, so Nick plans to spend most of next week focused solely on these deliverables.
On the web app end, we have integrated a codeswitching model trained by Marco and gotten some promising results. The model runs efficiently when we split the ongoing recording stream into 1-second chunks: it returns the transcription for each chunk in close to 1 second, which gives our app a real-time experience. The model accurately captures instances of language switching within a sentence, but because we feed it only one 1-second audio chunk at a time, it can only give the best transcription available from the audio features within that chunk. So far the integration is on schedule. We are ready to start evaluating our models on diverse audio samples from YouTube and to tune them accordingly. We will also incorporate Nick’s LID model to improve accuracy, and experiment with other chunking mechanisms that capture more context in each audio chunk while keeping the chunk short.
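
As a rough illustration of the chunking described above, the sketch below splits a buffer of audio samples into 1-second chunks and can optionally prepend a short window of preceding audio to each chunk as extra context. This is only one possible chunking mechanism, not necessarily the one we will adopt. The names chunk_stream, chunk_seconds, and context_seconds are hypothetical and do not come from the web app code.

```python
from typing import Iterator, List, Sequence


def chunk_stream(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 1.0,
    context_seconds: float = 0.0,
) -> Iterator[List[float]]:
    """Yield fixed-length chunks of audio samples (hypothetical helper).

    Each chunk covers `chunk_seconds` of new audio. If `context_seconds`
    is greater than zero, that much of the preceding audio is prepended
    so the model sees some context from before the chunk boundary.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    context_len = int(sample_rate * context_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1 sample")
    if context_len < 0:
        raise ValueError("context_seconds must not be negative")

    for start in range(0, len(samples), chunk_len):
        # Reach back for context, but never before the start of the stream.
        context_start = max(0, start - context_len)
        # The final chunk may be shorter than chunk_len.
        yield list(samples[context_start:start + chunk_len])


if __name__ == "__main__":
    # Toy stream: 10 samples at 4 samples per second (2.5 seconds of audio).
    toy = list(range(10))
    for chunk in chunk_stream(toy, sample_rate=4, context_seconds=0.5):
        print(chunk)
```

With context_seconds left at 0 this reproduces plain 1-second chunking. A nonzero value lengthens each model input by the context window, so any gain in transcription quality would have to be weighed against the extra latency per chunk.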