This week our team researched and experimented with ML models and web packages suited to the individual modules of our project: the web app, the backend speech recognition model, and the language detection model. The models that show promising results, along with a proof-of-concept web app demo, will go into our design document, which is due 2/20/2022.
We started developing a web app that will serve as our proof of concept. It will let users record their voice and submit the audio to the server, which will feed the audio into a speech recognition model and return the model output to the frontend. This week we finished implementing voice recording on the frontend and audio data transfer to the server. The foreseeable risks in web app development so far are loss of audio quality in the transfer from frontend to server and the speed at which our backend model processes the audio. Although our audio transfer module successfully moves audio recorded on a web page to our server, the audio file generated on the server is noisy, which will hurt speech recognition accuracy. We expect to fix this once we find a way to retrieve the original audio’s frame rate and number of channels, so that these parameters can be used when writing the audio data into a .wav file on the server. So far, we are confident that this issue can be fixed by 2/20/2022.
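To illustrate the planned fix, here is a minimal sketch of writing received audio into a .wav file with an explicit frame rate and channel count, using Python's standard `wave` module. It assumes the frontend sends raw 16-bit PCM samples together with those two values; the function names `write_wav` and `read_wav_params` are hypothetical and not part of our codebase. If the browser sends a compressed format instead, the audio would need to be decoded before this step.

```python
import wave


def write_wav(path, pcm_bytes, sample_rate, num_channels, sample_width=2):
    """Write raw PCM bytes to a .wav file with explicit audio parameters.

    Assumption: pcm_bytes holds interleaved PCM samples, sample_width bytes
    each (2 bytes = 16-bit), as reported by the frontend.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(num_channels)   # e.g. 1 for mono, 2 for stereo
        wav_file.setsampwidth(sample_width)   # bytes per sample
        wav_file.setframerate(sample_rate)    # frames per second, e.g. 48000
        wav_file.writeframes(pcm_bytes)


def read_wav_params(path):
    """Return (channels, frame rate, frame count) to verify a written file."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getframerate(),
            wav_file.getnframes(),
        )
```

If the header's frame rate or channel count does not match how the audio was actually recorded, playback comes out distorted or at the wrong speed, which is consistent with the noise we are hearing.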
We will also try running a speech recognition model on our development server to estimate the speed of our system. If processing is too slow, we will try switching to a more powerful server instance type.
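One way we might quantify speed is the real-time factor: processing time divided by the duration of the audio. The sketch below is only an illustration. `transcribe` stands in for whichever speech recognition call we end up using, and the helper name `real_time_factor` is hypothetical.

```python
import statistics
import time


def real_time_factor(transcribe, audio, audio_seconds, runs=3):
    """Estimate processing time per second of audio.

    transcribe: any callable that takes the audio and returns a result
        (placeholder for the actual model call).
    audio_seconds: duration of the clip in seconds.
    Returns the median wall-clock time over `runs` calls, divided by the
    clip duration. Values below 1.0 mean faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings) / audio_seconds
```

Running this with the same clip on different instance types would give us a like-for-like number for deciding whether an upgrade is worth it.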
We also submitted the resource requests needed to increase our GPU instance limits and to obtain AWS credits. Each of us set up a development environment (CoLab or Jupyterlab) for developing our software remotely on the AWS GPU instances. Next steps include uploading our chosen development datasets to our respective storage servers and curating our first, smaller development sets for training the ASR and LID models. By next week we hope to have the model development work divided into specific subtasks and early models running for each sub-model of the overall DL system. We also aim to have the detailed architecture of several iterations of the system finalized for the upcoming design presentation and document. The major risks in this area are model performance and training time. Until we begin development and start measuring these, we can’t be sure how quickly to expect progress or what level of performance to expect by integration and deployment time. Taking as many early development steps as we can now will help address these risks and help us understand how large our vocabulary and dataset need to be to achieve the user experience we are seeking.