We spent this week getting everything ready for the demo and running end-to-end tests to make sure everything is set up and working. Everything works for the demo: the user can talk to the assistant to have a conversation, set or stop an alarm, or play or stop a song. We also set up a VM with a GPU where we run model inference for faster responses (for the demo we will be using Llama 3.2 with 8B parameters). We also made some updates to our UI for a better user experience. This week involved a lot of integration work, putting everything together for the first time, and we are very pleased with the results.
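For reference, here is a minimal sketch of how a client on the device might query the model on the GPU VM. It assumes an Ollama-style HTTP endpoint; the host name, port, and prompt are placeholders for illustration, not our actual configuration:

```python
import requests

# Hypothetical address of the GPU VM running the inference server.
INFERENCE_URL = "http://gpu-vm.example.com:11434/api/generate"

def ask_assistant(prompt: str) -> str:
    """Send one prompt to the model and return the full reply."""
    resp = requests.post(
        INFERENCE_URL,
        json={"model": "llama3.2", "prompt": prompt, "stream": False},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_assistant("Set an alarm for 7 AM tomorrow."))
```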
We don't foresee any big risks or challenges ahead, as we were able to overcome our biggest ones: getting the microphone and speaker to work and integrating all parts of the system. One remaining issue is latency: we set our requirement at under 5 seconds, but many of our current tests come in slightly over that. Since text-to-speech is the biggest bottleneck in the system right now, we are looking into switching to a smaller text-to-speech model, and we are also exploring streaming responses from the model instead of waiting for them to fully complete, which might buy us some extra performance.
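As a rough illustration of the streaming idea, the sketch below reads tokens as they arrive and hands each completed sentence to the speech engine instead of waiting for the whole reply. It assumes the same Ollama-style endpoint as above, and speak() is a hypothetical stand-in for whatever text-to-speech call we end up using:

```python
import json
import requests

def speak(text: str) -> None:
    # Hypothetical stand-in for the real text-to-speech call.
    print(f"[TTS] {text}")

def stream_and_speak(prompt: str) -> None:
    """Start speaking each sentence as soon as it is complete,
    rather than waiting for the full response."""
    resp = requests.post(
        "http://gpu-vm.example.com:11434/api/generate",  # assumed endpoint
        json={"model": "llama3.2", "prompt": prompt, "stream": True},
        stream=True,
        timeout=30,
    )
    resp.raise_for_status()
    buffer = ""
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        buffer += chunk.get("response", "")
        # Flush the buffer at each sentence boundary as tokens arrive.
        while True:
            cuts = [buffer.find(p) for p in ".!?"]
            cuts = [i for i in cuts if i != -1]
            if not cuts:
                break
            cut = min(cuts) + 1
            speak(buffer[:cut].strip())
            buffer = buffer[cut:]
        if chunk.get("done"):
            break
    if buffer.strip():
        speak(buffer.strip())

stream_and_speak("What's the weather like today?")
```

The win here is that time-to-first-audio depends only on how fast the model produces the first sentence, so even if total generation time stays the same, the assistant starts responding much sooner.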
No changes were made to the existing design of the system this week.