Kemdi Emegwa’s Status Report for 4/12

This week was mainly spent hardening our system and ironing out kinks and last-minute problems. As mentioned in previous reports, we faced a dilemma: the text-to-speech model we were using was subpar, but the better one was too slow for our use case. To combat this, we introduced streaming into our architecture.

There were two main areas where streaming needed to be introduced. The first was the text-to-speech model itself. Rather than synthesizing the whole text at once, we can synthesize it in chunks and output each chunk as it is ready, which improves the time to first audio output. This alone dramatically improved performance and allowed us to use the higher-quality model. However, it did not address the fact that if the model hosted on the cloud returned a very long response, we would still have to wait for the entire response before running the text-to-speech model.

To address this, we decided to stream the entire query pipeline. This involved work on both the server side and the device side, and we also had to change how the model returned its response to accommodate it. However, immediately sending each chunk to the TTS model resulted in choppy, unnatural output. To rectify this, I made the device buffer chunks until it sees a “.” or “,”, then send the buffered text to the TTS model. This made the output sound significantly more natural.
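The buffering logic described above can be sketched roughly as follows. This is a minimal illustration, not our exact device code; the function name and chunk format (plain text strings) are assumptions:

```python
def buffer_on_punctuation(chunks):
    """Accumulate streamed text chunks and yield a buffered phrase
    whenever a '.' or ',' arrives, so the TTS model receives
    natural-sounding units instead of single words."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if "." in chunk or "," in chunk:
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing text with no final delimiter
        yield buffer.strip()
```

For example, the stream `["Hello", " world,", " nice", " day."]` would be sent to the TTS model as two phrases, `"Hello world,"` and `"nice day."`, instead of four single-word fragments.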

For this next week, I will mainly spend my time cleaning up code and error handling, and working with Justin to introduce TLS so we can query over HTTPS rather than HTTP. I think we are definitely on track, and I don't foresee us encountering any problems.

Team Status Report for 4/12

This week we have been working on streaming for both the server and the on-board models. We are able to stream responses end to end, but they are pretty choppy. We will try to fix this next week by increasing the number of words in each send, so that the text-to-speech model has a little more to work with.

We currently do not have any big risks, as our project is mostly integrated. We have not changed any part of the design and are on schedule. We plan to do testing and verification soon, now that we have finished the latency-related work; we could not have done any useful latency testing until that was complete.

David’s Status Report for 4/12

This week I worked on two main things: server-side streaming and 3D printing. For server-side streaming, we were worried about end-to-end latency, as current testing showed around 8 seconds. To reduce this, we decided to implement streaming for both the server/model and the on-board speech models. I did the server side, modifying the code so that the server returns words as they are generated, which massively reduces the time to first output. This makes the speech a little choppy since the device receives one word at a time, so next week I might change the granularity to 5-10 words.

I also did some work on improving the 3D model. The first iteration came out pretty well, but I want to add some more components to make it complete, like a sliding bottom and a sliding top to prevent the speaker from falling out if the enclosure is moved upside down. I am hoping this is the final iteration, but if it doesn't come out according to plan I might have to do one more print. I am on schedule, and along with the aforementioned things I will also try to do some testing and verification next week.
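The granularity change mentioned above could look something like the sketch below: re-chunking a stream of single words into batches of a few words before sending, trading a small amount of latency for smoother TTS output. This is illustrative only; the function name and batch size are assumptions, not our actual server code:

```python
def regroup_words(word_stream, batch_size=5):
    """Re-chunk a stream of single words into batches of `batch_size`
    words, so the receiving TTS model gets more text per send."""
    batch = []
    for word in word_stream:
        batch.append(word)
        if len(batch) == batch_size:
            yield " ".join(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield " ".join(batch)
```

The server would wrap a generator like this in its streamed HTTP response instead of yielding one word at a time.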

Justin Ankrom’s Status Report for 4/12

This week I worked on two main things: writing the instructions for downloading the different models and making the setup guide for the VM. I first had to build the Docker images for both the Flask app and the Ollama server for each model. Since we chose four models, this was a total of eight Docker images I had to build and upload to Docker Hub. I then replaced all the model placeholder values on our website with these models and wrote setup instructions for each of them.

Then I made the VM setup guide, which can be found here: https://docs.google.com/document/d/1h3yitViqHSCFIGRavRVXS2N8mghXY5RRwU7VamdTrG8/edit?tab=t.0 . I spent a lot of time getting this into a doc and making sure everything worked by following the instructions myself. I also set up TLS on the VM this week, which took a lot of research. I struggled with configuring the Nginx reverse proxy so that TLS works, but eventually got it working. I verified it with a curl command from my computer, but I still need to make sure it works from the device using HTTPS rather than HTTP, which I will do next week. After that, I will update the guide with the steps necessary to make HTTPS work from the device and update the main website with the setup guide. My progress is on schedule. Next week I will finish the setup guide, finalize the website, and help with the user testing experiment we plan on doing.
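On the device side, switching from HTTP to HTTPS mostly means making the request with certificate verification enabled. A minimal sketch using only the standard library is below; this is not our actual device code, and the URL is a placeholder:

```python
import ssl
import urllib.request

# The default context verifies the server certificate and hostname,
# which is what the device should do once it queries over HTTPS.
ctx = ssl.create_default_context()

def query_server(url, payload):
    """Send `payload` (bytes) to the server over HTTPS with
    certificate verification enabled. `url` is a placeholder."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Verification against the system CA store happens automatically
    # when this context is passed; a bad or self-signed certificate
    # raises ssl.SSLCertVerificationError.
    return urllib.request.urlopen(req, context=ctx)
```

If the VM uses a certificate the device's CA store does not trust, the device either needs the CA added to its trust store or a context loaded with that certificate via `ctx.load_verify_locations(...)`.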

The week before this one (when no status report was due), I worked on testing different prompts for us to use and choosing the best one.

In terms of verification, we plan to run the tests mentioned above to confirm that users can set up their device and VM within the time specified in our requirements. I tested our prompt engineering by evaluating many different prompts against expected results and choosing the best one, so model accuracy testing is complete. We also plan to test the latency of the whole system in the coming week or two. Finally, I tested that the websites were responsive and easy to follow by asking a group of 10 people what they thought of the website and how intuitive it was.