Kemdi Emegwa’s Status Report for 4/19

I spent this week primarily improving the quality of our system by implementing streaming queries from the cloud VM to the device. As mentioned in my last status report, I implemented streaming the query response from the text-to-speech engine to the speaker. This week I built on top of that logic so the entire query process is streamed end to end.

Using the Python httpx library, I changed the server logic so that rather than sending the entire response at once, it first sends the response type, so the device can prepare to handle it, and then sends the rest of the response in chunks. This massively improved the time to first token, making our program feel near real-time.
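
As a rough illustration of the device-side half of this, the sketch below reads the streamed response with httpx, taking the response type from the first line and then iterating over the remaining chunks. The endpoint path, host, and newline framing are assumptions for the example, not our exact implementation.

    import httpx

    def stream_query(prompt, base_url="https://example-vm.local"):
        # Stream the response instead of waiting for the full body.
        with httpx.stream("POST", f"{base_url}/query",
                          json={"prompt": prompt}, timeout=None) as response:
            response.raise_for_status()
            lines = response.iter_lines()
            response_type = next(lines, "")  # first line: response type, sent ahead of the body
            print(f"response type: {response_type}")
            for chunk in lines:              # remaining lines arrive as they are generated
                if chunk:
                    yield chunk

    if __name__ == "__main__":
        for piece in stream_query("turn on the living room lights"):
            print(piece, end=" ", flush=True)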

Additionally, as a team, we worked with some test subjects to validate and verify the quality of our system. The overall sentiment was highly positive.

I think we are definitely on track to finish our project and don't foresee any blockers. This next week, I will mainly focus on error handling and improving the robustness of our system.

 

Justin Ankrom’s Status Report for 4/19

This week I finished setting up TLS. Last week I had set up TLS on the server, so this week I worked with Kemdi to set it up on the device. I then finished the setup guide and put it up on our main website, which means the “marketing” website is now complete. I also worked on a new prompt for our ML models that works with our new streaming approach. This involved changing responses from JSON to a streamable format, updating all of the examples, and adding some additional criteria to the prompt to get better responses, which took a lot of time and testing. I tested to make sure it works for all 3 of our core functionalities. I also performed the group testing alongside David and Kemdi, where we tested our setup with 10 different people and asked them questions about our product and their thoughts on it.

My progress is on schedule. Next week I want to set up the website on the cloud so it is accessible from anywhere by anyone, which will involve researching solutions and deploying it. I will also use this week to thoroughly test the system and fix any last-minute bugs/mistakes.

In regards to this week's specific question, I used many new tools to accomplish my tasks for the project. My main tool was honestly Google: many times when I was stuck or needed help, I would search Google for my problem to see if people had similar ideas, problems, or solutions, and I used Stack Overflow a lot. My main learning strategy was to come up with my own solution and then look online to see what others have proposed. This usually led me down a rabbit hole of figuring out why my solution would or wouldn't work, or what approach I should take. I also used YouTube tutorials on how to do certain things, like deploying Ollama with a GPU in a Docker container. Throughout the project, I had to learn how to serve a frontend from a Flask server, how to set up TLS on a cloud VM, how to do prompt engineering, how to set up a Next.js application, how to set up and use a Raspberry Pi, how to set up Ollama and use it to serve models that can use a VM's GPU, and more.

Team Status Report for 4/19

There are currently no risks to our project as we are essentially done with design. We are currently working on the testing that we outlined in the design report. This week we did the user testing together with 10 users.

No changes were made to the design as we are just testing, except for a small change to the 3D model, which is outlined in David's status report. This change is really minor and just requires a little bit of tape.

We are still on schedule, and just have to test the product.

David’s Status Report for 4/19

This week I mainly worked on designing the final 3D print. It took me some time to figure out how to put a sliding lid on the bottom. I also decided not to put a cover on the top as it could potentially block the sound or the mic; instead, I am thinking of using tape to hold everything in. If it prints how I want, which it might not because the sliding door needs a very specific resolution that can vary from printer to printer, then we should be good on the model. If not, I will have to submit another design. The rest of the week I spent with my group helping to user test our project. We met with 10 friends, gave them the product, let them set it up, and afterwards asked them questions; the specifics will be covered more in the report. I am definitely on schedule, and next week I will finalize all the testing, such as the end-to-end and component-wise latencies.

For this week's question, I definitely learned a ton working on this project as I bounced between tasks. I initially learned how to set up the Raspberry Pi and configure its Wi-Fi, date/time, and other settings. To create the server, I learned Flask and how endpoints work, along with the Ollama app for the model. I also learned how to create a VM using Google Cloud Platform. I had never worked with Docker before, so I also learned how to containerize a server. Lastly, I learned how to do 3D design using Fusion 360 to create the physical casing. To learn all these topics I did a lot of research online as well as talking to my group (if one of them was familiar with the topic).

Kemdi Emegwa’s Status Report for 4/12

This week was mainly spent hardening our system and ironing out kinks and last-minute problems. As mentioned in previous reports, we were facing a dilemma where the text-to-speech model we were using was subpar, but the better one was a bit too slow for our use case. To combat this, we introduced streaming into our architecture.

There were two main areas where streaming needed to be introduced. The first was the text-to-speech model itself. This would improve the time to first audio output because, rather than synthesizing the whole text, we can synthesize it in chunks and output them as they are ready. This already dramatically improved performance and allowed us to use the higher-quality model. However, it did not address the fact that if the model hosted on the cloud returned a very long response, we would still have to wait for the entire thing before running the text-to-speech model.

In order to address this, we decided to stream the entire query pipeline. This involved work on both the server side and the device side, and we also had to change how the model returned its response to accommodate it. However, immediately sending each chunk to the TTS model to be output resulted in weird, choppy audio. To rectify this, I made it so that the device buffers chunks until it sees a “.” or “,”, and then sends the buffered chunks to the TTS model. This made the speech sound significantly more natural.
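
A minimal sketch of that buffering logic is below; synthesize_and_play() is a stand-in for whatever call actually drives our text-to-speech model, and the exact punctuation handling in the real device code may differ.

    def play_streamed_response(chunks, synthesize_and_play):
        """Buffer streamed text and flush it to TTS at '.' or ',' boundaries."""
        buffer = ""
        for chunk in chunks:
            buffer += chunk
            # Flush once the buffered text ends at a natural pause point,
            # so the TTS model gets a full phrase instead of single words.
            if buffer.rstrip().endswith((".", ",")):
                synthesize_and_play(buffer)
                buffer = ""
        if buffer:
            # Speak any trailing text that never hit a punctuation mark.
            synthesize_and_play(buffer)

Buffering on punctuation keeps the phrasing natural while still letting playback start well before the full response has arrived.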

For this next week, I will mainly spend my time cleaning up code/error handling and working with Justin to introduce TLS so we can query over HTTPS rather than HTTP. I think we are definitely on track, and I don't foresee us encountering any problems.

Team Status Report for 4/12

This week we have been working on streaming for both the server and the on-board models. We have been able to stream responses end to end; however, they are pretty choppy. We will try to fix this next week by increasing the number of words in each send, so that the text-to-speech model has a little more to work with.

We currently do not have any big risks as our project is mostly integrated. We have not changed any part of the design and are on schedule. We plan to do the testing and verification soon, as we have finally finished the parts that affect latency; we obviously couldn't have done any useful testing until that was finished.

David’s Status Report for 4/12

This week I worked on two main things: server-side streaming and 3D printing. For server-side streaming, we were worried about end-to-end latency, as current testing seemed to be around 8 seconds. To reduce this, we decided to implement streaming for both the server/model and the on-board speech/text models. I did the server side and modified the code to allow the server to return words as they are generated, which massively increases the speed. This makes the speech a little choppy since the device receives one word at a time, so next week I might try to change the granularity to maybe 5-10 words. I have also done some work on improving the 3D model. The first iteration came out pretty well, but I want to add some more components to make it complete, like a sliding bottom and a sliding top to prevent the speaker from falling out if the case is turned upside down. I am hoping this is the final iteration, but if it doesn't come out according to plan, I might have to do one more print. I am on schedule, and along with the aforementioned things I will also try to do some testing/verification next week.
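
For reference, the server-side change amounts to returning a generator from the Flask endpoint so each piece is flushed to the client as it is produced. The sketch below uses a placeholder generator in place of the real Ollama call, and the route name is illustrative rather than our actual API.

    from flask import Flask, Response, request

    app = Flask(__name__)

    def generate_words(prompt):
        # Placeholder: the real server iterates over the model's streaming
        # output (tokens from Ollama) instead of this canned sentence.
        for word in "Sure, turning on the living room lights now.".split():
            yield word + " "

    @app.route("/query", methods=["POST"])
    def query():
        prompt = request.get_json().get("prompt", "")
        # Returning a generator makes Flask stream each yielded piece
        # to the client instead of buffering the whole response.
        return Response(generate_words(prompt), mimetype="text/plain")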

Justin Ankrom’s Status Report for 4/12

This week I worked on two main things: writing the instructions for downloading all the different models and making the setup guide for the VM. I first had to build the Docker images for both the Flask app and the Ollama server for each of the different models. Since we chose 4 models, this was a total of 8 Docker images I had to build and upload to Docker Hub. I then replaced all the model placeholder values on our website with these models and made setup instructions for each of them.

Then I made the VM setup guide, which can be found here: https://docs.google.com/document/d/1h3yitViqHSCFIGRavRVXS2N8mghXY5RRwU7VamdTrG8/edit?tab=t.0 . I spent a lot of time getting this into a doc and making sure everything worked by following the instructions myself. I also set up TLS on the VM this week, so I spent a lot of time researching how to do this. I struggled setting up the Nginx reverse proxy so that TLS works and spent a lot of time on it, but eventually got it working. I got it to work from a curl command on my computer, but I still need to make sure it works from the device using HTTPS rather than HTTP, which I will do next week. After I finish that, I will update the guide with the steps necessary to make HTTPS work from the device and update the main website with the setup guide. My progress is on schedule. Next week I will work on finishing the setup guide and finalizing the website, and I will also help with the user testing experiment we plan on doing.

The week before this one (when no status report was due) I worked on testing different prompts for us to use and choosing the best one.

In terms of verification, we plan to run the tests mentioned above to check that users can set up their device and VM within the time we specify in the requirements. I also tested our prompt engineering efforts by evaluating many different prompts against expected results and chose the best one, so model accuracy testing is complete. We also plan on testing the latency of the whole system in the coming week or two. I also tested that the websites were responsive and easy to use and follow by asking a group of 10 people what they thought of the website and how intuitive it was.

David’s Status Report for 3/29

This week I worked on Docker and some VM tasks. Throughout the week I tried to fix the Docker container to run Ollama inside, but to no avail. Justin and I also tried working on it together, but we weren't able to fully finish it. Justin was able to fix it later, and I was also able to make a working version, although we are going to use Justin's version as the final one. The main issue with my version was that my GPU wasn't being used by the model in the container. I fixed this by not scripting ollama serve in the Dockerfile initially, and just downloading Ollama first. Then I could run the container with all GPUs enabled, and after that I could pull the models and run ollama serve to have a fully functional local Docker container working. If we were to use this version, I could script the model pulling and the ollama serve step to occur after the container starts. Earlier in the week I also tried to get a VM with a T4 GPU on GCP; however, after multiple tries across servers, I was not able to acquire one. Kemdi, Justin, and I also met at the end of the week to flesh out the demo, which is basically fully working. I am on schedule, and my main goal for next week is to get a 3D-printed container for the board and speaker/mic through FBS.