Team Status Report for 4/26

This past week we finished up testing.

This week we will be working on the poster, report, and demo.

We performed unit tests on the 4 different models we had selected: llama3-8b, falcon-7b, qwen2.5-7b, and vicuna-7b. Testing and results can be found here: https://github.com/jankrom/Voice-Vault/tree/main/server/model-testing. This involved setting up the tests for each model, writing a Python script, and saving the results in a PNG for each model. We found that llama3 had the best accuracy at 100%, while qwen and vicuna both scored around 90%. Falcon, surprisingly, had an accuracy of 0%. Looking into it further, this may be because the model is optimized for code: many of the responses it gave were in JavaScript and the like, despite the model claiming to be optimized for conversations. These results caused us to remove falcon from our options, so we will now only offer the other 3 models to pick from, and the website was modified to include only those 3.
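For reference, a rough sketch of what such a test script could look like, using the local Ollama HTTP API (the model tags, system prompt, and test cases below are illustrative placeholders, not the exact ones in server/model-testing):

```python
# Minimal per-model accuracy check against a local Ollama server.
# Model tags, prompt, and test cases are placeholders for illustration.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["llama3:8b", "qwen2.5:7b", "vicuna:7b", "falcon:7b"]

SYSTEM_PROMPT = "Classify the user's request as exactly one of: ALARM, MUSIC, LLM."
TEST_CASES = [  # (query, expected label) -- hypothetical examples
    ("Set an alarm for 7am tomorrow", "ALARM"),
    ("Play some jazz", "MUSIC"),
    ("What is the capital of France?", "LLM"),
]

def classify(model: str, query: str) -> str:
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": query,
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"].strip().upper()

for model in MODELS:
    correct = sum(expected in classify(model, q) for q, expected in TEST_CASES)
    print(f"{model}: {correct / len(TEST_CASES):.0%} accuracy")
```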

We performed unit tests on the system prompts, trying multiple different prompts to find the best one. The best prompt we found gave us 100% accuracy in classifying whether a query is an alarm request, a music request, or an LLM request.
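To give a sense of what was being tested, a prompt along these lines could do the three-way routing (this is only an illustration, not the exact prompt that reached 100%):

```text
You are the request router for a voice assistant. Classify the user's request
as exactly one of: ALARM, MUSIC, or LLM. Respond with only the label.

Examples:
"Wake me up at 6:30"            -> ALARM
"Put on something by Coltrane"  -> MUSIC
"How tall is Mount Everest?"    -> LLM
```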

We also performed many end-to-end tests by simply interacting with the system, and we did not find any errors while doing so.

Kemdi Emegwa’s Status Report for 4/26

I spent a large part of this week preparing for my presentation. I spent time practicing what I was going to say and how to best convey our objective and technical design. In addition, we as a team did a lot more testing for the final report. We conducted more user studies to help verify our technical requirements.

On my own, I spent time benchmarking our Speech-To-Text and Text-To-Speech models as well as the roundtrip latency for our entire system. My findings showed that in most cases the roundtrip time from when the user stops speaking their query to when the model starts its response is under 5s, which fulfills our technical specifications. The only case where this failed was the first query to the server, likely because the system needs a short warm-up buffer to make sure everything has started up properly.
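As a rough illustration of how that roundtrip number could be measured, the sketch below times the gap between sending a transcribed query and receiving the first chunk of the streamed reply (the endpoint, payload shape, and queries are placeholders, not our actual API):

```python
# Measure time-to-first-chunk for a handful of queries and report the median.
# SERVER_URL and the request payload are hypothetical placeholders.
import time
import statistics
import httpx

SERVER_URL = "https://example-cloud-vm/query"  # placeholder
QUERIES = ["What's the weather like?", "Set an alarm for 8am", "Play some rock"]

latencies = []
with httpx.Client(timeout=30.0) as client:
    for q in QUERIES:
        start = time.perf_counter()
        with client.stream("POST", SERVER_URL, json={"query": q}) as resp:
            for _ in resp.iter_bytes():
                break  # stop at the first chunk: time to first response
        latencies.append(time.perf_counter() - start)

print(f"median roundtrip: {statistics.median(latencies):.2f}s (target: under 5s)")
```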

I think we are definitely on track. For this upcoming week, I will mainly spend my time working on the report and poster for the final demo.

Justin Ankrom’s Status Report for 4/26

This week I did a few things. First, I performed unit tests on the 4 different models we had selected: llama3-8b, falcon-7b, qwen2.5-7b, and vicuna-7b. Testing and results can be found here: https://github.com/jankrom/Voice-Vault/tree/main/server/model-testing. This involved setting up the tests for each model, writing a Python script, and saving the results in a PNG for each model. I found that llama3 had the best accuracy at 100%, while qwen and vicuna both scored around 90%. Falcon, surprisingly, had an accuracy of 0%. Looking into it further, this may be because the model is optimized for code: many of the responses it gave were in JavaScript and the like, despite the model claiming to be optimized for conversations. These results caused me to remove falcon from our options, so we will now only offer the other 3 models to pick from, which meant modifying our website to include only those 3. Additionally, this week I deployed our website on Vercel so that it can be accessed from anywhere. Here is the link: https://voice-vault-18500.vercel.app/. I had to learn how to deploy websites on Vercel and fix issues that were occurring when building my app in production mode. This week was meant to be a lighter week since it was designed to have some room for slack.

My progress is on schedule. I don’t have any more work on the actual project, just logistical stuff such as the final poster, final video, and final report, which is what I will complete in the following week.

David’s Status Report for 4/26

This week I mainly did some user testing with Kemdi and Justin, where we measured how long it took users to set up the product and asked validation questions such as whether they liked it and what they would change. I also had to deal with some hiccups in the 3D printing process: two weeks ago I had given a USB drive to the FBS center because their website was down, but when I checked on the progress early this week it apparently had never been started, so unfortunately I had to resubmit. I believe it should be done by the demo, though I am not sure. There is nothing left to be done for the project itself, only the report, demo prep, and the poster.

Kemdi Emegwa’s Status Report for 4/19

I spent this week improving the quality of our system, primarily by implementing streaming queries from the cloud VM to the device. As mentioned in my last status report, I implemented streaming the query response from the text-to-speech engine to the speaker. This week I built on top of this logic so the entire query process is streamed end to end.

Using the Python httpx library, I changed the server logic so that rather than sending the entire response at once, it first sends the type, so the device can prepare to handle the response, and then sends the rest of the response in chunks. This massively improved the time to first token, making our program feel near real-time.
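A rough sketch of the device-side half of this, as described above (the endpoint, the first-line-is-the-type framing, and the handler functions are assumptions for illustration, not the exact implementation):

```python
# Device-side sketch: read the response type first, then consume the rest of
# the response in streamed chunks. Endpoint, framing, and handlers are
# hypothetical placeholders.
import httpx

SERVER_URL = "https://example-cloud-vm/query"  # placeholder

def prepare_handler(response_type: str) -> None:
    # e.g. set up the alarm, music, or text-to-speech path
    print(f"preparing handler for {response_type}")

def handle_chunk(response_type: str, chunk: str) -> None:
    # e.g. feed text into the TTS engine as it arrives
    print(chunk, end="", flush=True)

def handle_query(query: str) -> None:
    with httpx.Client(timeout=60.0) as client:
        with client.stream("POST", SERVER_URL, json={"query": query}) as resp:
            lines = resp.iter_lines()
            response_type = next(lines)  # first line: e.g. "ALARM", "MUSIC", or "LLM"
            prepare_handler(response_type)
            for chunk in lines:          # remaining chunks arrive as they are generated
                handle_chunk(response_type, chunk)
```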

Additionally, as a team, we worked with some test subjects to validate and verify the quality of our system. The overall sentiment was highly positive.

I think we are definitely on track to finish our project and don't foresee any blockers. This next week, I will mainly spend my time on error handling and improving the robustness of our system.

Justin Ankrom’s Status Report for 4/19

This week I finished setting up TLS. Last week I had set up TLS on the server, so this week I worked with Kemdi to set it up on the device. I then finished the setup guide and put it up on our main website, which means the “marketing” website is now complete. I also worked on making a new prompt for our ML models that works with our new streaming approach. This involved changing responses from JSON to something streamable, updating all the examples, and adding some additional criteria to the prompt to get better responses, which took a lot of time and testing. I tested to make sure it works for all 3 of our core functionalities. I also performed the group testing alongside David and Kemdi, where we tested our setup with 10 different people and asked them questions about our product and what their thoughts were.
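To illustrate the kind of change involved in moving off JSON (the exact formats here are made up for illustration, not our actual prompt output): a JSON response has to arrive in full before it can be parsed, whereas a line-oriented format can be acted on as soon as the first line arrives.

```text
# Old style (illustrative): the whole object must arrive before parsing
{"type": "MUSIC", "response": "Playing some jazz from your library."}

# New style (illustrative): the type comes first on its own line, then the
# response text streams in chunks that can be spoken as they arrive
MUSIC
Playing some jazz from your library.
```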

My progress is on schedule. Next week I want to set up the website on the cloud so it is accessible from anywhere by anyone. This will involve researching solutions and deploying it. I will also use this week as a time to thoroughly test the system and fix any last-minute bugs/mistakes.

In regard to this week's specific question, I used many new tools to accomplish my tasks for the project. My main tool was honestly Google. Many times when I was stuck or needed help, I would search Google for my problem to see if people had any similar ideas, problems, or solutions. I used Stack Overflow a lot. My main learning strategy was to come up with my own solution and then look online to see what others have proposed. This usually led me down a rabbit hole of figuring out why my solution would or would not work, or what approach I should take. I also used YouTube tutorials on how to do certain things, like deploying Ollama with a GPU in a Docker container. Throughout the project, I had to learn how to serve a frontend from a Flask server, how to set up TLS on a cloud VM, how to do prompt engineering, how to set up a Next.js application, how to set up and use a Raspberry Pi, how to set up Ollama and use it to serve models that can use a VM’s GPU, and more.