Team Status Report for 4/26
This past week we finished up testing.
This week we will be working on the poster, report, and demo.
We performed unit tests on the 4 models we had selected: llama3-8b, falcon-7b, qwen2.5-7b, and vicuna-7b. Testing and results can be found here: https://github.com/jankrom/Voice-Vault/tree/main/server/model-testing. This involved setting up tests for each model, writing a Python script, and saving the results as a PNG for each model. Llama3 had the best accuracy at 100%, while qwen and vicuna both scored around 90%. Falcon surprisingly scored 0%; this may be because the model is optimized for code (many of its responses were in JavaScript and the like), despite its description saying it is optimized for conversations. Based on these results we removed falcon from our options, so we will only offer the other 3 models to pick from, which also required updating our website to include only those 3. Details are in Justin's status report below.
We performed unit tests on the system prompts, trying multiple different prompts to find the best one. The best one we found gave us 100% accuracy in classifying whether a query is an alarm request, a music request, or a general LLM request.
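For illustration, here is a minimal sketch of what one of these prompt tests looks like. The prompt wording, model tag, and test cases below are hypothetical stand-ins (the real ones are in the repo linked above), and it assumes the model is served through Ollama's local HTTP API:

```python
# Sketch of a system-prompt classification test. Prompt wording, model tag,
# and cases are hypothetical stand-ins; assumes an Ollama-served model.
import requests

SYSTEM_PROMPT = (
    "Classify the user's request as exactly one of: ALARM, MUSIC, or LLM. "
    "Respond with only that single word."
)

CASES = [
    ("Set an alarm for 7am tomorrow", "ALARM"),
    ("Play some jazz", "MUSIC"),
    ("What's the tallest mountain in the world?", "LLM"),
]

def classify(query: str) -> str:
    """Ask the model to label one query and return the normalized label."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",
            "system": SYSTEM_PROMPT,
            "prompt": query,
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip().upper()

correct = sum(classify(q) == label for q, label in CASES)
print(f"accuracy: {100 * correct / len(CASES):.0f}%")
```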
We also performed many end-to-end (e2e) tests by simply interacting with the system, and we did not find any errors while doing so.
Kemdi Emegwa’s Status Report for 4/26
I spent a large part of this week preparing for my presentation. I spent time practicing what I was going to say and how to best convey our objectives and technical design. In addition, we as a team did a lot more testing for the final report, conducting more user studies to help verify our technical requirements.
On my own, I spent time benchmarking our Speech-To-Text and Text-To-Speech models, as well as the roundtrip latency of our entire system. My findings showed that in most cases the roundtrip time, from when the user stops speaking their query to when the model starts its response, is under 5s. This fulfills our technical specifications. The only case where this failed was the first query to the server, likely because we need a warm-up buffer to make sure everything has started up properly before serving queries.
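As a rough sketch of how such a roundtrip measurement can be taken (the URL and payload below are hypothetical placeholders, not our actual API):

```python
# Rough sketch of the roundtrip measurement: time from sending the finished
# query to receiving the first chunk of the reply. URL/payload are placeholders.
import statistics
import time

import httpx

SERVER_URL = "https://voice-vault.example/query"  # placeholder

def time_to_first_chunk(query: str) -> float:
    """Return seconds from request start to the first streamed response byte."""
    start = time.perf_counter()
    with httpx.stream("POST", SERVER_URL, json={"text": query}, timeout=30.0) as resp:
        for _ in resp.iter_bytes():
            return time.perf_counter() - start  # stop at the first chunk
    return float("inf")  # no data received

samples = [time_to_first_chunk("what time is it") for _ in range(10)]
print(f"first query: {samples[0]:.2f}s, "
      f"median of rest: {statistics.median(samples[1:]):.2f}s")
```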
I think we are definitely on track. For this upcoming week, I will mainly spend my time working on the report and poster for the final demo.
Justin Ankrom’s Status Report for 4/26
This week I did a few things. First, I performed unit tests on the 4 models we had selected: llama3-8b, falcon-7b, qwen2.5-7b, and vicuna-7b. Testing and results can be found here: https://github.com/jankrom/Voice-Vault/tree/main/server/model-testing. This involved setting up the tests for each model, writing a Python script, and saving the results as a PNG for each model. I found that llama3 had the best accuracy at 100%, while qwen and vicuna both scored around 90%. Falcon actually had an accuracy of 0%, which was very surprising. I looked into it more, and it could be because the model is optimized for code: many of the responses it gave were in JavaScript and the like, despite its description saying it is optimized for conversations. These results led me to remove falcon from our options, so we will now only offer the other 3 models to pick from, which meant modifying our website to include only those 3.

Additionally, this week I deployed our website on Vercel so that it can be accessed from anywhere. Here is the link: https://voice-vault-18500.vercel.app/. I had to learn how to deploy websites on Vercel and fix issues that were occurring when building my app in production mode. This was meant to be a lighter week, since our schedule included some room for slack.
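For context, the test script follows roughly this shape. This is a minimal sketch: the model tags and test cases are hypothetical stand-ins, and it assumes the models are served through Ollama's local HTTP API; the real script and result PNGs are in the repo linked above:

```python
# Minimal sketch of the per-model accuracy test. Model tags and test cases
# are hypothetical stand-ins; assumes models served via Ollama's HTTP API.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["llama3:8b", "falcon:7b", "qwen2.5:7b", "vicuna:7b"]

# Each case pairs a query with a keyword the answer must contain to pass.
TEST_CASES = [
    ("What is the capital of France?", "paris"),
    ("How many days are there in a week?", "seven"),
]

def run_model(model: str, prompt: str) -> str:
    """Send one non-streaming generation request and return the text."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

for model in MODELS:
    passed = sum(
        keyword in run_model(model, prompt).lower()
        for prompt, keyword in TEST_CASES
    )
    # (The real script also renders each model's results to a PNG.)
    print(f"{model}: {passed}/{len(TEST_CASES)} correct "
          f"({100 * passed / len(TEST_CASES):.0f}%)")
```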
My progress is on schedule. I don’t have any more work on the actual project, just logistical stuff such as the final poster, final video, and final report, which is what I will complete in the following week.
David’s Status Report for 4/26
This week I mainly did some user testing with Kemdi and Justin, where we measured how long it took users to set up the product and asked some validation questions, like whether they liked it and what they would change. I also had to deal with some hiccups in the 3D printing process: two weeks ago I had given a USB drive to the FBS center because their website was down, but when I checked on the progress early this week, the print apparently had never been started, so unfortunately I had to resubmit. I believe it should be done by the demo, though I am not sure. There is nothing left to be done for the project itself, only the report, prep for the demo, and the poster.
Final Presentation
Kemdi Emegwa’s Status Report for 4/19
I spent this week improving the quality of our system, primarily by implementing streaming queries from the cloud VM to the device. As mentioned in my last status report, I had implemented streaming the query response from the text-to-speech engine to the speaker. This week I built on top of this logic so that the entire query process is streamed end to end.
Using the Python httpx library, I changed the server logic so that rather than sending the entire response at once, it first sends the response type, so the device can prepare to handle the response, and then sends the rest of the response in chunks. This massively improved the time to first token, making our program feel near real-time.
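In rough terms, the change looks like the following. This is a condensed sketch with hypothetical route and helper names, not the exact server code; it assumes Flask on the server (which we use) and httpx on the device:

```python
# Server side (Flask): send the response type first, then stream the rest.
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)

def classify_request(text: str) -> str:
    # Hypothetical stand-in: the real version asks the LLM to classify.
    return "llm"

def stream_llm_response(text: str):
    # Hypothetical stand-in: the real version streams model/TTS output.
    yield "This is a placeholder response."

@app.route("/query", methods=["POST"])
def query():
    text = request.get_json()["text"]

    def generate():
        # First line tells the device what kind of response follows.
        yield classify_request(text) + "\n"   # "alarm", "music", or "llm"
        # Then forward the rest of the output chunk by chunk.
        for chunk in stream_llm_response(text):
            yield chunk

    return Response(stream_with_context(generate()), mimetype="text/plain")


# Device side (httpx): peel off the type line, then consume chunks as they arrive.
import httpx

def stream_query(server_url: str, text: str):
    buffer, response_type = "", None
    with httpx.stream("POST", server_url, json={"text": text}) as resp:
        for chunk in resp.iter_text():
            if response_type is None:
                buffer += chunk
                if "\n" in buffer:                  # full type line received
                    response_type, rest = buffer.split("\n", 1)
                    if rest:
                        yield response_type, rest
            else:
                yield response_type, chunk
```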
Additionally, as a team, we worked with some test subjects to validate and verify the quality of our system. The overall sentiment was highly positive.
I think we are definitely on track to finish our project and don't foresee any blockers. This next week, I will mainly spend my time on error handling and improving the robustness of our system.
Justin Ankrom’s Status Report for 4/19
This week I finished setting up TLS. Last week I had set up TLS on the server, so this week I worked with Kemdi to set it up on the device. I then finished the setup guide and put it up on our main website, which means the "marketing" website is now complete. I also worked on a new prompt for our ML models that works with our new streaming approach. This involved changing responses from JSON to something streamable, updating all the examples, and adding some additional criteria to the prompt to get better responses, which took a lot of time and testing. I tested to make sure it works for all 3 of our core functionalities. I also performed the group testing alongside David and Kemdi, where we tested our setup with 10 different people and asked them questions about our product and their thoughts on it.
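To illustrate the format change (the exact wording below is hypothetical; the real prompt lives in the repo):

```python
# Illustrative before/after of the response-format change (hypothetical
# wording, not the actual prompt).

# Old style: the model had to finish the whole JSON object before anything
# could be parsed, so nothing could be streamed.
OLD_STYLE_RESPONSE = '{"type": "music", "response": "Playing some jazz for you."}'

# New style: a fixed first line carries the type, and everything after it is
# plain text that can be forwarded to the TTS engine chunk by chunk.
NEW_SYSTEM_PROMPT = (
    "On the first line, output exactly one of: alarm, music, llm. "
    "Then, starting on the next line, write your spoken response as plain "
    "text with no JSON, no code fences, and no extra labels."
)
```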
My progress is on schedule. Next week I want to set up the website in the cloud so it is accessible from anywhere by anyone. This will involve researching solutions and deploying it. I will also use this week to thoroughly test the system and fix any last-minute bugs/mistakes.
In regard to this week's specific question, I used many new tools to accomplish my tasks for the project. My main tool was honestly Google: many times when I was stuck or needed help, I would search for my problem to see if people had similar ideas, problems, or solutions, and I used Stack Overflow a lot. My main learning strategy was to come up with my own solution and then look online to see what others have proposed. This usually led me down a rabbit hole of figuring out why my solution would or wouldn't work, or which approach I should take. I also used YouTube tutorials for certain things, like deploying Ollama with a GPU in a Docker container. Throughout the project, I had to learn how to serve a frontend from a Flask server, how to set up TLS on a cloud VM, how to do prompt engineering, how to set up a Next.js application, how to set up and use a Raspberry Pi, how to set up Ollama and use it to serve models that can use a VM's GPU, and more.
Team Status Report for 4/19
There are currently no risks to our project, as we are essentially done with design. We are currently working on the testing that we outlined in the design report. This week we did user testing together with 10 users.
No changes were made to the design, as we are just testing, except for a small change to the 3D model outlined in David's status report. This change is really minor and just requires a little bit of tape.
We are still on schedule, and just have to test the product.
David’s Status Report for 4/19
This week I mainly worked on designing the final 3D print. It took me some time to figure out how to put a sliding lid on the bottom. I also decided not to put a cover on the top, as it could potentially block the sound or the mic; instead I am thinking of using tape to hold everything in. If it prints how I want, we should be good on the model, though it might not, since the sliding door needs a very specific resolution that can vary from printer to printer. If not, I will have to submit another design. The rest of the week I spent with my group helping to user test our project. We met with 10 friends, gave them the product, let them set it up, and afterwards asked them questions, the specifics of which will be covered in the report. I am definitely on schedule, and next week I will finalize all the testing, like the latencies, both end-to-end and component-wise.
For this week's question, I definitely learned a ton working on this project, as I kind of bounced between tasks. I initially learned how to set up the Raspberry Pi and configure its WiFi, date/time, and other settings. To create the server, I learned Flask and how endpoints work, along with the Ollama app for the model. I also learned how to create a VM using Google Cloud Platform. I had never worked with Docker before, so I also learned how to containerize a server. Lastly, I learned how to do 3D design using Fusion 360 to create the physical casing. To learn all these topics I did a lot of research online, as well as talking to my group (if one of them was familiar with the topic).