Justin Ankrom’s Status Report for 4/12

This week I worked on 2 main things: writing the instructions for downloading each of our models and writing the setup guide for the VM. I first had to build Docker images for both the Flask app and the Ollama server for each model. Since we chose 4 models, this was a total of 8 Docker images that I had to build and upload to Docker Hub. I then replaced all the model placeholder values on our website with these models and wrote setup instructions for each of them.

Then I made the VM setup guide, which can be found here: https://docs.google.com/document/d/1h3yitViqHSCFIGRavRVXS2N8mghXY5RRwU7VamdTrG8/edit?tab=t.0 . I spent a lot of time getting this into a doc and verifying that everything works by following the instructions myself. I also set up TLS on the VM this week, which took a lot of research. I struggled to configure the Nginx reverse proxy so that TLS works and spent a lot of time on it, but eventually got it working. It works from a curl command on my computer, but I still need to confirm that it works from the device over https rather than http, which I will do next week. Once that is done, I will update the guide with the steps needed to make https work from the device and add the setup guide to the main website. My progress is on schedule. Next week I will work on finishing the setup guide and finalizing the website, and I will also help with the user testing experiment we plan to run.

The week before this one (when no status report was due) I worked on testing different prompts for us to use and choosing the best one.

In terms of verification, we plan to run the user tests mentioned above to check that users can set up their device and VM within the time limit stated in our requirements. I also tested our prompt engineering efforts by running many different prompts against expected results and choosing the best one, so model accuracy testing is complete. We also plan to test the latency of the whole system in the coming week or two. Finally, I tested that the websites were responsive and easy to follow by asking a group of 10 people what they thought of the website and how intuitive it was.
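For the planned latency testing, a small timing harness along these lines might be enough. This is only a sketch: measure_latency and meets_requirement are hypothetical names, run_once stands in for one full request through the system, and judging by the median against the 5 second target from our earlier team report is an assumption rather than a decided method.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(run_once: Callable[[], object], trials: int = 10) -> List[float]:
    """Time run_once over several trials and return the seconds taken by each."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return samples


def meets_requirement(samples: List[float], limit_seconds: float = 5.0) -> bool:
    """Report whether the median latency is under the limit (assumed metric)."""
    return statistics.median(samples) < limit_seconds


if __name__ == "__main__":
    # Stand-in workload; a real test would send a voice query end to end.
    samples = measure_latency(lambda: time.sleep(0.01), trials=5)
    print(f"median: {statistics.median(samples):.3f}s, ok: {meets_requirement(samples)}")
```

Using the median keeps one slow outlier from failing the whole run; checking the worst case instead would be a stricter choice.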

Team Status Report for 3/29

This week we spent our time getting everything ready for the demo and running e2e tests to make sure everything is set up and ready. We got everything working for the demo, where the user can talk to the assistant to have a conversation, set/stop an alarm, or play/stop a song. We also set up a VM with a GPU, where we run model inference for quicker performance (for the demo we will be using llama3.2 with 8b params). We also made some updates to our UI for a better user experience. This week involved a lot of integration work and putting everything together for the first time, and we are very pleased with the results.

We don’t foresee any big risks or challenges ahead, as we were able to overcome our biggest challenges, which were getting the mic and speaker to work and integrating all parts of the system. One small issue is that we set our latency requirement to under 5 seconds, but many of our current tests come in slightly over this. We are looking into switching to a smaller text to speech model, since text to speech is a big bottleneck in our system right now. We are also looking into streaming responses from the model instead of waiting for the full response, which might give us some extra performance.
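One way the streaming idea above could work is to hand each completed sentence to text to speech as soon as it arrives, rather than waiting for the whole response. The sketch below is illustrative only: sentences_from_stream is a hypothetical name, not part of our codebase, and the model stream is simulated with a plain list of text chunks.

```python
from typing import Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit once the buffered text ends with sentence punctuation.
        # Simplification: a chunk ending in "3." would also trigger this.
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer.strip()
            buffer = ""
    # Flush trailing text that never received closing punctuation.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    simulated = ["Sure", ".", " The", " alarm", " is", " set", "."]
    for sentence in sentences_from_stream(simulated):
        print(sentence)  # each sentence could go to text to speech here
```

With this approach the speaker can start on the first sentence while the model is still generating the rest, so the perceived latency depends on the first sentence rather than the full response.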

No changes were made to the existing design of the system this week.

Justin Ankrom’s Status Report for 3/29

This week I worked on getting things ready for the demo. First I added some new sections to the main website that people can visit to learn about Voice Vault: a section describing what our product is and another section on how to set up the device. I also added a top nav bar for navigating through all the sections.
I also worked on setting up the pages needed for the device website. Originally, I had set up just one page that the user saw after logging in, where they could adjust their VM url and see their music. This week I overhauled how this works, so the home page now has just 2 buttons that let the user either change their configuration settings or go to the music page.

I also implemented saving everything on the settings page locally so we can access it later, but the functionality to actually apply the configs internally hasn’t been implemented yet. For example, I can change the wake word and it will save, but it won’t actually change the wake word being used by the device.
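Saving settings locally could be as simple as a JSON file on the device. This is a sketch under assumptions, not the code on the device: save_settings, load_settings, the file location, and the setting keys are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_settings(path: Path, settings: dict) -> None:
    """Write settings to a temp file, then swap it in so a crash cannot leave a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(settings, tmp, indent=2)
    os.replace(tmp_name, path)


def load_settings(path: Path, defaults: dict) -> dict:
    """Return saved settings layered over defaults; use defaults alone if no file exists."""
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        saved = {}
    return {**defaults, **saved}


if __name__ == "__main__":
    settings_file = Path(tempfile.mkdtemp()) / "settings.json"  # hypothetical location
    save_settings(settings_file, {"wake_word": "computer"})
    print(load_settings(settings_file, {"wake_word": "hey vault", "vm_url": ""}))
```

Applying a saved value, such as the wake word, would still need the device code to call load_settings when it starts listening.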

This week I also worked extensively with David to get our models containerized. I developed the actual Dockerfiles and the docker compose file. I came up with a solution found here: https://github.com/jankrom/Voice-Vault/commit/7501d4eebc4cd480b79e89e9fdfd27402f51c14f. With this solution, we just need to run 'MODEL="smollm2:135m" MODEL_TAG_DOCKER="smollm2-135m" docker compose up --build', where we just change the env variables to choose which model is downloaded from Ollama. This makes it much more flexible and really easy for us to make new model containers. We struggled a lot to build this and get it working. We also had a lot of trouble getting Ollama to use the GPU and spent many hours on it, but we eventually got it to work. I then spun up a VM on GCP with a GPU and set up a container using the llama3.2 8b version, which we will use for the demo.
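Since only the two env variables change between models, the command could be generated from the model name. The helper below is a hypothetical sketch: compose_env and compose_command are not in our repo, and the rule that the Docker tag is the Ollama name with ':' replaced by '-' is inferred from the single smollm2 example above.

```python
import shlex


def compose_env(model: str) -> dict:
    """Build the two env variables for one model.

    Assumption: the Docker tag is the Ollama model name with ':' replaced
    by '-', as in the one example smollm2:135m -> smollm2-135m.
    """
    return {"MODEL": model, "MODEL_TAG_DOCKER": model.replace(":", "-")}


def compose_command(model: str) -> str:
    """Return the shell command line that builds and starts one model container."""
    env = compose_env(model)
    # shlex.quote only adds quotes when a value contains shell-unsafe characters.
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} docker compose up --build"


if __name__ == "__main__":
    print(compose_command("smollm2:135m"))
```

Looping this over a list of model names would give one command per container to build.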

Lastly, I worked with David and Kemdi on final touches for the demo. This included testing everything end to end for our 3 main features (talking to the model, setting an alarm, playing music) and fixing issues as they came up. Ultimately, we got everything done that we wanted for the demo.

My progress is on schedule. Next week I want to run the prompt engineering tests to find the best prompt for us to use. I also want to finish the VM setup guide and finish the main website so that it no longer includes placeholder values for the models (which will also involve making all the model containers).

Justin Ankrom’s Status Report for 3/22

This week I accomplished 2 main things: setting up the alarm clock feature and setting up the terms of service. I set up an alarm clock so that the user can use their voice to set an alarm and cancel it. Here is the code for it: https://github.com/jankrom/Voice-Vault/commit/154f069886dcc4a63c505fd4d009cbf75d0b61fd. This involved researching how to make an alarm clock that works asynchronously without blocking. This proved more difficult than anticipated because almost all solutions online are synchronous, meaning they block the rest of the script. In the end I came up with this working solution, and I tested it both on my computer and on the device. I also worked on getting the terms of service up on the website.

I used what I learned from my talk with Professor Brumley and included everything that we discussed. This essentially meant making it very clear to the user how their data is being used and that they agree to this. I had also hoped to get the VM setup documentation done this week, but I had a very busy week with exams and other coursework, so I wasn’t able to get as much done as I would have hoped. Next week I will make up for this by completing the VM setup on top of what I had planned, which will bring me back on schedule. Next week I will also work with David and Kemdi to get everything up and running for the demo, which is the week after next. This will include putting all necessary code onto the device, integrating all our solutions, hosting a model on the cloud, and running tests to make sure everything is working. Our goal is to have a fully working solution by demo day (I am hopeful that we can get this done).
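For reference, the non-blocking alarm idea from earlier in this report can be sketched with a background timer thread. This is a minimal illustration using Python's threading.Timer, not necessarily the approach in the linked commit; the Alarm class and its method names are hypothetical.

```python
import threading
from typing import Callable, Optional


class Alarm:
    """A single cancellable alarm that fires on a background thread."""

    def __init__(self, on_ring: Callable[[], None]) -> None:
        self._on_ring = on_ring
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def set(self, delay_seconds: float) -> None:
        """Schedule the alarm, replacing any alarm that is already pending."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(delay_seconds, self._on_ring)
            self._timer.daemon = True  # do not keep the process alive just for the alarm
            self._timer.start()

    def cancel(self) -> bool:
        """Cancel a pending alarm and return True if one was still waiting."""
        with self._lock:
            if self._timer is None:
                return False
            was_pending = self._timer.is_alive()
            self._timer.cancel()
            self._timer = None
            return was_pending


if __name__ == "__main__":
    done = threading.Event()
    alarm = Alarm(done.set)
    alarm.set(0.5)          # returns immediately; the main script keeps running
    print("listening for voice commands...")
    done.wait(2.0)
    print("alarm rang:", done.is_set())
```

Because set returns right away, the main loop can keep listening for a "cancel alarm" command while the timer counts down.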

Justin Ankrom’s Status Report for 3/15

This week I worked on adding authentication to the device configuration website and to requests made from the device to the cloud. By this I mean that the user has to enter a password that will come with the device (think of how an internet modem comes with a default password on a sticker on the bottom) to enter the website. This password will also be part of each request made to the cloud and checked on the cloud to ensure that only our device can make requests to our cloud. This protects against unwanted users changing our device configuration or using our cloud resources. I also chose the 4 open source models we will be using: Llama 3.1 8b params (https://huggingface.co/meta-llama/Llama-3.1-8B), Vicuna v1.5 7b params (https://huggingface.co/lmsys/vicuna-7b-v1.5), Falcon-7B-Instruct (https://huggingface.co/tiiuae/falcon-7b-instruct), and Qwen2.5-7B-Instruct (https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). I chose models with around 7-8 billion parameters since I think this will give us the best performance while still being relatively small. My concern is that this might make the cloud hardware needed to run them expensive, so I will research and test further to see if we can run them on the cloud without it costing too much. If not, I have already looked into 1-3b param models that we could use. I also helped David start configuring the docker containers.
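The request check described above could look roughly like the sketch below. It is an illustration rather than our actual code: the X-Device-Password header name and both function names are made up for this example, and a plain SHA-256 hash is used only to keep it short.

```python
import hashlib
import hmac


def hash_password(password: str) -> bytes:
    """Hash the device password so the cloud side need not keep it in plain text.

    Simplification: a real deployment would likely use a salted, slow hash.
    """
    return hashlib.sha256(password.encode("utf-8")).digest()


def is_authorized(headers: dict, expected_hash: bytes) -> bool:
    """Check the password sent with a request against the expected hash."""
    supplied = headers.get("X-Device-Password", "")  # hypothetical header name
    # compare_digest takes the same time whether or not the values match,
    # so response timing does not reveal how close a guess was.
    return hmac.compare_digest(hash_password(supplied), expected_hash)


if __name__ == "__main__":
    expected = hash_password("sticker-password")
    print(is_authorized({"X-Device-Password": "sticker-password"}, expected))
    print(is_authorized({"X-Device-Password": "guess"}, expected))
```

The same check could run on both the device website login and the cloud endpoint, since both compare against the one password shipped with the device.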

I am on schedule for what I need to get done. Next week I want to do some testing to see whether our 7-8b param models will be too expensive to run on the cloud. Based on those results I might need to swap the models. Last week I said that this week I wanted to fill the website out, but I decided to postpone that until we have finalized everything, so it will happen in the last 1-2 weeks of the project. This is because I don’t want to make changes and then have to overwrite them over and over based on additional testing or changes we make. Next week I also want to start and finish the music player. This week I researched how I will do it, and next week I just need to implement it.

Overall, this week involved getting authentication set up and doing a lot of research on models and on how to set up the music player.

Justin Ankrom’s Status Report for 3/8

This week, I set up the website that will live on the physical hardware as well as the website we host. Originally, we had planned to have only one website, which we hosted, but we decided to go a different route. The reason is that if we hosted everything, we would have needed some form of authentication, so we would have had to store user information, which goes against our security policies. So I had to come up with a new approach. I came up with using 2 websites instead: 1 that lives on the hardware and another that we host. The one that lives on the hardware will be where people can set their VM configuration (VM ip address) and look at the music they have saved, and the website we host will be strictly for setup instructions and the data privacy terms of service. This means that what we host is purely static and applies to every single user, while user configuration lives on the client side. This change meant that I had to scrap almost all of the existing website code and restart. I spent a lot of time coming up with this 2 website approach and thinking about how I wanted to do it. For our hosted website, I am still using React and Next.js and hosting it on Vercel. For the client website, I decided to serve a simple HTML page from a Flask app. This is because on the client side we have very limited resources, so I went with a very lightweight approach. I have initial websites that work for both the client side and our side. A lot of it is filler content, but this is fine because the actual content will be quick to update now that the overall layout is established.

Based on this, I think my progress is on schedule given the recent pivots we discussed in the team status report. Next week I want to work on filling out the actual content of our website, meaning writing the VM setup instructions, and I want to set up a docker container for at least one open source model so we can get ahead on that. I also want to pick the exact open source models we want to use so we have a finalized list.

Here are some pictures of the websites.

Hero section of website
VM setup instructions
Available models
Privacy section
Client side configuration website

Team Status Report for 3/8

The most significant risk that could jeopardize the success of our project is getting everything to run on the board. As explained below, we are adding 2 new features and moving part of the website to be hosted on the board, which means we will need more resources on the board itself. To mitigate this risk, we will get an MVP up and running ASAP to test on the board. This way we can see whether we need to upgrade to a different board or whether our Raspberry Pi 4 with 4 GB of RAM is enough.

This week we decided to make some pretty substantial changes based on the design presentation feedback:

  • Add 2 new features to increase the scope of our project: (1) a timer feature and (2) an MP3-player-like feature where users can upload songs to their device and play them from it.
  • We decided to move part of the website from being something we hosted to being hosted on the device. The website had 2 main purposes: (1) configure your device by passing in your VM endpoint and (2) everything else such as setup instructions, data privacy policies, and docker containers. Everything in (2) is strictly static, meaning that the same content can be used by every single user, so we are still going to be hosting this part. However, we decided to move (1) to being hosted on the actual device for each user. By doing it this way, we eliminate all 3rd parties, including ourselves. If we had kept (1) on our end, it would require us or a 3rd party to store user info (we chose Clerk which is a 3rd party but if we had implemented auth ourselves we would have had to store user info) which is against what we want to do. By doing the configuration on device, we eliminate the use of any 3rd party system and then the user owns all parts of the system which is our intended goal.
  • We held a meeting with David Brumley and he gave us some advice regarding how to define privacy. Based on his feedback, we will be adding a terms of service agreement to our website to give users visibility into how their data is being used. We want to make it explicit to them that neither we nor any other 3rd party has access to their data or information at any point, including that their data is not being stored, their data is not being reused to train any models, etc. We need to make it very explicit that they own all aspects of the user experience.
  • We also made the choice to have users set up SSL encryption on their VMs (with instructions on our end). By doing so, we are no longer exposed to man in the middle attacks between the devices and the VMs when making endpoint requests, which was a point of concern. After this is done, we can ensure that privacy is maintained for all data transfers.

Based on these changes, this is a rough estimate of what we want our schedule to look like in the upcoming weeks:
  • Week 1-2: finalize design report, get rid of auth and move configuration to a separate device hosted website, start working on the Docker containers, start testing speech to text and text to speech.
  • Week 3-4: polish the UI and have it fully working, finish ToS, have the Docker containers ready and on the website with instructions, test individual components
  • Week 5-6: test everything e2e, set up the instruction guidebook
  • Week 7: slack time to finish up anything we didn’t finish before. This accounts for unforeseen circumstances and pivots

Week specific report:
  • Part A (David): Our product affects one main global factor: it ensures privacy for the user. While in the US this is mainly a data privacy issue, as our government is most likely not actively using our data to control us, in other parts of the world it could help protect against more controlling governments and therefore offer physical protection. In areas where there is little or no freedom of speech, a voice assistant in the house can be a danger if the data is sent out to an unknown location.

    This product could also help protect government officials or anyone with confidential information. Because most voice assistant companies are in the US, people in other countries may mistrust them, especially those who discuss confidential information in their homes. Our product ensures that all the data is kept within the user’s control, so that people across the world can feel free to say anything they want without it being sent to a US server.

  • Part B (Kemdi Emegwa): Voice Vault aims to preserve the right to privacy and the right to consent while still allowing users to leverage state of the art artificial intelligence. By allowing the user to host their own model, whether on the cloud or on their own local server, they gain the ability to dictate how their data is stored/used. At a time when AI/ML advancements have often been at odds with user concerns about data privacy, our lightweight solution can bridge the gap.

  • Part C (Justin Ankrom): Voice Vault minimizes environmental impact by reducing energy consumption and electronic waste. It runs on a low-power Raspberry Pi 4 board with 4 GB of RAM and has local storage via microSD, which reduces dependence on external servers for storage and lowers the system’s carbon footprint. Its customizable design extends hardware lifespan by allowing upgrades instead of full replacements, reducing electronic waste. By supporting self-hosted LLMs, Voice Vault eliminates reliance on large-scale data centers, further decreasing energy consumption.

Justin Ankrom’s Status Report for 2/22

This week I worked on setting up user board configuration through our software, which we store in Clerk metadata. Progress is on schedule. Next week, I hope to refine the website so it is fully functional and reflects the changes we made with our pivot, and I also hope to do some preliminary testing of the website and board configurations.

Justin Ankrom’s Status Report for 2/15

This week I added authentication to our website. We chose to use Clerk as our authentication provider, since we get up to 10k active users for free and it makes it very easy to set up auth without us having to build it ourselves. Now a user will have to be authenticated to enter our website. My progress is on schedule. Next week I will add functionality for users to link their account with their physical board so we can ensure only they will be able to access and modify their board through our software.

Team Status Report for 2/15

We don’t have any significant risks at this moment that could jeopardize the success of our project. We did, however, make changes to the design of our system. Originally we had planned to use 2 different types of models: a realtime model and a non-realtime model. However, we decided to pivot and remove the realtime model because we believe that it would compromise the safety and privacy of the user, which is our biggest requirement and goal. This pivot does not change our schedule. This week we got authentication working, got our board set up, and got text to speech and speech to text working.

Status Report 2 specific questions:
Part A (Justin Ankrom):  Voice Vault is designed to enhance public health, safety, and welfare by prioritizing user privacy. In terms of health, we support users’ psychological well-being by ensuring their interactions with the assistant remain entirely private, mitigating concerns about data surveillance and unauthorized access. Voice Vault operates on self-hosted infrastructure, preventing leaks that could expose sensitive personal information.  We ensure that their interactions are never exposed to third-party entities. This design is particularly valuable for individuals handling sensitive information.

Part B (David Herman): Voice Vault can have an effect on social and political factors through its protection against data leaks and corrupt companies. Data leaks happen frequently even when the company storing the data is not actively sharing it. Data can also be misused, which has happened often before and still happens now. Information such as what a person asks a voice assistant in their own home is very sensitive and is best kept to oneself. Our device offers a private solution to users who do not trust where their data goes or who holds it.

Part C (Kemdi Emegwa): Voice Vault is well positioned economically. It has a reasonably low cost basis, and if it were coupled with high-volume production, the economics of the project would be very favorable. The cost of producing our prototype is expected to be under $150, which, if we wanted to sell this product, would allow us to price it extremely competitively. Additionally, since the user can host their own model, they have complete control over the cloud costs.