Team Status Report for 2/15

We don’t have any significant risks at this moment that could jeopardize the success of our project. We did, however, make a change to our system design. Originally we had planned to use two different types of models: a realtime model and a non-realtime model. We decided to pivot and remove the realtime model because we believe it would compromise the safety and privacy of the user, which is our biggest requirement and goal. This pivot does not change our schedule. This week we got authentication working, set up our board, and got text-to-speech and speech-to-text working.

Status Report 2 specific questions:
Part A (Justin Ankrom): Voice Vault is designed to enhance public health, safety, and welfare by prioritizing user privacy. In terms of health, we support users’ psychological well-being by ensuring their interactions with the assistant remain entirely private, mitigating concerns about data surveillance and unauthorized access. Voice Vault operates on self-hosted infrastructure, which prevents leaks that could expose sensitive personal information and ensures that interactions are never exposed to third-party entities. This design is particularly valuable for individuals handling sensitive information.

Part B (David Herman): Voice Vault can affect social and political factors by protecting against data leaks and corrupt companies. Data leaks happen frequently, even when the company storing the data is not actively sharing it. Data can also be misused, which has happened often and still happens now. Information such as what a person asks a voice assistant in their own home is very sensitive and is best kept to oneself. Our device offers a private solution to users who do not trust where their data goes or who holds it.

Part C (Kemdi Emegwa): Voice Vault is well positioned economically because of its reasonably low cost basis; if it were coupled with production at scale, the economics of the project would be very favorable. The cost of producing our prototype is expected to be under $150, which, if we wanted to sell this product, would allow us to price it extremely competitively. Additionally, since the user can host their own model, they have complete control over the cloud costs.

Kemdi Emegwa’s Status Report for 2/15

I mainly spent this week writing the code for the query workflow. We now have preliminary code that uses the CMU PocketSphinx speech-to-text engine to transcribe the user’s query and sends it from the device to the server hosting the model. The device then receives the query result and uses eSpeak to speak it aloud through the speaker. I am currently on track; next I will make this more robust and start working on the dockerization of the model.
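The workflow above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the server URL, the JSON field names ("query" and "response"), and the function names are all hypothetical, and the speech-to-text step is passed in as a plain callable (for example, a wrapper around PocketSphinx) so the flow can be exercised without audio hardware. The only external tool it assumes is the espeak command-line program.

```python
import json
import subprocess
import urllib.request
from typing import Callable

# Hypothetical endpoint of the server hosting the model.
SERVER_URL = "http://localhost:8000/query"


def send_query(text: str, url: str = SERVER_URL, timeout: float = 5.0) -> str:
    """POST the transcribed text as JSON and return the model's reply.

    The "query" and "response" field names are assumptions for this sketch.
    """
    body = json.dumps({"query": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def speak(text: str) -> None:
    """Speak the text aloud by invoking the espeak command-line tool."""
    subprocess.run(["espeak", text], check=True)


def handle_query(
    transcribe: Callable[[], str],
    send: Callable[[str], str] = send_query,
    say: Callable[[str], None] = speak,
) -> str:
    """Run one pass: speech-to-text, query the server, text-to-speech."""
    text = transcribe()   # e.g. a PocketSphinx-based transcription function
    reply = send(text)    # device -> server hosting the model
    say(reply)            # play the result on the speaker
    return reply
```

Passing the three stages in as callables keeps each one replaceable, which should make it easier to swap engines or test the flow on a laptop before moving to the Raspberry Pi.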

David’s Status Report for 2/15

This week I ordered a replacement microphone, as the one previously ordered from the 18-500 inventory was too expensive and needed to be reserved for other potential groups. I then configured our Raspberry Pi by downloading the OS on my laptop, writing it to the SD card, and plugging the card into the board. I also configured the board to connect to CMU-SECURE.

Kemdi Emegwa’s Status Report for 2/8

This week I primarily worked on researching the mechanisms that we are going to use for text-to-speech and speech-to-text. Python has many established libraries for this purpose, but we have the added constraint that whatever model we use will have to run directly on the Raspberry Pi 4. I was able to find a lightweight, open-source model developed by CMU researchers called PocketSphinx. This will likely work well for our use case because the model is small enough to run locally on limited hardware. We are currently on schedule, and for the upcoming week I plan to finish the Python code for the Raspberry Pi so we can start using speech-to-text on the device.

Team Status Report for 2/8

The most significant risk is that the latency of the whole system is too large. We want it to be under 5 seconds from the person finishing their statement to our device responding. If we cannot meet this requirement, we will have to redesign the data flow or move the model to a different location to make sure the latency is under control. No changes were made to the existing design of the system. Below are some pictures of the frontend UI that we set up this week, which will allow users to customize LLM models (not yet functional). We also finalized our order of parts this week.
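One way to track this requirement is to time each stage of the pipeline and compare the total against the 5-second budget. The sketch below is only an illustration under assumed names: the stage labels and helper functions are hypothetical, and the stages here are placeholders rather than our real speech or model code.

```python
import time
from typing import Callable, Dict, List, Tuple

# End-to-end latency target from the requirement above, in seconds.
LATENCY_BUDGET_S = 5.0


def measure_stages(stages: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run each named stage once and record how long it took, in seconds."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings[name] = time.perf_counter() - start
    return timings


def within_budget(timings: Dict[str, float], budget: float = LATENCY_BUDGET_S) -> bool:
    """Return True if the summed stage times are under the latency budget."""
    return sum(timings.values()) < budget


if __name__ == "__main__":
    # Placeholder stages standing in for speech-to-text, the model call,
    # and text-to-speech.
    timings = measure_stages([
        ("speech_to_text", lambda: time.sleep(0.01)),
        ("model_query", lambda: time.sleep(0.02)),
        ("text_to_speech", lambda: time.sleep(0.01)),
    ])
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f} s")
    print("within budget:", within_budget(timings))
```

Breaking the total down by stage would show which part to redesign or relocate if the total goes over budget.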

Justin Ankrom’s Status Report for 2/8

This week, I worked on getting the frontend UI set up for our project. I set up a Next.js application and worked on the home page, and I got the basic UI and feel of the page in place. The home page allows users to pick from a set of our LLMs if they want an easy and convenient solution, or they can host their own LLM. The frontend has no functionality yet besides the UI. A picture is attached. Progress is on schedule for what I want and need to get done. Next week, I hope to implement authentication so we can differentiate between users, which matters because settings must be updated for the correct user.

David’s Status Report for 2/8

I looked through the inventory to get the three main components we need: a speaker, a microphone, and a Raspberry Pi 4. The inventory had both the Raspberry Pi and the microphone, which I ordered, but no speaker. I did some research on a speaker that would be able to connect to a Raspberry Pi 4 and ordered that as well. Our project is on schedule. I hope to connect the speaker, board, and microphone next week to have basic I/O.