Kemdi Emegwa’s Status Report for 3/22

This week I mainly worked on finally getting audio to work on the Raspberry Pi. After spending a considerable amount of time trying to get our old separate mic/speaker setup to work, we eventually decided to just transition to a 2-in-1 speakerphone. Even though we were initially led to believe this would not work, I was able to spend some time configuring the Raspberry Pi to recognize the speakerphone and allow for programmatic audio input/output. I was finally able to start testing the capabilities of our program. However, I had to spend quite a lot of time getting the TTS models working on the Raspberry Pi, which required tirelessly searching for a version of PyTorch that would run on it.

Since I was finally able to get the device inputting and outputting audio, I decided to start benchmarking the TTS (text-to-speech) and ASR (automatic speech recognition) models we were using. As mentioned in our previous post, we switched from PocketSphinx/eSpeak for ASR/TTS to Vosk/Coqui TTS. Vosk performed in line with what we wanted, allowing for almost real-time speech recognition. However, Coqui TTS was very slow. I tested a few different TTS models, such as piper-tts, silero-tts, espeak, nanotts, and others. eSpeak was the fastest but also the worst sounding, while piper-tts combined speed and quality. However, it is still a bit too slow for our use case.
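This kind of comparison is essentially a latency measurement across engines. A minimal timing harness like the following is enough to produce comparable numbers; the function name and the stubbed-out `synthesize` callable are illustrative, not our actual benchmark code — in practice each engine would be wrapped in a small function that shells out to it or calls its Python API.

```python
import time

def benchmark_tts(name, synthesize, text, runs=3):
    # Time repeated synthesis calls and report the average latency.
    # `synthesize` is any callable that turns text into audio; here it
    # stands in for a real engine call (e.g. invoking espeak or piper).
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        timings.append(time.perf_counter() - start)
    avg = sum(timings) / len(timings)
    print(f"{name}: {avg:.3f}s average over {runs} runs")
    return avg
```

Running this once per engine with the same test sentence gives directly comparable averages, which is roughly how the speed ranking above was obtained.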

To combat this issue, we are looking to transition back to using a Raspberry Pi 5, after our last Raspberry Pi 5 was stolen and we were forced to use a Raspberry Pi 4. I think we are definitely on track, and I will spend next week working on integrating LLM querying with the device.

David’s Status Report for 3/22

This week I worked on the model/server code. Following on from last week, I was initially trying to get the model downloaded from Hugging Face running locally. However, there were many issues with dependencies and OS problems. After trying many things, I did some more research into hosting models and came across a piece of software named Ollama, which allows for easy use of a local model. I wrote some Python to create a server that takes in requests, runs them through the model, and returns the result to the caller.
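A minimal sketch of that request flow, using only the Python standard library, looks something like the following. The endpoint path, JSON field names, and the `run_model` stub are hypothetical; in the real server the stub would be replaced by a call to the local Ollama API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(prompt):
    # Placeholder for the real model call. The actual server would forward
    # the prompt to Ollama's local HTTP API (hypothetical wiring here).
    return f"echo: {prompt}"

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run it through the model, respond.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = run_model(body.get("prompt", ""))
        payload = json.dumps({"response": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

With the server running, a curl such as `curl -X POST localhost:8000 -d '{"prompt": "hello"}'` exercises the same parse-run-return loop described above.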

As seen here, we can simply curl a request in, which will then be parsed, run through the model, and returned. I then looked into dockerizing this code. I was able to build a container and curl into its exposed port, yet the trouble I came across is that the Ollama code inside does not seem to be running. I think it stems from the fact that to run Ollama from Python you need three things: the Ollama Python package, the Ollama app and its related files, and an Ollama model. Currently, the Ollama model and app live somewhere on my PC, so when I initially tried to containerize the code, the model and app were not included, only the Ollama package (which is useless by itself). I then tried pasting those folders into the build directory before building the Docker image, and they were still not running. I have played around with mounting volumes and other solutions suggested online, but they have not worked. I am still researching a fix, but there are few resources that pertain to my exact situation as well as my OS (Windows). We are currently on schedule.
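For reference, the approach Ollama's own Docker image takes is to keep the app inside the container and persist model files in a mounted volume rather than copying them into the build context. A sketch along those lines follows; the image name, volume path, and port are the publicly documented ones, but treat the exact commands and model tag as assumptions to verify against the current Ollama docs.

```shell
# Run the official Ollama image, persisting models in a named volume
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Pull a model into the running container's volume
docker exec -it ollama ollama pull llama3

# Our Python server container can then talk to Ollama's HTTP API on 11434
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "hello"}'
```

This sidesteps the copy-the-folders problem entirely: the model never lives in the image, so the same image works on any host with the volume attached.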

Kemdi Emegwa’s Status Report for 3/15

This week I spent a lot of time testing and making changes. After extensive testing, I determined that the current solutions we were using for speech-to-text and text-to-speech were not going to be sufficient for what we want to do. CMU PocketSphinx and eSpeak simply did not allow for the minimum performance necessary for our system. Thus I made the transition to Vosk for speech-to-text and Coqui TTS for text-to-speech.

I spent a lot of time configuring the environment for these two new additions as well as determining which models would be suitable for the raspberry pi. I was able to get both working and tested, which yielded significantly better performance for similar power usage.

In addition, I added the ability to add/delete songs on our frontend for our music capabilities. I also added a database to store these songs.
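As an illustration of what such a song store can look like (the schema and function names here are hypothetical, not our actual code), SQLite is a natural fit on a Raspberry Pi since it needs no separate database server:

```python
import sqlite3

def init_db(path=":memory:"):
    # Create the songs table if it does not exist yet.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs ("
        "id INTEGER PRIMARY KEY, title TEXT, filename TEXT)"
    )
    return conn

def add_song(conn, title, filename):
    # Parameterized queries avoid SQL injection from user-supplied titles.
    conn.execute("INSERT INTO songs (title, filename) VALUES (?, ?)",
                 (title, filename))
    conn.commit()

def delete_song(conn, title):
    conn.execute("DELETE FROM songs WHERE title = ?", (title,))
    conn.commit()

def list_songs(conn):
    return [row[0] for row in
            conn.execute("SELECT title FROM songs ORDER BY id")]
```

The frontend's add/delete buttons then map one-to-one onto `add_song` and `delete_song`, with `list_songs` backing the page that shows what is saved.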

I am on track and going forward, I plan on testing the frontend on the raspberry pi.

David’s Status Report for 3/15

This week I worked on setting up the Docker container and the software that deals with the model. I currently have working code that can take in a curl request with a username and password, and if they are correct, the request will then be sent to a model. The model currently is a dummy model that just spits out a number, but once the actual model is put in place it should work. For the actual model, I have requested and downloaded a Llama model from Hugging Face. I am currently working through setting it up, as it has certain requirements. I have also done some research into new parts for the speaker/microphone and have settled on one that should fix our issues. Our project is a little behind due to the hardware not working, and we hope to fix that next week with the new part. I personally hope to accomplish getting the model set up so that the established data flow works as intended: a command comes in, authentication is checked, the prompt runs through the model, and the result is returned.
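The credential check plus dummy-model flow can be sketched roughly as follows. The credential values and function names are made up for illustration; the real server would compare against its configured secrets.

```python
import hmac

# Hypothetical placeholder credentials; the real server reads its own.
EXPECTED_USER = "device"
EXPECTED_PASS = "factory-default-password"

def check_credentials(username, password):
    # hmac.compare_digest is a constant-time comparison, which avoids
    # leaking information about the password through response timing.
    return (hmac.compare_digest(username, EXPECTED_USER)
            and hmac.compare_digest(password, EXPECTED_PASS))

def dummy_model(prompt):
    # Stand-in for the real model: just "spits out a number".
    return 42

def handle_request(username, password, prompt):
    # Authentication gate in front of the model, as described above.
    if not check_credentials(username, password):
        return {"error": "unauthorized"}
    return {"prediction": dummy_model(prompt)}
```

Swapping `dummy_model` for the real Llama call is then the only change needed once the model setup is finished.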


As can be seen in this picture, I send a curl command from the terminal and the server receives it and gives back a prediction (from the dummy model).

Team Status Report for 3/15

The most significant risk to our project is the issue we are having with the Raspberry Pi and its peripherals. The microphone and speaker do not seem to be working with the Raspberry Pi. We are currently looking to see if a 2-in-1 microphone/speaker will fix the issue. If it does not, we will have to brainstorm how to fix it. No changes were made to the design except some minor tweaks, like how the user will be able to go to the website on the Raspberry Pi and drag in their MP3 files for the music player to upload.

Justin Ankrom’s Status Report for 3/15

This week I worked on adding authentication to the device configuration website and to requests made from the device to the cloud. By this I mean the user has to enter a password that comes with the device (think something like how an internet modem comes with a default password on a sticker on the bottom) to enter the website. This password will also be part of each request made to the cloud and checked there to ensure that only our device can make requests to our cloud. This protects against unwanted users changing our device configuration or using our cloud resources. I also chose the 4 open-source models we will be using: Llama 3.1 8B params (https://huggingface.co/meta-llama/Llama-3.1-8B), Vicuna v1.5 7B params (https://huggingface.co/lmsys/vicuna-7b-v1.5), Falcon-7B-Instruct (https://huggingface.co/tiiuae/falcon-7b-instruct), and Qwen2.5-7B-Instruct (https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). I chose ones around 7-8 billion parameters since I think this will give us the best performance while also being relatively small. My concern is that this might make the cloud hardware needed to run these models expensive, so I will research and test further to see if we can run them on the cloud without it being too expensive. If not, I have already looked into 1-3B param models that we could use. I also helped David start configuring the Docker containers.

I am on schedule for what I need to get done. Next week I want to do some testing to see whether our 7-8B param models will be too expensive to run on the cloud. Based on those results I might need to swap the models. Last week I said that this week I wanted to fill the website out, but I decided to postpone that until we have finalized everything, so in the last 1-2 weeks of the project. This is because I don’t want to make changes and then have to redo them over and over based on additional testing or changes we make. Next week I also want to start and finish the music player. This week I researched how I will do it, and next week I just need to implement it.

Overall, this week involved getting authentication set up and doing a lot of research on models and on how to set up the music player.

David’s Status Report for 3/8

This last week I worked on the board with Kemdi, looked into dockerization of the project, and bought some new parts. Kemdi and I connected the peripherals to the board and first ran into some trouble with audio. The microphone was able to capture correctly, but the speaker did not seem to work. We suspected it had something to do with the audio inputs and outputs, and we were able to figure it out, although the speaker was very weak, which led me to find a bigger one to purchase. We then tried to run the speech-to-text code on the Raspberry Pi 4, but there were a lot of errors. Although the code works fine on Kemdi’s computer, we believe the Raspberry Pi relies on certain native code which gives us errors when we try to run it. The errors involve two pieces of audio software, jackd and ALSA. We are not sure how to continue on that front and have requested help from a TA. I then continued researching how to create a Docker container for our project, as I was unfamiliar with it and our project requires it. We are still on schedule, although we need to fix our board problem as soon as possible.
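For debugging problems like this, the stock ALSA command-line tools are a useful first step for checking what audio hardware the Pi actually sees. These are standard utilities shipped with ALSA, though the exact card/device numbers (`plughw:1,0` below) will differ per setup and are shown only as an example.

```shell
# List the capture (microphone) and playback (speaker) devices ALSA can see
arecord -l
aplay -l

# Try a short test recording on a specific card/device, then play it back
arecord -D plughw:1,0 -d 3 -f cd test.wav
aplay -D plughw:1,0 test.wav

# Sanity-check speaker output without any recording involved
speaker-test -t wav -c 2
```

If the devices show up here but the Python code still fails, the problem is likely in how the libraries select a device rather than in the hardware itself.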

Kemdi Emegwa’s Status Report for 3/8

This week, my primary focus was on testing and debugging the microphone integration within our Raspberry Pi setup. I dedicated significant time to troubleshooting the microphone functionality issues encountered when executing our code. This involved meticulously reviewing error logs, verifying hardware connections, and running numerous diagnostic tests to pinpoint the problem. Additionally, I explored different software configurations and settings to identify compatibility challenges with the microphone.

In parallel to the debugging process, I worked extensively on refining our existing codebase. The goal was to enhance compatibility and ensure greater stability when running directly on the Raspberry Pi. This refinement process included optimizing performance, addressing potential memory usage concerns, and ensuring that our code efficiently interfaces with the hardware. The improvements made this week will set a solid foundation for the upcoming integration work.

Despite the encountered difficulties, our project remains on schedule. Through careful evaluation, we concluded that pivoting to a hardware setup utilizing a single sound card for both the speaker and microphone would be beneficial. This decision should simplify integration significantly and resolve the compatibility issues previously faced. Next week, I will specifically focus on testing and integrating this revised speaker/microphone configuration, which should maintain our progress and help us stay aligned with our overall project timeline.

Justin Ankrom’s Status Report for 3/8

This week, I set up the website that will live on the physical hardware as well as the website we host. Originally, we had planned to have only one website, which we hosted, but we decided to go a different route. The reason was that if we hosted everything, we would have needed some form of authentication, so we would have had to store user information, which goes against our security policies. So I had to come up with a new approach: using two websites instead, one that lives on the hardware and another that we host. The one that lives on the hardware will be where people can set their VM configuration (VM IP address) and also look at the music they have saved, while the website we host will be strictly for setup instructions and our data privacy terms of service. This means that what we host is purely static and applies to every single user, while user configuration lies on the client side. This change meant that I had to scrap almost all of the existing website code and restart. I spent a lot of time coming up with this two-website approach and thinking about how I wanted to do it. For our hosted website, I am still using React and Next.js and hosting it on Vercel. For the client website, I decided to serve a simple HTML page from a Flask app, because on the client side we have very limited resources, so I went with a very lightweight approach. I have initial websites that work for both the client side and our side. A lot of the content is filler, but this is fine, as it will be quick to update the actual content since the overall layout is established.
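A lightweight Flask app of the kind described can be as small as the following sketch. The form field, page markup, in-memory store, and port are illustrative assumptions, not our actual configuration page; a real device would persist the setting to disk.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical in-memory store; the real device would persist this.
config = {"vm_ip": ""}

PAGE = """
<h1>Device Configuration</h1>
<form method="post">
  VM IP address: <input name="vm_ip" value="{{ vm_ip }}">
  <button type="submit">Save</button>
</form>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    # Save the submitted VM IP on POST, then re-render the form.
    if request.method == "POST":
        config["vm_ip"] = request.form.get("vm_ip", "")
    return render_template_string(PAGE, vm_ip=config["vm_ip"])

if __name__ == "__main__":
    # Bind on all interfaces so the page is reachable on the local network.
    app.run(host="0.0.0.0", port=8080)
```

Serving one templated HTML page like this keeps the on-device footprint tiny compared to running a full React build on the Pi.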

Based on this, I think my progress is on schedule given the recent pivots we discussed in the team status report. Next week I want to work on filling out the actual content of our website, meaning writing the VM setup instructions, and I want to set up a Docker container for at least one open-source model so we can get ahead on that. I also want to pick the exact open-source models we want to use so we have a finalized list.

Here are some pictures of the websites.

Hero section of website
VM setup instructions
Available models
Privacy section
Client side configuration website