Amelia’s Status Report for March 8

This past week I did not do anything since it was spring break. The week before last, I focused on getting the design review done before the Friday deadline. I did not spend as much time getting comfortable with the Vitis EDA tools as I had planned; however, I'm confident that hands-on exposure over the next few weeks will be enough for me to figure things out. I therefore accomplished one of my two goals from two weeks ago, and I think I am still on schedule.

Given that I have quite a busy week ahead, I plan to focus on figuring out how to pull power data from our new FPGA, and to look into multiuser authentication as an extension to our UI. I expect these to be manageable goals for the upcoming week. I would also like to spend some time organizing our website.

anirudhp_status_report_03/08/25

This week, I focused on two primary aspects of the project:

  1. Ethical considerations and how they will adjust the benchmark. For this, I made some minor improvements to the model so that it simply refuses to autocomplete certain types of text (e.g., medical or urgent-action content).
  2. Analyzing the Microsoft BitNet paper in order to suggest performance improvements that we could target.

Overall, the aspects that I was able to achieve are:

  1. Reduced the hallucination rate by over 6%, though this naturally came at the expense of the model simply refusing to provide an output in those cases.
  2. Identified the look-up-table implementation and the indexing system as the major sources of speedup, which should provide roughly 40% more throughput in the system.
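As a sketch of the refusal behavior described in item 1, a minimal keyword-based filter could look like the following. The category names, keyword lists, and refusal string are illustrative assumptions, not the project's actual implementation:

```python
# Hypothetical refusal filter: decline to autocomplete prompts that match a
# sensitive category, instead of risking a hallucinated completion.
# Keyword lists below are made-up examples for illustration only.

SENSITIVE_KEYWORDS = {
    "medical": ["diagnosis", "dosage", "prescription"],
    "urgent_action": ["emergency", "call 911", "evacuate"],
}

REFUSAL = "[refused: sensitive category]"

def filter_completion(prompt: str, complete) -> str:
    """Return the model's completion, or a refusal for sensitive prompts."""
    lowered = prompt.lower()
    for keywords in SENSITIVE_KEYWORDS.values():
        if any(kw in lowered for kw in keywords):
            return REFUSAL  # refuse rather than autocomplete this prompt
    return complete(prompt)  # `complete` stands in for the real model call
```

Filtering at the prompt level like this trades coverage for simplicity; it is what makes the hallucination-rate reduction come "at the expense of the model refusing to provide an output."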

My goals for next week are:

  1. Connect to the FPGA wirelessly and transmit the query onto the board (this should be straightforward once Linux is booted on the board's core), so I would probably do this before we start working on the synthesis flow.
  2. Prepare more on Vitis to see how I would synthesize a basic block that detects the query and pastes the exact same text back into the output (this can be seen as a preliminary step; we would simply replace this short circuit with our model to complete the system).
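The short-circuit block in goal 2 can be prototyped in software first. Below is a minimal sketch, assuming a plain TCP socket on an arbitrary local port; the real transport to the board will differ:

```python
# Software stand-in for the "short-circuit" block: a server that receives a
# query and returns the exact same text. Host, port, and framing are assumed
# values for illustration, not the board's real interface.
import socket
import threading

HOST, PORT = "127.0.0.1", 5050

def run_echo_server(srv: socket.socket) -> None:
    """Accept one connection and echo the received bytes back unchanged."""
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(4096)  # the incoming query
        conn.sendall(data)      # short-circuit: output == input

def send_query(text: str) -> str:
    """Send a query to the (stand-in) block and return its output."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(text.encode())
        return cli.recv(4096).decode()

# Bind and listen before starting the thread so the client cannot race it.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(1)
threading.Thread(target=run_echo_server, args=(server_sock,), daemon=True).start()

reply = send_query("hello fpga")  # → "hello fpga"
```

Replacing `run_echo_server` with real inference would complete the system, which is exactly the substitution described above.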

I wanted to keep fairly conservative goals for this week given that we are finally going to start interfacing with hardware, which always comes with a number of challenges around setup and use of the system. At the same time, I still think the goals listed above are reasonable.

Team Status Report 03/08/2025

This week we had a couple of targets that were mostly achieved:

  1. We noticed that our setup script had some issues and was a bit unstable to run. Since it was reasonably fast, we had no problem rerunning it several times until it completed, so we wrapped the script in a loop with a try-except block that keeps running it until the full setup works. We would have liked to debug the root cause, but there was not much to gain by doing so, and we preferred to focus on the hardware segment.
  2. We analyzed the BitNet paper that Microsoft published and came up with an overall block diagram that would accelerate the system, and did some preliminary calculations on how much of a speedup we could attain over the classical form of the core we are using. From the looks of it, we should be able to save a number of cycles and shrink the overall size of the arithmetic blocks enough to meet our speed specifications.
  3. We analyzed the ethical impacts of our project and completed the design review report.
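The retry wrapper from item 1 can be sketched as follows; `run_setup` is a placeholder for the real setup steps, and the attempt limit and delay are assumed values:

```python
# Sketch of the setup retry loop: rerun a flaky setup function inside a
# try-except until it completes (or we give up after a bounded number of tries).
import time

def run_setup():
    """Placeholder for the actual (occasionally flaky) setup steps."""
    ...

def setup_with_retries(setup=run_setup, max_attempts=10, delay_s=1.0):
    """Keep rerunning `setup` until it succeeds; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        try:
            setup()
            return attempt  # success: report how many tries it took
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(delay_s)
    raise RuntimeError("setup never completed")
```

Bounding the attempts (rather than looping forever) keeps a genuinely broken setup from hanging silently.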


Over the next week, our aims are:

  1. We should get the final FPGA and then synthesize our base core and model onto it. Using this, we want to benchmark the following:
    1. Total size footprint — See if we can fit a bigger model in.
    2. Tokens/Sec and latency to first token — This gives us an idea of how much of a speedup we would need over the existing hardware system. We would probably need to adjust the block diagram to meet this value.
    3. Power telemetry — Since this is a new FPGA, we need to work out how to pull power data from it.
  2. We would also like to extend the UI script to interface with the FPGA, and start thinking about the authentication and scheduling systems for multi-access; mainly to see whether it is in fact feasible, not how we would implement it.
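The two latency metrics in item 1.2 can be measured generically against anything that streams tokens. A minimal sketch, with a toy generator standing in for FPGA inference:

```python
# Measure time-to-first-token and tokens/sec for any token iterator.
# `toy_model` and its timings are illustrative stand-ins for real inference.
import time

def benchmark(token_stream):
    """Return (seconds_to_first_token, tokens_per_second) for an iterator."""
    start = time.perf_counter()
    first_token_latency = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first_token_latency is None:
            first_token_latency = now - start  # latency to first token
        count += 1
    elapsed = time.perf_counter() - start
    return first_token_latency, (count / elapsed if elapsed > 0 else 0.0)

def toy_model(n_tokens=50, delay_s=0.001):
    """Stand-in for FPGA inference: emits tokens at a fixed rate."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"
```

Because the harness only needs an iterator, the same code can benchmark the Mac baseline and the FPGA path once streaming output is wired up.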

Team Status Report for Feb 22

This week we focused on a few different things: the design presentation, which was mainly covered by Andrew; UI development, which was covered by Anirudh and me (Amelia); and FPGA drama, which we all dealt with.

We finally got the UI setup scripts to work on other users' computers, something we had been working toward for a while. Once the setup process was finalized, we worked on a more usable interface that fits the use-case requirements laid out in our design presentation. We want the UI setup and run scripts completed soon because we plan to do some preliminary user testing while there is still plenty of time to make changes based on anything we learn.

The FPGA drama of the week was that the Ultra96-V2 board we got is part of a broken batch with non-functional WiFi. Without WiFi, we can't SSH into the board, so we had two options: pick a new board, or obtain a wired UART extension board. We decided to try the Kria KV260, another board we initially considered due to its greater memory capacity. Our initial hesitation with this board was that none of us were familiar with Vitis, but after a little reading, we felt we could learn those EDA tools well enough to develop a synthesis flow. We may now also be able to work with a larger model, which will give us more accurate text completions.

Group Goals for the next week:

  1. We have to complete the design review report by Friday.
  2. Our new FPGA is arriving soon, so we need to start working on our new synthesis flows.

Amelia’s Status Report for Feb 22

This week I focused on getting the UI ready to be tested by users. I learned the basics of Lua and edited Anirudh's original scripts to support a text preview, after which the user can either accept the autocompletion or regenerate the response. The UI now uses two hotkeys: Cmd-G to prompt the model and Cmd-L to remove the output if it isn't what the user wants. I wanted to finish the UI early so that we can start preliminary user testing (likely after spring break).

The other main thing I was focused on this week was watching/reading some Vitis tutorials to learn how to synthesize softcores on our new Kria FPGA.

[Figure: preview of generated text]

Goals for next week:

  1. Make significant progress on the design review report, as it is due Friday
  2. Spend more time getting comfortable with the Vitis EDA tools

anirudhp_status_report_Feb22nd

This week, the focus was on packaging and cleaning up our user interface system in order to start getting feedback from people who can trial our system.

This week I managed to get the measured power and timing parameters printed on the side of the screen in a location that I thought was unobtrusive. It's one of the details I would like to verify through our user feedback form.

Additionally, since we are moving from an Ultra96-V2 FPGA to a Kria-based FPGA, we need to learn a different set of EDA tools. So I spent the past week mainly focusing on how to operate Vitis to synthesize our softcores and language models.

Over the next week my goals are:

  1. Work out how the full synthesis flow for loading our models onto the Kria works.
  2. Work out how to move data onto the FPGA and pull the results back out, so that I can extend our previous Python script to use the FPGA for inference.

After this, I would plan to go for the more advanced power and performance data that we want to monitor on the FPGA.


We’re currently well ahead of schedule and on track to reach the iteration and architecture phase within another 2 weeks.

Andrew’s Status Report for Feb 15

My main focus this week was to keep looking into FPGA operation, and to secure and set up the FPGA.

I was able to successfully secure an Ultra-96 V2 FPGA from Quinn in the parts inventory. The setup was troublesome for two reasons: 1) my laptop was sent out for repair, so I was initially unable to flash the SD card image myself to complete the FPGA setup; 2) when I was finally able to flash the SD card and set up the FPGA, I found out that it was faulty out of the box. Its WiFi module, the only available way for the FPGA to communicate off-chip, was broken. I later located an errata file detailing production issues that caused malfunctioning Ultra-96 V2 units, and our FPGA was produced in the affected batch.

The plan for next week is to decide on a replacement FPGA as soon as possible. We have two choices: another Ultra-96 V2, or the Kria KV-260. The latter has more RAM for deploying better models (4 GB instead of the Ultra-96 V2's 2 GB), but it suffers from the drawback that it doesn't have an onboard SoC and would require instantiating a soft-core instance, complicating the setup and requiring the use of Xilinx Vitis, a tool that no team member has worked with before.


For Status Report Part C:

Deploying modern large language models (LLMs) locally often comes with significant economic challenges. Traditional solutions typically require multiple high-end graphics cards, which are expensive to buy and incur non-trivial energy costs.

Our proposed FPGA-based local inference solution addresses these economic concerns in several key ways. Reduced equipment costs make FPGAs a more affordable alternative to multi-GPU systems, lowering the upfront capital needed for local LLM deployment. Energy efficiency becomes a significant economic advantage over time: while a typical multi-GPU setup can consume around 2000 W during a standard workload, an FPGA implementation operates within a 20-50 W range. This drastic reduction in energy consumption translates to considerable cost savings and a more sustainable operational model. Finally, FPGA-based systems typically experience lower failure rates and reduced thermal-management requirements than multi-GPU setups, minimizing both downtime and repair costs.
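To make the power comparison concrete, here is a back-of-the-envelope calculation. The duty cycle (8 hours/day) and electricity price ($0.15/kWh) are assumptions for illustration:

```python
# Rough annual electricity cost: ~2000 W multi-GPU setup vs a 20-50 W FPGA.
# Hours/day and $/kWh below are assumed values, not measured figures.

def annual_energy_cost(watts, hours_per_day=8, price_per_kwh=0.15):
    """Yearly electricity cost (USD) for a device drawing `watts` while in use."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh

gpu_cost = annual_energy_cost(2000)  # multi-GPU setup at ~2000 W → $876/year
fpga_cost = annual_energy_cost(35)   # midpoint of the 20-50 W FPGA range
```

Under these assumptions the FPGA comes out dozens of times cheaper to run per year, which is the long-term savings argument made above.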

Amelia’s Status Report for Feb 15

I started off this week setting up the DeepSeek model I downloaded last week. While the model is very interesting, after using it and trying to find a way to prompt it to do text completion, I found that it is very much a chatbot with visible chain of thought. We therefore decided to stick with our original model.

I then pivoted to getting our UI set up on my computer, which required some debugging. I wanted to make sure the setup scripts were working because I plan to conduct some preliminary user testing next week. I also quantified our usability metrics, deciding on a setup time of at most 15 minutes and a benchmark of 100% success for users accepting or rejecting text completion suggestions, which is one of our system requirements. Finally, I worked on the design review presentation, making final block diagrams and quantifying technical design requirements.

The specified need our product will meet, with consideration of social factors, is that it levels the playing field in terms of who has access to AI tools that support more efficient workflows. While many people can use text-completion copilots to speed up their work, those working in data-sensitive fields cannot. This creates a disparity across groups in who has more time to spend on the more skilled aspects of their job, or more free time to spend on other things. By enabling broader access to copilot tools, our product effectively enables better uses of time for a greater number of people.

Anirudhp status update Feb 15th

This week, I focused on setting up power and timing measurement infrastructure on my Mac and integrating it into the overall system.

I managed to achieve all of those goals and evaluated the system on a couple of test prompts, which yielded some encouraging results:

  1. Mean power dissipation:
    1. CPU — 600–700 mW
    2. GPU — 24–40 mW
  2. Mean timing:
    1. 1.1–1.3 seconds

These results seem to indicate that the FPGA system will comfortably beat these specifications, so it looks like we're on the right track in that regard.

A more important aspect now is to be thorough with this system, so while the FPGA setup is ongoing, I plan to find a dataset on which to benchmark power and timing and determine average performance. I also evaluated the model on TruthfulQA and found a score of 30, which is pretty decent for a model of this size.

For the next week, I aim to complete the above goals and also extend my Python script for WiFi connectivity to the FPGA.

Answering part A:
Whenever most people wish to leverage large language models or other AI-based systems, their data gets sent to data centres, which process the requests and compute the results. This creates vulnerability on two ends:

  1. The data may be intercepted and read while in transit.
  2. Without control over the data, you never know what is being done with it after it has been used.
    This leads to poorer intellectual-property protection and personal data safety.

    Additionally, as people become more and more reliant on these systems, they will start using them for more critical tasks, like urgent healthcare; if connectivity is unavailable, this reliance can cause significant harm.

Our solution aims to provide a fully offline setup for distilled AI systems in order to provide reliable, secure, and offline AI inference to people that want to keep control of their data.

Team Status Update Feb 15th

This week we focused on wrapping up the auxiliary tasks leading up to the final stage of the project, where we'll focus on iterating on the hardware accelerator. Namely:

  1. Setting up our benchmarking and profiling system for the baseline.
  2. Setting up the FPGA connectivity and synthesis flow.
  3. Evaluating a chain-of-thought alternative model for improving model accuracy.

We managed to complete the benchmarking and profiling system, and eventually decided against using deepseek-r1's smaller variant. However, the FPGA system did not end up working as we expected.

We found that the FPGA we used had some flaws in its WiFi connectivity setup, which leaves us unable to service multiple clients at the same time.

Our goals for next week are:

  1. Run our benchmarking and profiling system on a wide spectrum of input tokens, and collect a comprehensive characterization dataset on our Macs.
  2. Swap to a functional FPGA with WiFi capability, and boot Linux as well as our synthesis flow on the board. In the meantime, while the new FPGA is in transit, we can still try to synthesize the model onto our current board and get that working, but this is only worthwhile if we end up using the same FPGA type after the change.
  3. Preemptively prepare the interconnect from FPGA to laptop and begin drawing a block diagram for the accelerated system.

For status report 2: A was written by Anirudh, B was written by Amelia, and C was written by Andrew.