Andrew’s Status Report for Feb 15

My main focus this week was continuing to look into FPGA operation and securing and setting up the FPGA.

I was able to secure an Ultra-96 V2 FPGA from Quinn in the parts inventory. Setup was troublesome for two reasons: 1) my laptop was sent out for repair, so I was unable to flash the SD card image myself to complete the FPGA setup; 2) once I was able to flash the SD card and set up the FPGA, I found that the board was faulty out of the box. Its Wi-Fi module, the only available way for the board to communicate off-chip, was broken. I later located an errata file detailing production issues that caused malfunctioning Ultra-96 V2 units, and our FPGA was produced in the affected batch.

The plan for next week is to decide on a replacement FPGA as soon as possible. We have two choices: another Ultra-96 V2, or the Kria KV260. The latter has more RAM for deploying larger models (4 GB instead of the Ultra-96 V2's 2 GB), but has the drawback that it lacks an onboard SoC and would require instantiating a soft-core processor, complicating the setup and requiring the use of Xilinx Vitis, a tool that no team member has worked with before.


For Status Report Part C:

Deploying modern large language models (LLMs) locally often comes with significant economic challenges. Traditional solutions typically require multiple high-end graphics cards, which are expensive to acquire and incur non-trivial energy costs.

Our proposed FPGA-based local inference solution addresses these economic concerns in several key ways:

  1. Reduced equipment costs: FPGAs are a more affordable alternative to multi-GPU systems, lowering the upfront capital needed for local LLM deployment.
  2. Energy efficiency and long-term savings: while a typical multi-GPU setup can consume around 2000W under a standard workload, an FPGA implementation operates within a 20-50W range. This drastic reduction in energy consumption translates to considerable cost savings and a more sustainable operational model.
  3. Lower maintenance overhead: compared to maintaining multiple GPUs, FPGA-based systems typically experience lower failure rates and reduced thermal management requirements, minimizing both downtime and repair costs.
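To make the energy savings concrete, here is a rough back-of-the-envelope calculation. The 2000W and 35W figures come from the power ranges above; the electricity rate and duty cycle are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope energy cost comparison.
# Power draws are from the ranges above; the rate and duty cycle are assumptions.
GPU_WATTS = 2000         # typical multi-GPU setup under load
FPGA_WATTS = 35          # midpoint of the 20-50W FPGA range
RATE_USD_PER_KWH = 0.15  # assumed electricity rate
HOURS_PER_DAY = 8        # assumed duty cycle

def annual_cost(watts: float) -> float:
    """Annual electricity cost in USD for a device running HOURS_PER_DAY."""
    kwh_per_year = watts / 1000 * HOURS_PER_DAY * 365
    return kwh_per_year * RATE_USD_PER_KWH

print(f"GPU:  ${annual_cost(GPU_WATTS):.0f}/yr")   # $876/yr
print(f"FPGA: ${annual_cost(FPGA_WATTS):.0f}/yr")  # $15/yr
```

Even under these conservative assumptions, the FPGA's annual energy cost is roughly 1-2% of the multi-GPU setup's.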

Andrew’s Status Report for Feb 8th

In the first half of this week I looked into soft-core IPs for a CPU and GPU to be synthesized later for performance benchmarking. After talking with Prof. Benson, we decided that we would instead benchmark against commercial CPUs and GPUs running non-quantized versions of the models. I also worked on securing and starting the initial setup of the FPGA: we chose and obtained the Ultra-96 V2 FPGA from the class inventory, and I am currently trying to boot Linux on it. In case the onboard RAM proves insufficient, I have also obtained permission from Professor Ken Mai to use one of his research FPGAs if the need arises later in the semester.
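A minimal sketch of the throughput harness we could use for the commercial CPU/GPU baseline is below. The `generate` stub is a placeholder for a real (non-quantized) model's generation call, not part of any library we have chosen:

```python
import time

def generate(prompt: str, n_tokens: int) -> list[str]:
    """Placeholder: stands in for a real model's token-by-token generation."""
    return ["tok"] * n_tokens  # stand-in output

def benchmark(prompt: str, n_tokens: int = 128, warmup: int = 1) -> float:
    """Return tokens/second for one generation call, after warmup runs."""
    for _ in range(warmup):           # warm caches before timing
        generate(prompt, n_tokens)
    start = time.perf_counter()
    tokens = generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

print(f"{benchmark('Hello'):.1f} tokens/s")
```

The same tokens/second metric can later be measured on the FPGA side, keeping the comparison apples-to-apples.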

My plans for next week are:

  1. Complete the FPGA setup and look into ways to easily interface with the FPGA
  2. Start designing the overall FPGA architecture for ternary LLM inference
  3. Set up the Vivado toolchain, focusing on data transfer between the FPGA PS and PL
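As background for item 2: ternary LLM weights take values in {-1, 0, +1}, so they can be packed at 2 bits each before being streamed from the PS to the PL. A minimal sketch of that packing follows; the encoding chosen here (00/01/10 for 0/+1/-1) is an illustrative assumption, not a decided design:

```python
def pack_ternary(weights: list[int]) -> bytes:
    """Pack ternary weights {-1, 0, +1} into 2 bits each (4 per byte).

    Encoding (an illustrative choice): 0 -> 0b00, +1 -> 0b01, -1 -> 0b10.
    """
    codes = {0: 0b00, 1: 0b01, -1: 0b10}
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= codes[w] << (2 * j)  # little-endian within the byte
        out.append(byte)
    return bytes(out)

def unpack_ternary(data: bytes, n: int) -> list[int]:
    """Inverse of pack_ternary; n is the original weight count."""
    decode = {0b00: 0, 0b01: 1, 0b10: -1}
    weights = []
    for byte in data:
        for j in range(4):
            weights.append(decode[(byte >> (2 * j)) & 0b11])
    return weights[:n]

ws = [1, -1, 0, 1, -1]
assert unpack_ternary(pack_ternary(ws), len(ws)) == ws
```

Packing 4 weights per byte cuts PS-to-PL transfer volume to a quarter of an 8-bit representation, which matters given the limited onboard RAM discussed above.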


Andrew’s Status Report for Feb 1st

I am currently working on selecting suitable CPU and GPU soft cores to be synthesized onto the FPGA as performance and power-efficiency baselines.

I have looked into multiple open-source RISC-V IPs, including Rocket Core (a well-known UC Berkeley project written in Chisel, a hardware construction language), the VexRiscv project (frequently used in 18-525/725 tape-outs and proven to work in multiple real chips), and the Hazard3 core designed by Luke Wren, a principal engineer at Raspberry Pi, which currently ships in multiple RPi products. I worked with all of these projects and decided to select the VexRiscv core as the benchmark soft core because:

  1. It has a long history of success: the project is designed for FPGA soft-core use and has been verified on multiple FPGA fabrics, including ones that we might use later in the project, unlike the Hazard3 core, which is primarily designed for silicon.
  2. The project is simple and has lots of examples to draw from, while the Berkeley Rocket Core and the Chipyard framework have dependencies totaling more than 30 GB and did not work out of the box.
  3. VexRiscv, being written in SpinalHDL, offers great flexibility as well: it ships with options for multiple bus protocols, which will facilitate communication when synthesized onto the FPGA, and it supports booting Linux directly for even greater ease of use.

My progress is currently on schedule. The next steps are testing the VexRiscv soft core on the 240 FPGA (we are planning on using Xilinx boards, so the Vivado toolchain will match) and finding and evaluating appropriate GPU soft IPs.