Andrew’s Status Report for Feb 15

My main focus this week was continuing to look into FPGA operation and securing and setting up the FPGA.

I was able to secure an Ultra-96 V2 FPGA from Quinn in the parts inventory. Setup was troublesome for two reasons: 1) my laptop was sent out for repair, so I was unable to flash the SD card image myself to complete the FPGA setup; 2) once I was able to flash the SD card and set up the FPGA, I found that the board was faulty out of the box. Its Wi-Fi module, the only available way for the board to communicate off-chip, was broken. I later located an errata file detailing production issues that caused malfunctioning Ultra-96 V2 units, and our FPGA was produced in the affected batch.

The plan for next week is to decide on a replacement FPGA as soon as possible. We have two choices: another Ultra-96 V2, or the Kria KV260. The latter has more RAM for deploying larger models (4 GB instead of the Ultra-96 V2's 2 GB), but has the drawback that it lacks an onboard SoC and would require instantiating a soft-core processor, complicating the setup and requiring the use of Xilinx Vitis, a tool that no team member has worked with before.


For Status Report Part C:

Deploying modern large language models (LLMs) locally often comes with significant economic challenges. Traditional solutions typically require multiple high-end graphics cards, which are expensive to acquire and incur non-trivial energy costs.

Our proposed FPGA-based local inference solution addresses these economic concerns in several key ways:

  1. Reduced equipment costs: FPGAs are a more affordable alternative to multi-GPU systems, lowering the upfront capital needed for local LLM deployment.
  2. Energy efficiency and long-term savings: while a typical multi-GPU setup can consume around 2000W under a standard workload, an FPGA implementation operates within a 20-50W range. This drastic reduction in energy consumption translates to considerable cost savings and a more sustainable operational model.
  3. Lower maintenance overhead: compared to maintaining multiple GPUs, FPGA-based systems typically experience lower failure rates and reduced thermal management requirements, minimizing both downtime and repair costs.
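To make the energy savings concrete, here is a rough back-of-the-envelope calculation. The 2000W and 35W figures come from the power ranges above; the electricity rate and duty cycle are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope energy cost comparison.
# Power draws are from the ranges above; the rate and duty cycle are assumptions.
GPU_WATTS = 2000         # typical multi-GPU setup under load
FPGA_WATTS = 35          # midpoint of the 20-50W FPGA range
RATE_USD_PER_KWH = 0.15  # assumed electricity rate
HOURS_PER_DAY = 8        # assumed duty cycle

def annual_cost(watts: float) -> float:
    """Annual electricity cost in USD for a device running HOURS_PER_DAY."""
    kwh_per_year = watts / 1000 * HOURS_PER_DAY * 365
    return kwh_per_year * RATE_USD_PER_KWH

print(f"GPU:  ${annual_cost(GPU_WATTS):.0f}/yr")   # $876/yr
print(f"FPGA: ${annual_cost(FPGA_WATTS):.0f}/yr")  # $15/yr
```

Even under these conservative assumptions, the FPGA's annual energy cost is roughly 1-2% of the multi-GPU setup's.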

Andrew’s Status Report for Feb 8th

In the first half of this week I looked into soft-core IPs for a CPU and GPU to be synthesized later for performance benchmarking. After talking with Prof. Benson, we decided that we would instead benchmark against commercial CPUs and GPUs running non-quantized versions of the models. I also worked on securing and starting the initial setup of the FPGA: we chose and obtained the Ultra-96 V2 FPGA from the class inventory, and I am currently trying to boot Linux on it. In case the onboard RAM proves insufficient, I have also obtained permission from Professor Ken Mai to use one of his research FPGAs if the need arises later in the semester.
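A minimal sketch of the throughput harness we could use for the commercial CPU/GPU baseline is below. The `generate` stub is a placeholder for a real (non-quantized) model's generation call, not part of any library we have chosen:

```python
import time

def generate(prompt: str, n_tokens: int) -> list[str]:
    """Placeholder: stands in for a real model's token-by-token generation."""
    return ["tok"] * n_tokens  # stand-in output

def benchmark(prompt: str, n_tokens: int = 128, warmup: int = 1) -> float:
    """Return tokens/second for one generation call, after warmup runs."""
    for _ in range(warmup):           # warm caches before timing
        generate(prompt, n_tokens)
    start = time.perf_counter()
    tokens = generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

print(f"{benchmark('Hello'):.1f} tokens/s")
```

The same tokens/second metric can later be measured on the FPGA side, keeping the comparison apples-to-apples.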

My plans for next week are:

  1. Complete the FPGA setup and look into ways to easily interface with the FPGA
  2. Start designing the overall FPGA architecture for ternary LLM inference
  3. Set up the Vivado toolchain, focusing on data transfer between the FPGA PS and PL
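As background for item 2: ternary LLM weights take values in {-1, 0, +1}, so they can be packed at 2 bits each before being streamed from the PS to the PL. A minimal sketch of that packing follows; the encoding chosen here (00/01/10 for 0/+1/-1) is an illustrative assumption, not a decided design:

```python
def pack_ternary(weights: list[int]) -> bytes:
    """Pack ternary weights {-1, 0, +1} into 2 bits each (4 per byte).

    Encoding (an illustrative choice): 0 -> 0b00, +1 -> 0b01, -1 -> 0b10.
    """
    codes = {0: 0b00, 1: 0b01, -1: 0b10}
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= codes[w] << (2 * j)  # little-endian within the byte
        out.append(byte)
    return bytes(out)

def unpack_ternary(data: bytes, n: int) -> list[int]:
    """Inverse of pack_ternary; n is the original weight count."""
    decode = {0b00: 0, 0b01: 1, 0b10: -1}
    weights = []
    for byte in data:
        for j in range(4):
            weights.append(decode[(byte >> (2 * j)) & 0b11])
    return weights[:n]

ws = [1, -1, 0, 1, -1]
assert unpack_ternary(pack_ternary(ws), len(ws)) == ws
```

Packing 4 weights per byte cuts PS-to-PL transfer volume to a quarter of an 8-bit representation, which matters given the limited onboard RAM discussed above.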


Andrew’s Status Report for Feb 1st

I am currently working on selecting suitable CPU and GPU soft cores to be synthesized onto the FPGA as performance and power-efficiency baselines.

I have looked into multiple open-source RISC-V IPs, including Rocket Core (a well-known UC Berkeley project written in Chisel, a hardware construction language), the VexRiscv project (frequently used in 18-525/725 tape-outs and proven to work in multiple real chips), and the Hazard3 core designed by Luke Wren, a principal engineer at Raspberry Pi, which currently ships in multiple RPi products. I worked with all of these projects and decided to select the VexRiscv core as the benchmark soft core because:

  1. It has a long history of success: the project is designed for FPGA soft-core use and has been verified on multiple FPGA fabrics, including ones that we might use later in the project, unlike the Hazard3 core, which is primarily designed for silicon.
  2. The project is simple and has lots of examples to draw from, while the Berkeley Rocket Core and the Chipyard framework have dependencies totaling more than 30 GB and did not work out of the box.
  3. VexRiscv, being written in SpinalHDL, offers great flexibility as well: it ships with options for multiple bus protocols, which will facilitate communication when synthesized onto the FPGA, and it supports booting Linux directly for even greater ease of use.

My progress is currently on schedule. The next steps are testing the VexRiscv soft core on the 240 FPGA (we are planning on using Xilinx boards, so the Vivado toolchain will match) and finding and evaluating appropriate GPU soft IPs.