Andrew’s Status Report for Feb 15

My main focus this week was continuing to investigate FPGA operation, as well as securing an FPGA and attempting to set it up.

I was able to successfully secure an Ultra96-V2 FPGA from Quinn in the parts inventory. The setup was troublesome for two reasons: 1) my laptop was sent in for repair, so I was initially unable to flash the SD card image myself to complete the FPGA setup. 2) When I was finally able to flash the SD card and set up the FPGA, I found out that the board was faulty out of the box. Its Wi-Fi module, the only available way for the FPGA to communicate off-chip, was broken. I later located an errata file detailing production issues that caused malfunctioning Ultra96-V2 units, and our FPGA was part of the affected batch.

The plan for next week is to decide on a replacement FPGA as soon as possible. We have two choices: another Ultra96-V2, or the Kria KV260. The latter has more RAM for larger deployment models (4GB instead of the Ultra96-V2's 2GB), but has the drawback that it doesn't have an onboard SoC and would require instantiating a soft-core processor, complicating the setup and requiring the use of Xilinx Vitis, a tool that no team member has worked with before at this point.


For Status Report Part C:

Deploying modern large language models (LLMs) locally often comes with significant economic challenges. Traditional solutions typically require multiple high-end graphics cards, which are expensive to acquire and incur non-trivial energy costs.

Our proposed FPGA-based local inference solution addresses these economic concerns in several key ways. 1) Reduced equipment costs: FPGAs are a more affordable alternative to multi-GPU systems, lowering the upfront capital needed for local LLM deployment. 2) Energy efficiency and long-term savings: while a typical multi-GPU setup can consume around 2000W during a standard workload, an FPGA implementation operates within a 20-50W range. This drastic reduction in energy consumption translates to considerable cost savings and a more sustainable operational model over time. 3) Lower maintenance overhead: compared to maintaining multiple GPUs, FPGA-based systems typically experience lower failure rates and reduced thermal management requirements, minimizing both downtime and repair costs.
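To make the energy argument concrete, here is a minimal back-of-the-envelope sketch of the annual electricity cost at the power figures cited above. The electricity price and the assumption of continuous operation are illustrative assumptions, not measurements from our project.

```python
# Rough annual energy-cost comparison: multi-GPU setup (~2000W) vs an
# FPGA implementation (taking the 50W upper bound cited above).
# Assumptions for illustration only: $0.15/kWh, running 24/7 all year.

KWH_PRICE_USD = 0.15        # assumed average electricity price per kWh
HOURS_PER_YEAR = 24 * 365   # assume continuous operation

def annual_cost_usd(watts: float) -> float:
    """Annual electricity cost for a device drawing `watts` continuously."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * KWH_PRICE_USD

gpu_cost = annual_cost_usd(2000)   # multi-GPU workload
fpga_cost = annual_cost_usd(50)    # FPGA upper bound
print(f"Multi-GPU: ${gpu_cost:,.0f}/yr, FPGA: ${fpga_cost:,.0f}/yr "
      f"(~{gpu_cost / fpga_cost:.0f}x difference)")
# → Multi-GPU: $2,628/yr, FPGA: $66/yr (~40x difference)
```

Even under different price or utilization assumptions the ratio stays the same, since cost scales linearly with power draw: roughly a 40x gap between 2000W and 50W.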
